About Nathan Halliday
Nathan Halliday is a high school student studying at Hills Road College, Cambridge, UK. Following his internship experience with HPCC Systems, Nathan will go on to study Mathematics at St Anne's, University of Oxford.
The ECL language is centred around high performance. HPCC Systems focuses on parallelism to enable highly optimised dataset operations.
The parallel workflow engine increases the scope of parallel processing from within activity graphs to the entire workflow. The goal is to make workunits faster but maintain the existing behaviour of the sequential engine.
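To illustrate the idea of widening parallelism to the whole workflow, here is a minimal sketch in Python. It is not the HPCC Systems implementation. The function name `run_workflow`, the `deps` dictionary layout, and the `action` callback are all assumptions made for illustration. Each workflow item starts as soon as all of its dependencies have finished, so independent items can run at the same time.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def run_workflow(deps, action, max_workers=4):
    """Run every item once, only after all of its dependencies have finished.

    deps:   dict mapping an item id to the list of item ids it depends on
            (hypothetical layout, chosen for this sketch)
    action: callable invoked with an item id; stands in for the real work
    Returns the item ids in the order in which they completed.
    """
    remaining = {item: set(d) for item, d in deps.items()}
    done = set()
    order = []
    running = {}  # future -> item id
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining or running:
            # Submit every item whose dependencies have all completed.
            ready = [i for i, d in remaining.items() if d <= done]
            for item in ready:
                del remaining[item]
                running[pool.submit(action, item)] = item
            if not running:
                raise ValueError("cyclic or unsatisfiable dependencies")
            # Block until at least one running item finishes.
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in finished:
                fut.result()  # re-raise any failure from the item
                item = running.pop(fut)
                done.add(item)
                order.append(item)
    return order
```

With `deps = {1: [], 2: [], 3: [1, 2], 4: [3]}`, items 1 and 2 may run concurrently, item 3 waits for both, and item 4 waits for item 3. A sequential engine would produce the same results, only without the overlap.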
During my project, I have gradually extended the parallel engine to support more ECL language constructs. Regression tests that combine different workflow modes ensure that the engine can process diverse queries.
One major challenge for the parallel engine was implementing condition items, since only one sub-branch of dependencies is executed by the engine. The engine also has the complex task of mimicking the sequential engine if the workflow fails.
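The difficulty with condition items can be sketched as follows. This is a simplified, sequential illustration in Python, not the engine's actual code. The `Item` class and its `condition`, `if_true` and `if_false` fields are hypothetical names. The point it shows is that the branch items must not be scheduled like ordinary dependencies: the condition is evaluated first, and only the selected branch is ever executed.

```python
class Item:
    """A workflow item (hypothetical structure for this sketch)."""

    def __init__(self, name, run=None, deps=(),
                 condition=None, if_true=None, if_false=None):
        self.name = name
        self.run = run              # callable doing the item's work
        self.deps = list(deps)      # ordinary dependencies, always executed
        self.condition = condition  # set only on condition items
        self.if_true = if_true
        self.if_false = if_false


def execute(item, results):
    """Execute `item` after its dependencies, running each item at most once."""
    if item.name in results:
        return results[item.name]
    for dep in item.deps:
        execute(dep, results)
    if item.condition is not None:
        # Evaluate the condition first, then follow exactly one branch.
        chosen = item.if_true if execute(item.condition, results) else item.if_false
        result = execute(chosen, results) if chosen is not None else None
    else:
        result = item.run()
    results[item.name] = result
    return result
```

In a parallel setting, the same rule means the scheduler cannot mark either branch as ready until the condition item has completed.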
The parallel workflow algorithm is planned to become the default in HPCC Systems version 7.12.0. It benefits all ECL programmers, and the speedup is achieved without altering the language functionality. For production systems, money will be saved by providing the clusters with more work sooner. For cloud environments, additional resources can be added dynamically to maximise the benefits of the faster processing.
In this Video Recording, Nathan provides a tour and explanation of his poster content.
Poster Title: Preprocessing Bundle for HPCC Systems Machine Learning Library
Click on the poster for a larger image. The original PDF version can be found here. (Available for download).