Execute multiple workflow items in parallel

This project was completed during the 2020 HPCC Systems Intern Program by Nathan Halliday.

Find out about the HPCC Systems Summer Internship Program.

Resources Available to Learn More about this completed project:

Project Description

Currently the workflow engine executes a single item at a time and waits for that item to complete before continuing. Some jobs would benefit significantly if separate persists or independent actions were executed in parallel. The workflow information already contains all the dependencies, as well as information about items that must be executed sequentially. It may be sensible to support this only for roxie/thor initially, and then revisit hthor.

This could become even more significant in cloud environments, where it would be quite possible to spin up extra thors on demand, allowing multiple graphs to be processed in parallel.
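To illustrate how the dependency information could expose parallelism, here is a minimal Python sketch. It is not the HPCC workflow engine code: the function name `parallel_waves`, the integer item ids, and the dictionary layout are all assumptions made for this example. It groups items into "waves" so that every item in a wave has all of its dependencies satisfied by earlier waves and could therefore run concurrently with the others in that wave.

```python
from typing import Dict, List, Set


def parallel_waves(deps: Dict[int, Set[int]]) -> List[List[int]]:
    """Group workflow items into waves of mutually independent items.

    `deps` maps each item id to the set of item ids it depends on.
    Assumption: every dependency also appears as a key in `deps`.
    """
    # Copy the sets so the caller's data is not modified.
    remaining = {item: set(d) for item, d in deps.items()}
    waves: List[List[int]] = []
    while remaining:
        # Items with no unmet dependencies can all start now.
        ready = sorted(item for item, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        waves.append(ready)
        for item in ready:
            del remaining[item]
        # Completed items no longer block anything that depends on them.
        for unmet in remaining.values():
            unmet.difference_update(ready)
    return waves


# Hypothetical workflow: items 1 and 2 are independent persists,
# item 3 needs both of them, and item 4 needs item 3.
example = {1: set(), 2: set(), 3: {1, 2}, 4: {3}}
print(parallel_waves(example))  # [[1, 2], [3], [4]]
```

In this hypothetical workflow, items 1 and 2 land in the same wave, which is exactly the case where a single-item-at-a-time engine leaves time on the table.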

If you are interested in this project, please contact Gavin Halliday.

Completion of this project involves:

  • Restructure the workflow engine to create a graph of tasks that can be used to track which tasks have been executed and which tasks should be executed next.

  • Ensure that there are no multithreading issues in the workflow engine (e.g. the way persist information is calculated).

  • Check eclagent for any multithreading issues.
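The first two points can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual engine design: `TaskGraph`, `run_parallel`, and the `action` callback are hypothetical names. The graph records which tasks are done and reports which tasks become ready, with a lock guarding the shared state so that it would stay consistent even if completions were reported from several threads.

```python
import threading
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Set


class TaskGraph:
    """Tracks which tasks have run and which are ready to run next."""

    def __init__(self, deps: Dict[int, Set[int]]) -> None:
        self._lock = threading.Lock()
        # Unmet dependencies per task (copied, so the input is untouched).
        self._unmet = {t: set(d) for t, d in deps.items()}
        # Reverse edges: which tasks are waiting on each task.
        self._dependents: Dict[int, List[int]] = {t: [] for t in deps}
        for task, needed in deps.items():
            for dep in needed:
                self._dependents[dep].append(task)
        self._done: Set[int] = set()

    def initial_ready(self) -> List[int]:
        with self._lock:
            return sorted(t for t, d in self._unmet.items() if not d)

    def mark_done(self, task: int) -> List[int]:
        """Record completion and return the tasks that just became ready."""
        with self._lock:
            self._done.add(task)
            newly_ready = []
            for waiting in self._dependents[task]:
                self._unmet[waiting].discard(task)
                if not self._unmet[waiting]:
                    newly_ready.append(waiting)
            return newly_ready

    def all_done(self) -> bool:
        with self._lock:
            return len(self._done) == len(self._unmet)


def run_parallel(deps: Dict[int, Set[int]],
                 action: Callable[[int], None],
                 max_workers: int = 4) -> List[int]:
    """Run `action` on every task, starting each one once its deps finish."""
    graph = TaskGraph(deps)
    completed: List[int] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {pool.submit(action, t): t for t in graph.initial_ready()}
        while running:
            finished, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in finished:
                task = running.pop(future)
                future.result()  # re-raise any exception from the task
                completed.append(task)
                for nxt in graph.mark_done(task):
                    running[pool.submit(action, nxt)] = nxt
    if not graph.all_done():
        raise ValueError("dependency cycle: some tasks never became ready")
    return completed
```

For example, `run_parallel({1: set(), 2: set(), 3: {1, 2}}, print)` would start tasks 1 and 2 together and submit task 3 only after both have finished. In this driver only the coordinating thread calls `mark_done`, so the lock is defensive; it matters if worker threads report their own completions.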

By the mid-term review we would expect you to have:

Mentor

Gavin Halliday
Contact Details

Backup Mentor: TBC

Skills needed
  • Ability to build and test the HPCC system (guidance will be provided).

  • Ability to write test code. Knowledge of ECL is not a requirement, since it should be possible to re-use existing code with minimal changes for this purpose. Links are provided below to our ECL training documentation and online courses should you wish to become familiar with the ECL language.

Deliverables

End of project

  • Restructure the workflow engine to create a graph of tasks that can be used to track which tasks have been executed and which tasks should be executed next.

  • Ensure that there are no multithreading issues in the workflow engine (e.g. the way persist information is calculated).

  • Check eclagent for any multithreading issues.

Other resources
