The project was completed by Saminda Wijeratne, a Master's student studying Computational Science and Engineering at the Georgia Institute of Technology. Our students join the HPCC Systems Research and Development team, and this project leaned toward the research side. Although MPI is now the industry standard, it was not when we first designed and implemented HPCC Systems, and in particular our Thor cluster, back in 2000. MPI was quite unstable in those days and considered too risky, so we built our own solution, which has served us very well. We had been curious for some time about how MPI compares with our solution, so having Saminda on the team was a great opportunity to have someone focus on answering those questions.
Find out about the HPCC Systems Summer Internship Program.
Replace the existing socket-based message passing API with an open-source MPI implementation (such as Open MPI or MVAPICH)
Currently we use our own message passing API (libmp), built on top of sockets, to send and receive messages. We would like to use MPI in its place to take advantage of RDMA, tree-based communication, native InfiniBand and similar features for performance gains. Initial exploration by our team a while ago indicated that this appears feasible, but concurrent communication by different threads was troublesome at the time; newer MPI releases may resolve this. The selected MPI implementation must support Ethernet and should also support InfiniBand.
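To make the threading concern concrete, here is a minimal sketch of a backend-neutral send/receive interface where several threads share one transport and each message is matched by (source, destination, tag). Everything in it is hypothetical and for illustration only: `Transport`, `LoopbackFabric`, `LoopbackTransport` and `run_demo` are not part of libmp or of any MPI library, and the in-process queues merely stand in for sockets or MPI. A real MPI backend would additionally need to request full thread support, for example by calling `MPI_Init_thread` with `MPI_THREAD_MULTIPLE` and checking the level actually provided.

```python
import queue
import threading
from abc import ABC, abstractmethod
from collections import defaultdict


class Transport(ABC):
    """Hypothetical backend-neutral interface (NOT the real libmp API)."""

    @abstractmethod
    def send(self, dest: int, tag: int, payload: bytes) -> None: ...

    @abstractmethod
    def recv(self, source: int, tag: int, timeout: float = 5.0) -> bytes: ...


class LoopbackFabric:
    """In-process stand-in for the network (assumption for this sketch).

    Keeps one thread-safe queue per (source, dest, tag) triple so that
    concurrent receivers never pick up each other's messages.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._queues = defaultdict(queue.Queue)

    def channel(self, source: int, dest: int, tag: int) -> queue.Queue:
        with self._lock:  # guard lazy creation of the queue
            return self._queues[(source, dest, tag)]


class LoopbackTransport(Transport):
    """One endpoint (rank) attached to the shared fabric."""

    def __init__(self, fabric: LoopbackFabric, rank: int) -> None:
        self._fabric = fabric
        self.rank = rank

    def send(self, dest: int, tag: int, payload: bytes) -> None:
        self._fabric.channel(self.rank, dest, tag).put(payload)

    def recv(self, source: int, tag: int, timeout: float = 5.0) -> bytes:
        return self._fabric.channel(source, self.rank, tag).get(timeout=timeout)


def echo_worker(transport: Transport, peer: int, tag: int) -> None:
    """Receive one message on `tag` and send it back upper-cased."""
    transport.send(peer, tag, transport.recv(peer, tag).upper())


def run_demo(num_threads: int = 4) -> dict:
    """Run `num_threads` concurrent request/reply exchanges, one tag each."""
    fabric = LoopbackFabric()
    master = LoopbackTransport(fabric, rank=0)
    slave = LoopbackTransport(fabric, rank=1)
    replies = {}

    def client(tag: int) -> None:
        master.send(1, tag, f"msg-{tag}".encode())
        replies[tag] = master.recv(1, tag)

    threads = []
    for tag in range(num_threads):
        threads.append(threading.Thread(target=echo_worker, args=(slave, 0, tag)))
        threads.append(threading.Thread(target=client, args=(tag,)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return replies


if __name__ == "__main__":
    print(run_demo())
```

Because callers only see `send` and `recv`, a socket-based backend and an MPI-based backend could in principle be swapped behind the same interface. MPI point-to-point calls are likewise matched on source, tag and communicator, which is why giving each thread its own tag is one common way to keep concurrent exchanges apart.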
Completion of this project involves:
- Checked-in code
  - Complete replacement of our libmp layer with a new one using MPI, including how to handle jsocket notify events in the rest of HPCC.
- MPI cluster installation integration with the platform, or a how-to-install guide
- Document code directly and provide an installation/test guide
- Test code
  - Can use the existing mptest and regression suite; mptest could be extended with more performance benchmarks.
- Regression tests
  - Can use the existing mptest and regression suite.
Expected feature list:
All the same features and functions as existing libmp.
By the mid-term evaluation, we would expect you to have completed the following:
mptest working on a single machine using more than one slave and more than one virtual slave
Backup Mentor: Jake Smith