This project is available as a student work experience opportunity with HPCC Systems this summer. Curious about other projects we are offering? Take a look at our Ideas List.
Find out about the HPCC Systems Summer Internship Program.
The project proposal application period for 2020 summer internships is now open. Please see our list of Available Projects. Contact the project mentor for more information and to discuss your ideas. You may suggest a project idea of your own, but it must leverage HPCC Systems in some way. Contact us for support from an HPCC Systems mentor with experience in your chosen project area.
Design and implement a tool that runs across a cluster, tests each system and the network between them, and provides an overall check of whether everything is working as expected. The tool will be launched from a button within ECL Watch.
For each system we can check available memory, available disk space, memory latency and bandwidth, and file system latency and bandwidth; across the cluster we can check network latency and bandwidth.
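The capacity checks in this list need no special tooling. As a minimal sketch, assuming Linux nodes with Python 3 available (the language is our choice, not something the project specifies), available memory can be read from /proc/meminfo and free disk space from the standard library. Latency and bandwidth need real benchmarks and are not covered here. The function names and the `data_path` parameter are hypothetical.

```python
import shutil
import socket


def available_memory_bytes(meminfo_path: str = "/proc/meminfo") -> int:
    """Return the kernel's MemAvailable estimate in bytes (Linux only).

    MemAvailable (Linux 3.14+) includes reclaimable page cache, so it is a
    better "how much can a job still allocate" figure than MemFree.
    """
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                # Lines look like "MemAvailable:   12345678 kB"; kB means KiB here.
                return int(line.split()[1]) * 1024
    raise RuntimeError(f"MemAvailable not found in {meminfo_path}")


def available_disk_bytes(path: str = "/") -> int:
    """Return bytes available on the file system that contains `path`."""
    return shutil.disk_usage(path).free


def node_capacity(data_path: str = "/") -> dict:
    """Collect the simple per-node capacity figures into one record."""
    return {
        "host": socket.gethostname(),
        "mem_available_bytes": available_memory_bytes(),
        "disk_free_bytes": available_disk_bytes(data_path),
    }


if __name__ == "__main__":
    print(node_capacity())
```

In practice `data_path` would point at the directory where the node stores its data, so the figure reflects the file system the cluster actually uses.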
Include any background information that may help other developers to understand what you want to achieve and why. Also describe what would need to be done, preferably in the order in which the work needs to be completed. Indicate where there may be links with other areas of the HPCC Systems platform.
Completion of this project involves:
- Checked-in code: including any installation code, scripts, etc. that are required.
- Documentation: comments within the code, plus a guide on how to install and use the tool.
- Test code: since the tool is itself a test, the test code may be the tool itself.
- Regression tests: likewise, the regression tests may be the tool itself.
Expected feature list
A report generated by the tool, with results on:
Memory available, latency and bandwidth
File system space available, latency and bandwidth
Network latency and bandwidth
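One possible shape for the report, sketched under the assumption that each node returns flat measurements which are then checked against per-metric limits. The `Measurement` class, the category names and the threshold values are hypothetical illustrations, not part of any existing HPCC Systems API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Measurement:
    """One result row: a metric measured on a node or on a link between nodes."""
    node: str        # e.g. "node1", or "node1->node2" for a network link
    category: str    # "memory", "filesystem" or "network"
    metric: str      # e.g. "available", "latency", "bandwidth"
    value: float
    unit: str
    minimum: Optional[float] = None  # hypothetical pass/fail limits;
    maximum: Optional[float] = None  # None means "report only"

    def passed(self) -> bool:
        if self.minimum is not None and self.value < self.minimum:
            return False
        if self.maximum is not None and self.value > self.maximum:
            return False
        return True


def build_report(results: List[Measurement]) -> str:
    """Render measurements as a plain-text table ending in an overall verdict."""
    header = f"{'node':<16}{'category':<12}{'metric':<12}{'value':>12}  {'unit':<6}status"
    lines = [header]
    for m in sorted(results, key=lambda r: (r.category, r.node, r.metric)):
        status = "OK" if m.passed() else "FAIL"
        lines.append(f"{m.node:<16}{m.category:<12}{m.metric:<12}{m.value:>12.2f}  {m.unit:<6}{status}")
    overall = "PASS" if all(m.passed() for m in results) else "FAIL"
    lines.append(f"Overall: {overall}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative values only, not real measurements.
    print(build_report([
        Measurement("node1", "memory", "available", 12.5, "GiB", minimum=4),
        Measurement("node1->node2", "network", "latency", 0.35, "ms", maximum=1.0),
        Measurement("node2", "filesystem", "bandwidth", 80.0, "MB/s", minimum=100),
    ]))
```

In the demo the file system row misses its 100 MB/s limit, so the overall verdict is FAIL. A single verdict line like this could be what an ECL Watch button surfaces first, with the table as detail.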
NOTE: The tool is expected to leverage existing, separate open source tools for measuring memory, file system and network performance, such as mem_lat, iozone and mpi-tests.
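Because the tool is meant to wrap existing benchmarks rather than reimplement them, much of the per-node work is launching a program and capturing its output. A minimal sketch, assuming Python 3.7+; `run_external` and `ToolRun` are hypothetical names, and parsing each tool's output format is left out.

```python
import shutil
import subprocess
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToolRun:
    command: List[str]
    available: bool            # False if the executable was not found on PATH
    returncode: Optional[int]  # None when the tool was skipped or timed out
    elapsed_s: float
    stdout: str


def run_external(command: List[str], timeout_s: float = 600.0) -> ToolRun:
    """Run an external benchmark and capture its output for later parsing.

    A missing tool is reported rather than raised, so one absent benchmark
    does not abort the whole cluster check.
    """
    if shutil.which(command[0]) is None:
        return ToolRun(command, False, None, 0.0, "")
    start = time.monotonic()
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child; report the timeout.
        return ToolRun(command, True, None, time.monotonic() - start, "")
    return ToolRun(command, True, proc.returncode, time.monotonic() - start, proc.stdout)
```

For example, `run_external(["iozone", "-a"])` would run IOzone in its automatic mode on a node where it is installed; which options the project actually uses for each tool is still to be decided.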
By the mid-term evaluation, we would expect you to have completed the following:
A framework created that can be installed and can run at least one local and one network test.
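The mid-term framework could be as small as a registry of named checks plus a runner. The sketch below, assuming Python 3, registers one local test (sequential file write bandwidth) and one network test (TCP round-trip latency). To stay self-contained, the network peer is an echo server on 127.0.0.1; on a real cluster the peer would be an agent on another node. All names are hypothetical, and pure-Python timings are only rough, which is one reason to rely on existing benchmarks for the final numbers.

```python
import os
import socket
import tempfile
import threading
import time
from typing import Callable, Dict, Optional

# Registry of checks, keyed by name; each check returns a dict of metrics.
CHECKS: Dict[str, Callable[[], dict]] = {}


def check(name: str):
    """Decorator that registers a check (callable with no arguments) under `name`."""
    def register(fn: Callable[..., dict]) -> Callable[..., dict]:
        CHECKS[name] = fn
        return fn
    return register


@check("local.fs_write")
def fs_write_bandwidth(size_mib: int = 64, directory: Optional[str] = None) -> dict:
    """Local test: time a sequential write of `size_mib` MiB, fsync'd to disk.

    `directory` should be on the file system under test; None uses the temp dir.
    """
    block = os.urandom(1024 * 1024)
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        start = time.perf_counter()
        for _ in range(size_mib):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    return {"write_mib_per_s": size_mib / elapsed}


def _echo_once(server: socket.socket) -> None:
    """Accept one connection and echo bytes back until the client closes."""
    conn, _ = server.accept()
    with conn:
        while True:
            data = conn.recv(64)
            if not data:
                break
            conn.sendall(data)


def tcp_round_trip(host: str, port: int, rounds: int = 100) -> dict:
    """Network test: mean round-trip time of a 1-byte ping to an echo peer."""
    with socket.create_connection((host, port), timeout=5) as s:
        # Disable Nagle's algorithm so each 1-byte ping is sent immediately.
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(rounds):
            s.sendall(b"x")
            if s.recv(1) != b"x":
                raise RuntimeError("echo peer closed or returned bad data")
        elapsed = time.perf_counter() - start
    return {"rtt_ms": elapsed / rounds * 1000.0}


@check("network.tcp_rtt")
def tcp_rtt_loopback() -> dict:
    """Demo wiring: measure against an echo peer started on 127.0.0.1."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        peer = threading.Thread(target=_echo_once, args=(server,), daemon=True)
        peer.start()
        result = tcp_round_trip("127.0.0.1", server.getsockname()[1])
        peer.join(timeout=5)
    return result


def run_all() -> Dict[str, dict]:
    """Run every registered check, recording errors instead of aborting."""
    results: Dict[str, dict] = {}
    for name, fn in CHECKS.items():
        try:
            results[name] = fn()
        except Exception as exc:  # one failing check should not hide the rest
            results[name] = {"error": repr(exc)}
    return results


if __name__ == "__main__":
    for name, metrics in run_all().items():
        print(f"{name}: {metrics}")
```

New checks (for example, a wrapper around one of the external benchmarks) only need the `@check` decorator to be picked up by `run_all`, which keeps the framework small while the list of tests grows.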
Backup Mentor: Attila Vamos