This project is available as a student work experience opportunity with HPCC Systems this summer. Curious about other projects we are offering? Take a look at our Ideas List. Find out about the HPCC Systems Summer Internship Program.

The project proposal application period for 2019 summer internships is now open! The deadline for proposals is Friday 29th March 2019. To get notifications, subscribe to our Community Forum.

These projects involve the development of machine learning algorithms to extend the existing ECL-ML (ECL Machine Learning) library on the HPCC Systems platform, using ECL and the underlying Parallel Block BLAS (PB-BLAS) infrastructure.

While the aim is to develop these algorithms mostly in ECL, using other languages that can be embedded in HPCC Systems (Python, Java, JavaScript, R, C++) is also acceptable. A few functions are implemented in C++ with a C++/ECL wrapper around them.

The HPCC Systems Machine Learning Library is a work in progress. However, it already contains many classification, regression and clustering algorithms.

There are currently 4 available projects in this area:

  1. Develop and Assess Unsupervised Anomaly Detection Methods
  2. Implement an approximate n-tile algorithm
  3. Word Vectorization
  4. Applying HPCC Systems Word Vectors to SEC Filings

Objectives for the ML functions

  • To be able to handle large training sets in a timely manner by distributing the training set across the nodes of an HPCC Systems Thor cluster.
  • To produce statistics that measure the goodness of fit of the models created by the ML functions.
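
As a rough illustration of how the two objectives fit together, the sketch below computes one goodness-of-fit statistic (R squared) from per-partition summaries, so that each node only needs to scan its own share of the data. It is written in Python for readability and is not taken from the ECL-ML library: the names `partial_sums` and `r_squared`, and the use of plain lists to stand in for the partitions held by Thor nodes, are assumptions made for this example only.

```python
from typing import List, Sequence, Tuple

# One partition = the (actual, predicted) pairs held by a single node.
# Plain lists stand in for distributed data here (assumption for the sketch).
Partition = Sequence[Tuple[float, float]]


def partial_sums(part: Partition) -> Tuple[int, float, float, float]:
    """Summarise one partition: count, sum(y), sum(y^2), sum of squared residuals."""
    n = len(part)
    sum_y = sum(y for y, _ in part)
    sum_y2 = sum(y * y for y, _ in part)
    sse = sum((y - pred) ** 2 for y, pred in part)
    return n, sum_y, sum_y2, sse


def r_squared(partitions: List[Partition]) -> float:
    """Combine the per-partition summaries into a global R squared."""
    n, sum_y, sum_y2, sse = 0, 0.0, 0.0, 0.0
    for part in partitions:
        pn, py, py2, pe = partial_sums(part)  # would run locally on each node
        n += pn
        sum_y += py
        sum_y2 += py2
        sse += pe
    if n == 0:
        raise ValueError("no observations")
    # Total sum of squares around the global mean, from the combined sums.
    sst = sum_y2 - sum_y * sum_y / n
    if sst == 0:
        raise ValueError("actual values have zero variance")
    return 1.0 - sse / sst


if __name__ == "__main__":
    # Two hypothetical partitions of (actual, predicted) pairs.
    nodes = [[(1.0, 1.1), (2.0, 1.9)], [(3.0, 3.2)]]
    print(r_squared(nodes))
```

Only four numbers per partition have to be brought together, whatever the size of the training set, which is the property that makes a statistic of this kind cheap to produce on a cluster. Note that the sum-of-squares shortcut used for `sst` can lose precision when the values are large relative to their spread.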

Available Resources

Machine learning projects previously completed by students

  1. Text Search Bundle - Implementation of Equivalence Terms 
    Completed by Farah Alshanik (Clemson PhD student in Computer Science) as part of the 2018 HPCC Systems Intern Program 
    Hear Farah speaking about her project at our Tech Talk held in September 2018.
    See the poster she entered in our Technical Poster Contest 2018
  2. Healthcare: High Risk Geo-Social Analysis (Drug Socialization)
    Completed by Nicole Navarro (MS in Data Science, New College of Florida) as part of the 2018 HPCC Systems Intern Program
    Hear Nicole speak about her project at our Tech Talk held in November 2018
    Nicole was the second place winner of our Technical Poster Contest 2018. See Nicole's poster 
  3. Begin development of a software library (consisting of ECL and Python code) that would provide HPCC Systems distributed neural network training
    Completed by Robert Kennedy (Masters in Computer Science, Florida Atlantic University) as part of the 2018 HPCC Systems Intern Program
    Hear Robert speaking about his project at our Tech Talk held in August 2018 and on the main stage at our Community Day Summit 2018 (Watch Recording / View Slides). Robert was also the third place winner of our Technical Poster Contest 2018 at the same event. See Robert's poster.
  4. Implementation of CLDA Topic Modeling Algorithm in ECL-ML
    Completed by Lili Xu (PhD in Computer Science, Clemson University) as part of the 2018 HPCC Systems Intern Program
    Hear Lili speaking about her project at our Tech Talk held in September 2018.
    See the poster she entered in our Technical Poster Contest 2018.
  5. Extension of KMeans clustering and KNN classifier to time series data model in ECL-ML
    Completed by Soukaina Filali (PhD Computer Science, Georgia State University) as part of the 2018 HPCC Systems Intern Program
    Hear Soukaina speaking about her project at our Tech Talk held in September 2018
  6. Dimensionality Reduction and Feature Selection in ECL-ML
    Completed by Shah Muhammad Hamdi (PhD Computer Science, Data Mining, Georgia State University) as part of the 2018 HPCC Systems Intern Program
    Hear Hamdi speaking about his project at our Tech Talk held in August 2018.
    See the poster he entered in our Technical Poster Contest 2018
  7. Implement a Gradient Trees Algorithm
    Completed by George Mathew (NCSU PhD student in Computer Science) as part of the 2017 HPCC Systems Intern Program
  8. Documentation Generator for ECL Code
    Completed by Sarthak Jain (Northwestern University PhD student in Computer Science) as part of the 2017 HPCC Systems Intern Program
  9. Implement a Latent Semantic Analysis Algorithm in ECL
    Completed as part of the HPCC Systems Summer Internship Program 2016
  10. Empower ECL-ML: How to make the HPCC Systems Machine Learning Library easier to use
    Completed as part of the HPCC Systems Summer Internship Program 2016
  11. Implement a YinYang K-Means Clustering Algorithm in ECL
    Completed as part of the HPCC Systems Summer Internship Program 2016
  12. Implement the Converse Sparse Cholesky Selection Algorithm in ECL
    Completed as part of the HPCC Systems Summer Internship Program 2016
  13. Analyse which algorithms may provide the best results for the implementation of Non-Negative Matrix Factorisations (NMF) in ECL
    Completed as an independent voluntary contribution in 2016
  14. Add new statistics to the Linear and Logistic Regression Modules
    Completed as part of the GSoC Program 2015
  15. Implement the CONCORD Algorithm
    Completed as part of the HPCC Systems Summer Internship Program 2015



