The proposal period for 2022 internships is now closed
The proposal period for 2023 internships will open in November 2022
These projects are available as a student work experience opportunity with HPCC Systems this summer. Curious about other projects we are offering? Take a look at our Ideas List. Find out about the HPCC Systems Summer Internship Program.
These projects involve the development of machine learning algorithms to extend the existing ECL-ML (ECL Machine Learning) library on the HPCC Systems platform, using ECL and the underlying Parallel Block BLAS (PB-BLAS) infrastructure.
While the general aim is to develop these algorithms mostly in ECL, using other languages that can be embedded in HPCC Systems (Python, Java, JavaScript, R, C++) is also acceptable. A few functions are written in C++ with a C++/ECL wrapper around them.
The HPCC Systems Machine Learning Library is a work in progress. However, it already contains many classification, regression and clustering algorithms.
These projects are currently available in this area:
- Anomaly Detection Algorithms - Already taken
- Applying the Causality Toolkit to Real World Datasets - Already taken
Objectives for the ML functions
- To handle large training sets in a timely manner by distributing the training set across the nodes of an HPCC Systems Thor cluster.
- To produce statistics that measure the goodness of fit of the models created by the ML functions.
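To make the two objectives concrete, here is a minimal sketch in Python (one of the embeddable languages listed above). It is not ECL-ML code: the function names `partition`, `local_sums`, `combine` and `fit_and_score` are hypothetical, and the round-robin split only simulates distributing rows across Thor nodes. It shows how a simple linear regression can be trained from per-node partial sums and then scored with R², a common goodness-of-fit statistic.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) training record


def partition(data: List[Point], n_nodes: int) -> List[List[Point]]:
    """Round-robin split, standing in for distribution across cluster nodes."""
    return [data[i::n_nodes] for i in range(n_nodes)]


def local_sums(chunk: List[Point]) -> Tuple[float, ...]:
    """Sufficient statistics that one node can compute from its own rows."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    syy = sum(y * y for _, y in chunk)
    return (n, sx, sy, sxx, sxy, syy)


def combine(parts: List[Tuple[float, ...]]) -> Tuple[float, ...]:
    """Add the per-node statistics element by element."""
    return tuple(sum(col) for col in zip(*parts))


def fit_and_score(stats: Tuple[float, ...]) -> Tuple[float, float, float]:
    """Least-squares slope and intercept, plus R^2 as the goodness of fit."""
    n, sx, sy, sxx, sxy, syy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    ss_tot = syy - sy * sy / n                       # total sum of squares
    ss_res = syy - intercept * sy - slope * sxy      # residual sum of squares
    return slope, intercept, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
    stats = combine([local_sums(c) for c in partition(data, 3)])
    print(fit_and_score(stats))
```

Only the six partial sums leave each simulated node, so the amount of data exchanged does not grow with the size of the training set. An actual contribution would be expected to express the same idea in ECL on top of PB-BLAS.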
Available Resources
- Examples of existing code
- HPCC Systems Machine Learning documentation
- HPCC Systems Technical Poster Presentations
Machine learning projects previously completed by students
Name | Project Title | Year Completed | Resources |
---|---|---|---|
Achinthya Sreedhar RV College of Engineering, India | Improving conditional probability calculations using kernel methods in Reproducing Kernel Hilbert Space (RKHS) as a part of the Causality Project | 2021 | |
Alexander Parra University of California, Berkeley | Implement a PMML Processor | 2021 | |
Carina Wang American Heritage School, FL, USA | Processing Student Images with Kubernetes on HPCC Systems Cloud Native Platform | 2021 | |
Christopher Connelly North Carolina State University | Ingestion and Analysis of Collegiate Women's Basketball GPS Data in HPCC Systems and RealBI | 2021 | |
Jefferson Mao Lambert High School, Georgia | Toxicity Detection | 2021 | |
Mara Hubelbank Northeastern University, MA, USA | Causal Inference in Machine Learning | 2021 | |
Mayank Agarwal RV College of Engineering, India | Independence Testing with RCoT: Causal Validation and Discovery for HPCC Systems Causal Toolkit | 2021 | |
Jack Fields High School Student | Using the GNN Bundle with TensorFlow to train a model to find known faces | 2020 | Tech Talk Presentation, August 2020 |
Matthias Murray Masters in Data Science | Applying HPCC Systems Word Vectors to SEC Filings | 2020 | |
Robert Kennedy Masters in Computer Science | Implement a Multi-node, Multi-GPU Accelerated Deep Learning Algorithm using GNN | 2020 | |
Vannel Zeufack Masters in Computer Science | Implement a Preprocessing Bundle for the HPCC Systems ML Library | 2020 | |
Huafu Hu Masters of Science | Analysing telematics data to support the connected cars industry | 2019 | Tech Talk Presentation, November 2019 Poster |
Christopher Connelly Data Scientist | Cleaning and analysis of collegiate soccer GPS data in HPCC Systems | 2019 | Tech Talk Presentation, November 2019 Poster Community Day presentation |
Robert Kennedy Masters in Computer Science | Create HPCC Systems on Hyper-V | 2019 | Tech Talk Presentation, September 2019 Poster Community Day presentation |
Vannel Zeufack Masters in Computer Science | Develop and assess unsupervised anomaly detection methods using HPCC Systems | 2019 | Tech Talk Presentation, September 2019 Poster |
Farah Alshanik PhD Computer Science | Domain based common words list using high dimensional representation of words | 2019 | Tech Talk Presentation, September 2019 Poster Community Day presentation |
A Suryanarayanan Bachelor of Engineering | Evaluation of machine learning algorithms | 2019 | Tech Talk Presentation, August 2019 Poster |
Muiredach O'Riain BSc in Music and Computing | Machine Learning and the Forensic Applications of Audio Classification: An exploration of the forensic applications of sound classification using Artificial Neural Networks | 2019 | Poster |
Akshar Prasad BTech in Computer Science RV College of Engineering (RVCE), Bengaluru, India | Fraud detection in value based cards | 2019 | Tech Talk Presentation, August 2019 Poster |
Farah Alshanik PhD Computer Science | Text Search Bundle - Implementation of Equivalence Terms | 2018 | |
Nicole Navarro MS in Data Science | Healthcare: High Risk Geo-Social Analysis (Drug Socialization) | 2018 | |
Robert Kennedy Masters in Computer Science | Begin development of a software library (consisting of ECL and Python code) that would provide HPCC Systems distributed neural network training | 2018 | |
Lili Xu PhD in Computer Science | Implementation of CLDA Topic Modeling Algorithm in ECL-ML | 2018 | |
Soukaina Filali PhD Computer Science | Extension of KMeans clustering and KNN classifier to time series data model in ECL-ML | 2018 | Tech Talk held in September 2018 |
Shah Muhammad Hamdi PhD Computer Science, Data Mining Georgia State University | Dimensionality Reduction and Feature Selection in ECL-ML | 2018 | |
George Mathew PhD student in Computer Science | | 2017 | |
Sarthak Jain PhD student in Computer Science | Documentation Generator for ECL Code | 2017 | |
Lili Xu PhD in Computer Science | Implement a Latent Semantic Analysis Algorithm in ECL | 2017 | |
Vivek Nair PhD in Computer Science | Empower ECL-ML: How to make the HPCC Systems Machine Learning Library easier to use | 2016 | |
Lili Xu PhD in Computer Science | Implement a YinYang K-Means Clustering Algorithm in ECL | 2016 | |
Syed Rahman PhD Student | Implement the Convex Sparse Cholesky Selection Algorithm in ECL | 2016 | |
Shweta Oak Bachelor of Engineering | Analyse which algorithms may provide the best results for the implementation of Non-Negative Matrix Factorisations (NMF) in ECL | 2016 | |
Sarthak Jain PhD student in Computer Science | Add new statistics to the Linear and Logistic Regression Modules | 2015 | |
Syed Rahman PhD Student | Implement the CONCORD Algorithm | 2015 | |