
This project is available as a student work experience opportunity with HPCC Systems this summer. Curious about other projects we are offering? Take a look at our Ideas List. Find out about the HPCC Systems Summer Internship Program.

The project proposal application period for 2020 summer internships is now open. Please see our list of Available Projects. Contact the project mentor for more information and to discuss your ideas. You may suggest a project idea of your own but it must leverage HPCC Systems in some way. Contact us for support from an HPCC Systems mentor with experience in your chosen project area.

These projects involve developing machine learning algorithms that extend the existing ECL-ML (ECL Machine Learning) library on the HPCC Systems platform, using ECL and the underlying Parallel Block BLAS (PB-BLAS) infrastructure.

While the general aim is to develop these algorithms mostly in ECL, using other languages that can be embedded in HPCC Systems (Python, Java, JavaScript, R, C++) is also acceptable. A few functions are C++ functions with a C++/ECL wrapper around them.

The HPCC Systems Machine Learning Library is a work in progress. However, it already contains many classification, regression and clustering algorithms.

These projects are currently available in this area:

  1. Extend the HPCC Systems ML matrix operation to include complex numbers 
  2. Implement an approximate n-tile algorithm
  3. Word Vectorization
  4. Applying HPCC Systems Word Vectors to SEC Filings
  5. Utilise time series analysis techniques such as Delay-Embedding and Box-Jenkins
  6. Implement a preprocessing bundle to prepare data for use with HPCC Systems ML modules
  7. Linear/Logistic Regression Enhancements
  8. Anomaly Detection Algorithms
  9. Generative Adversarial Networks (GANs)
  10. Adaptive Density Based Clustering
  11. Independence Testing Bundle
  12. Predictive Model Markup Language (PMML) Processor

Objectives for the ML functions

  • To handle large training sets in a timely manner by distributing the training set across the nodes of an HPCC Systems Thor cluster.
  • To produce statistics that measure the goodness of fit of the models created by the ML functions.
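
The two objectives above can be illustrated together with a minimal sketch. It is written in Python rather than ECL and does not use any HPCC Systems API: the `Moments` class and the `local_moments` and `fit` functions are hypothetical names introduced only for this example. Each partition of the training set (standing in for the data held by one Thor node) is reduced to a few running sums, the sums are merged, and a simple linear regression plus its R² goodness-of-fit statistic are computed from the merged sums alone.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Moments:
    """Running sums for one partition of (x, y) training rows (hypothetical helper)."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0
    syy: float = 0.0

    def merge(self, other: "Moments") -> "Moments":
        # Sums are additive, so partitions can be combined in any order.
        return Moments(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
            self.syy + other.syy,
        )


def local_moments(rows: Iterable[Tuple[float, float]]) -> Moments:
    """Reduce the rows held by a single node to their running sums."""
    m = Moments()
    for x, y in rows:
        m.n += 1
        m.sx += x
        m.sy += y
        m.sxx += x * x
        m.sxy += x * y
        m.syy += y * y
    return m


def fit(m: Moments) -> Tuple[float, float, float]:
    """Return (slope, intercept, r_squared) for y = slope * x + intercept."""
    sxx_c = m.sxx - m.sx * m.sx / m.n  # centred sum of squares of x
    sxy_c = m.sxy - m.sx * m.sy / m.n  # centred cross-product of x and y
    syy_c = m.syy - m.sy * m.sy / m.n  # centred sum of squares of y
    if sxx_c == 0:
        raise ValueError("x has no variance; the slope is undefined")
    slope = sxy_c / sxx_c
    intercept = (m.sy - slope * m.sx) / m.n
    # For a single predictor with an intercept, R^2 is the squared correlation.
    r_squared = 1.0 if syy_c == 0 else (sxy_c * sxy_c) / (sxx_c * syy_c)
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Two partitions standing in for the rows stored on two nodes.
    partitions = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
    total = Moments()
    for part in partitions:
        total = total.merge(local_moments(part))
    print(fit(total))
```

Because only six numbers per partition are exchanged, the cost of combining results does not grow with the size of the training set. An ECL implementation would express the same idea with distributed datasets and aggregation, and real ML bundles handle many predictors rather than one.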

Available Resources

Machine learning projects previously completed by students

Name, Project Title, Year Completed, Resources

Huafu Hu

Masters of Science
Georgia State University

Analysing telematics data to support the connected cars industry

2019

Tech Talk Presentation, November 2019 

Poster 

Blog Journal

Christopher Connelly

Data Scientist
North Carolina State University 

Cleaning and analysis of collegiate soccer GPS data in HPCC Systems

2019

Tech Talk Presentation, November 2019 

Poster

Community Day presentation
Watch Recording / View Slides

Blog Journal

Robert Kennedy

Masters in Computer Science
Florida Atlantic University

Create HPCC Systems on Hyper V

2019

Tech Talk Presentation, September 2019

Poster

Community Day presentation
Watch Recording / View Slides

Blog Journal

Vannel Zeufack

Masters in Computer Science
Kennesaw State University

Develop and assess unsupervised anomaly detection methods using HPCC Systems

2019

Tech Talk Presentation, September 2019

Poster 

Blog Journal

Farah Alshanik

PhD Computer Science
Clemson University

Domain based common words list using high dimensional representation of words

2019

Tech Talk Presentation, September 2019

Poster

Community Day presentation
Watch Recording / View Slides

Blog Journal

A Suryanarayanan

Bachelor of Engineering
RV College of Engineering (RVCE), Bengaluru, India

Evaluation of machine learning algorithms

2019

Tech Talk Presentation, August 2019

Poster 

Blog Journal

Muiredach O'Riain

BSc in Music and Computing
Goldsmiths, University of London

Machine Learning and the Forensic Applications of Audio Classification: An exploration of the forensic applications of sound classification using Artificial Neural Networks

2019

Poster

Blog Journal

Akshar Prasad

BTech in Computer Science
RV College of Engineering (RVCE), Bengaluru, India

Fraud detection in value based cards

2019

Tech Talk Presentation, August 2019

Poster 

Farah Alshanik

PhD Computer Science
Clemson University

Text Search Bundle - Implementation of Equivalence Terms

2018

Tech Talk held in September 2018

Poster 

Nicole Navarro

MS in Data Science
New College of Florida

Healthcare: High Risk Geo-Social Analysis (Drug Socialization)

2018

Tech Talk held in November 2018

Poster

Robert Kennedy

Masters in Computer Science
Florida Atlantic University

Begin development of a software library (consisting of ECL and Python code) that would provide HPCC Systems distributed neural network training

2018

Tech Talk held in August 2018 

Community Day Summit 2018 - Watch Recording / View Slides

Poster

Lili Xu

PhD in Computer Science
Clemson University

Implementation of CLDA Topic Modeling Algorithm in ECL-ML

2018

Tech Talk held in September 2018

Poster 

Soukaina Filali

PhD Computer Science
Georgia State University

Extension of KMeans clustering and KNN classifier to time series data model in ECL-ML

2018

Tech Talk held in September 2018

Shah Muhammad Hamdi

PhD Computer Science, Data Mining
Georgia State University

Dimensionality Reduction and Feature Selection in ECL-ML

2018

Tech Talk held in August 2018

Poster 

George Mathew

PhD student in Computer Science
North Carolina State University

Implement a Gradient Trees Algorithm

2017

Sarthak Jain

PhD student in Computer Science
Northwestern University

Documentation Generator for ECL Code

2017

Lili Xu

PhD in Computer Science
Clemson University

Implement a Latent Semantic Analysis Algorithm in ECL

2017

Vivek Nair

PhD in Computer Science
North Carolina State University

Empower ECL-ML: How to make the HPCC Systems Machine Learning Library easier to use

2016

Lili Xu

PhD in Computer Science
Clemson University

Implement a YinYang K-Means Clustering Algorithm in ECL

2016

Syed Rahman

PhD Student
University of Florida

Implement the Converse Sparse Cholesky Selection Algorithm in ECL

2016

Shweta Oak

Bachelor of Engineering
Sardar Patel Institute of Technology, India

Analyse which algorithms may provide the best results for the implementation of Non-Negative Matrix Factorisations (NMF) in ECL

2016

Sarthak Jain

PhD student in Computer Science
Northwestern University

Add new statistics to the Linear and Logistic Regression Modules

2015

Syed Rahman

PhD Student
University of Florida

Implement the CONCORD Algorithm

2015




