The projects listed here are available as student work experience opportunities with HPCC Systems as part of our summer intern program and Google Summer of Code.
The project proposal application period for 2020 summer internships is now closed. Check back in the Fall for details about applying to join our 2021 program.
Find out more about the HPCC Systems Summer Intern Program.
- Additional Embedded Languages in ECL
Clojure, Haskell, MariaDB, MATLAB, MongoDB, ODBC, Postgres, SAS, Scala, SQL, or suggest one!
- Additional external data stores
Ceph, S3, or suggest one!
- DFU Spray from zip/gzip files
Create a plugin for spraying from a ZIP/GZIP archive without decompressing the content
- Implement an IoT pluggable protocol for ROXIE
Add support for pluggable protocols currently being used in IoT projects
- Machine Learning Algorithms on the HPCC Platform
Data Series Classification
Implement an approximate n-tile algorithm
Linear/Logistic Regression Enhancements
Anomaly Detection Algorithms
Generative Adversarial Networks (GANs)
Adaptive Density Based Clustering
Independence Testing Bundle
Predictive Model Markup Language (PMML) Processor
Implement a preprocessing bundle to prepare data for use with HPCC Systems ML modules
Utilize time series analysis techniques such as Delay-Embedding and Box-Jenkins
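As a rough illustration of the approximate n-tile idea listed above, the following minimal Python sketch estimates n-tile boundaries from a fixed-size uniform sample of the data (reservoir sampling) instead of sorting the full dataset. This is only one possible approach, not the method the project prescribes, and it is not ECL. The function name `approx_ntile_boundaries` and its parameters are hypothetical.

```python
import random


def approx_ntile_boundaries(values, n, sample_size=1000, seed=0):
    """Estimate the n-1 boundaries that split `values` into n roughly equal groups.

    Hypothetical helper: keeps a uniform random sample of at most `sample_size`
    items (reservoir sampling), so the input can be any single-pass iterable.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    rng = random.Random(seed)
    sample = []
    for i, v in enumerate(values):
        if i < sample_size:
            sample.append(v)          # fill the reservoir first
        else:
            j = rng.randint(0, i)     # inclusive on both ends
            if j < sample_size:
                sample[j] = v         # replace a random slot
    if not sample:
        raise ValueError("no input values")
    sample.sort()
    # Take the sample element at each k/n position as the boundary estimate.
    return [sample[(k * len(sample)) // n] for k in range(1, n)]


# Quartile boundaries of a small dataset (exact here, since it fits in the sample).
quartiles = approx_ntile_boundaries(range(100), 4)
```

A larger `sample_size` gives tighter estimates at the cost of more memory; when the input is smaller than the sample, the result is exact.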
- System self health check
Design and implement a tool that provides an overall check that everything is working as expected across components, launched from a button within ECL Watch
- Provide SELinux Policies for the HPCC-Platform installation on Linux environments
Build SELinux domains for hpccsystems-platform services.
- Locking engine to replace DALI - Investigative project
Research, test and build a proof of concept (POC) of a third-party inter-machine/process locking engine, for example ZooKeeper, HashiCorp's Consul or other suitable contenders.
- Replace existing socket-based message passing interface with an open source package
Explore whether using a different message layer (an open source package such as ZeroMQ) offers improved performance, robustness and code maintainability
- ECL Code Documentation Generator Improvements
Make major improvements to the ECL Code Documentation Generator (ECLDoc), written in Python.
- Execute multiple workflow items in parallel
Currently the workflow engine executes a single item at a time and waits for that item to complete before continuing. Some jobs would benefit significantly if separate persists or independent actions were executed in parallel.
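The scheduling idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the HPCC Systems workflow engine: workflow items are modelled as plain callables, dependencies as a dictionary of names, and `run_workflow` is a hypothetical function name. Items whose dependencies have all finished are submitted to a thread pool together instead of one at a time.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_workflow(items, deps, max_workers=4):
    """Run workflow items, starting each one as soon as its dependencies finish.

    items: dict mapping item name -> zero-argument callable (hypothetical model)
    deps:  dict mapping item name -> set of item names that must finish first
    Returns the item names in completion order.
    """
    remaining = {name: set(deps.get(name, ())) for name in items}
    running = {}     # future -> item name
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining or running:
            # Submit every item that has no unfinished dependencies.
            ready = [name for name, d in remaining.items() if not d]
            for name in ready:
                del remaining[name]
                running[pool.submit(items[name])] = name
            if not running:
                raise ValueError("dependency cycle or unknown dependency")
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                fut.result()          # re-raise any error from the item
                finished.append(name)
                for d in remaining.values():
                    d.discard(name)   # unblock dependents
    return finished


# Two independent "persists" run in parallel; the final action waits for both.
log = []
order = run_workflow(
    {"persist_a": lambda: log.append("a"),
     "persist_b": lambda: log.append("b"),
     "output": lambda: log.append("out")},
    {"output": {"persist_a", "persist_b"}},
)
```

In this model the only ordering guarantee is that an item completes after everything it depends on; independent items may finish in any order.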
- Enhance Dagre (Graph Renderer)
Kickstart the Dagre GitHub project with bug fixes and performance enhancements.
- Integrating ECL Watch into VS Code
Embed ECL Watch pages into VS Code
- Process robotics data with HPCC Systems
Train a model with HPCC Systems Machine Learning (ML) General Neural Network (GNN), which supports TensorFlow, and apply it to new data.
- Use AWS EC2 spot instances
Provision the spot instance so it can be used for HPCC Systems development.
These projects are still under development and more details will be added soon. To find out more about any of them, view the associated JIRA issue and contact Lorraine Chapman or the project's mentor:
- Implement ECL Pretty Print
- Implement reference dafilesrv in other languages
- Implement a Reverse activity
- Incorporating self test code into a bundle
- Provide test code for bundles with no self test
- VS Code extension for DESDL and other languages
- Add Arrow support to dafilesrv
- Add ORC support to HPCC Systems
- Using HPCC Systems as a data lake for the Deep Cloud platform
- Applying HPCC Systems Word Vectors to SEC Filings