The projects listed here are available as student work experience opportunities with HPCC Systems as part of our summer intern program and Google Summer of Code. 

The project proposal application period for 2020 summer internships is now closed. Check back in the Fall for details about applying to join our 2021 program.

Find out more about the HPCC Systems Summer Intern Program.

  1. Additional Embedded Languages in ECL
    Clojure, Haskell, MariaDB, MATLAB, MongoDB, ODBC, Postgres, SAS, Scala, SQL, or suggest one!
  2. Additional external data stores
    Ceph, S3 or suggest one!
  3. DFU Spray from zip/gzip files
    Create a plugin for spraying from a ZIP/GZIP archive without decompressing the content
  4. Implement an IoT pluggable protocol for ROXIE
    Add support for pluggable protocols currently used in IoT projects
  5. Machine Learning Algorithms on the HPCC Platform
    Data Series Classification
    Implement an approximate n-tile algorithm
    Word Vectorization
    Linear/Logistic Regression Enhancements
    Anomaly Detection Algorithms
    Generative Adversarial Networks (GANs)
    Adaptive Density Based Clustering
    Independence Testing Bundle
    Predictive Model Markup Language (PMML) Processor
    Implement a preprocessing bundle to prepare data for use with HPCC Systems ML modules
    Utilize time series analysis techniques such as Delay-Embedding and Box-Jenkins
  6. System self health check
    Design and implement a tool, launched from a button within ECL Watch, that checks everything is working as expected across components
  7. Provide SELinux Policies for the HPCC-Platform installation on Linux environments
    Build SELinux domains for hpccsystems-platform services.
  8. Locking engine to replace DALI - Investigative project
    Research, test and build a proof of concept (POC) using a 3rd party inter-machine/process locking engine, for example ZooKeeper, HashiCorp's Consul or another suitable contender.
  9. Replace existing socket-based message passing interface with an open source package
    Explore whether a different message layer (an open source package such as ZeroMQ) offers improved performance, robustness and code maintainability
  10. ECL Code Documentation Generator Improvements
    Make major improvements to the ECL Code Documentation Generator (ECLDoc), written in Python.
  11. Execute multiple workflow items in parallel
    Currently, the workflow engine executes a single item at a time and waits for that workflow item to complete before continuing. Some jobs would benefit significantly if separate persists or independent actions were executed in parallel.
  12. Enhance Dagre (Graph Renderer)
    Kickstart the Dagre GitHub project with bug fixes and performance enhancements.
  13. Integrating ECL Watch into VS Code
    Embed ECL Watch pages into VS Code
  14. Process robotics data with HPCC Systems
    Train a model with HPCC Systems Machine Learning (ML) General Neural Network (GNN), which supports TensorFlow, and apply it to new data.
  15. Use AWS EC2 spot instances
    Provision the spot instance so it can be used for HPCC Systems development. 

These projects are still under development and more details will be added soon. To find out more about any of them, view the associated JIRA issue and contact Lorraine Chapman or the project's mentor:

  1. Implement ECL Pretty Print
  2. Implement reference dafilesrv in other languages
  3. Implement a Reverse activity
  4. Incorporating self test code into a bundle
  5. Provide test code for bundles with no self test
  6. VS Code extension for DESDL and other languages
  7. Add Arrow support to dafilesrv
  8. Add ORC support to HPCC Systems 
  9. Using HPCC Systems as a data lake for the Deep Cloud platform
  10. Applying HPCC Systems Word Vectors to SEC Filings 
