Interfacing your own suggested external datastore with ECL



This project is available as a student work experience opportunity with HPCC Systems. Curious about other projects we are offering? Take a look at our Ideas List.

Student work experience opportunities also exist for students who want to suggest their own project idea. Project suggestions must be relevant to HPCC Systems and of benefit to our open source community. 

Find out about the HPCC Systems Summer Internship Program.

Submitting your own external datastore project

Not only are we very happy to accept suggestions and proposals for additional external datastores not included in our Ideas List, we encourage them!

Please read all the information below to make sure that you can commit to our requirements. Feel free to email the mentor to discuss your idea.

Project Description

The HPCC Systems platform currently supports embedding Python, Java, JavaScript, R, MySQL, Cassandra and C++ code. We are also looking at using similar techniques to provide simple interfaces to some external datastores such as Ceph, S3 and Kafka (though they may not look like embedded languages).

The goal of this project is to support embedding queries for your chosen datastore within ECL code running on HPCC Systems. This will allow native reads and writes directly from and to the Object Store, reducing the extra latency currently incurred by having to move the data into the internal distributed filesystem before processing.

One of the challenges of this project is to address how an external key-value store interacts with a distributed Thor query, so that the external datastore acts like a distributed file that is read by each node in the Thor cluster, or to which each node writes only a portion of the result.
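One possible way to think about this challenge is to give each Thor node a disjoint slice of the store's keys, so that together the nodes cover the whole dataset exactly once. The sketch below is only an illustration under that assumption: `sliceForNode` and `keysForNode` are hypothetical helper names, not part of the HPCC Systems platform or of any datastore API, and a real plugin would obtain the node index and node count from the platform.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not an HPCC Systems API): returns the half-open index
// range [begin, end) that worker `node` (0-based) out of `nodeCount` workers
// should process when `total` items are split as evenly as possible.
std::pair<std::size_t, std::size_t> sliceForNode(std::size_t total,
                                                 std::size_t node,
                                                 std::size_t nodeCount)
{
    assert(nodeCount > 0 && node < nodeCount);
    const std::size_t base = total / nodeCount;   // items every node receives
    const std::size_t extra = total % nodeCount;  // first `extra` nodes get one more
    const std::size_t begin = node * base + (node < extra ? node : extra);
    const std::size_t end = begin + base + (node < extra ? 1 : 0);
    return std::make_pair(begin, end);
}

// Hypothetical helper: selects the keys that one node would read from a
// listing of keys that is identical (same order) on every node.
std::vector<std::string> keysForNode(const std::vector<std::string> &keys,
                                     std::size_t node,
                                     std::size_t nodeCount)
{
    const std::pair<std::size_t, std::size_t> range =
        sliceForNode(keys.size(), node, nodeCount);
    return std::vector<std::string>(keys.begin() + range.first,
                                    keys.begin() + range.second);
}
```

With 10 keys and 3 nodes, this assigns index ranges [0, 4), [4, 7) and [7, 10). Contiguous slices are only one option; hashing each key to a node is another, and which fits best depends on how the chosen datastore lists and partitions its data.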

Additional languages and datastores are added to the system via plugins, such as the one for MySQL (available here) or Python (available here). Use these as examples of the sort of work required. Each completed plugin is considered to be a new feature addition to the HPCC Systems platform.

Please make sure to justify and highlight in your proposal why your chosen datastore is a good use case for HPCC Systems, including which big data community would benefit from it.

Completion of this project involves:

  • Investigating the API for calling your chosen language from C/C++.

  • Creating a simple wrapper for scalar values between the ECL embed API and your chosen language API using one of the existing embed plugin implementations as an example.

  • Extending the simple wrapper to handle structured data.

  • In parallel with the above, developing test cases for the plugin that cover all data types, both passed in and returned, including multi-threaded access from the ECL side. This includes testing the performance and throughput of the system for some examples that approximate real-world usage.

  • A complete GitHub project with code and documentation.

  • A blog, a recorded presentation, and a poster artifact about your project (see examples from previous years here).
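As a rough illustration of the "simple wrapper for scalar values" step, the sketch below converts a few scalar types to and from the string values a key-value store might hold. Everything in it is an assumption made for illustration: `InMemoryStore` is a stand-in for the real datastore client, `ScalarWrapper` is a hypothetical name, and neither reflects the actual ECL embed API, which should be taken from the existing plugin implementations.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for an external key-value store client. A real plugin
// would call the chosen datastore's C/C++ API here instead.
class InMemoryStore
{
public:
    void put(const std::string &key, const std::string &value)
    {
        std::lock_guard<std::mutex> lock(mutex_);  // guard concurrent callers
        data_[key] = value;
    }

    // Returns false when the key is absent; otherwise copies the value out.
    bool get(const std::string &key, std::string &value) const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        std::map<std::string, std::string>::const_iterator it = data_.find(key);
        if (it == data_.end())
            return false;
        value = it->second;
        return true;
    }

private:
    mutable std::mutex mutex_;
    std::map<std::string, std::string> data_;
};

// Hypothetical scalar wrapper: maps integer, boolean and string scalars onto
// the store's string values and back again.
class ScalarWrapper
{
public:
    explicit ScalarWrapper(InMemoryStore &store) : store_(store) {}

    void writeInt(const std::string &key, std::int64_t value)
    {
        store_.put(key, std::to_string(value));
    }
    void writeBool(const std::string &key, bool value)
    {
        store_.put(key, value ? "true" : "false");
    }
    void writeString(const std::string &key, const std::string &value)
    {
        store_.put(key, value);
    }

    bool readInt(const std::string &key, std::int64_t &out) const
    {
        std::string raw;
        if (!store_.get(key, raw))
            return false;
        out = std::stoll(raw);  // throws std::invalid_argument on non-numeric text
        return true;
    }
    bool readBool(const std::string &key, bool &out) const
    {
        std::string raw;
        if (!store_.get(key, raw))
            return false;
        out = (raw == "true");
        return true;
    }
    bool readString(const std::string &key, std::string &out) const
    {
        return store_.get(key, out);
    }

private:
    InMemoryStore &store_;
};
```

Keeping the type conversions in one small class makes it easier to test each data type in isolation before moving on to structured data, where records and datasets need a richer mapping than a single string per key.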

By the midterm review, we would expect you to have:

  • Implemented a simple example that passes and returns scalar values (which are usually much simpler than passing/returning structures).

Mentor

Jack Del Vecchio 
Jack.DelVecchio@lexisnexisrisk.com

Backup Mentor: TBD


Skills needed
  • Ability to code in C++.

  • Ability to build and test the HPCC system (guidance will be provided).

  • Knowledge of your chosen language sufficient to write and run test cases.

  • Ability to write test code. 

Deliverables

Midterm

  • A simple example that passes and returns scalar values.

End of project

  • A plugin that supports interfacing to your chosen language from ECL. It will implement the ECL embedded language API and make calls to your chosen language via its C/C++ API (assuming it has one!).

  • Test cases demonstrating the correct behavior and performance of the plugin.

  • Documentation of how datatypes and structures in ECL are mapped to your chosen language.

Other resources
