Investigate Frameworks and Best Practices for HPCC Systems Cloud Native

This project is already taken and is no longer available for the 2023 HPCC Systems Intern Program

This project is available as a student work experience opportunity with HPCC Systems. Curious about other projects we are offering? Take a look at our Ideas List.

Student work experience opportunities also exist for students who want to suggest their own project idea. Project suggestions must be relevant to HPCC Systems and of benefit to our open source community. 

Find out about the HPCC Systems Summer Internship Program.

Project Description

The introduction of cloud native support for HPCC Systems has opened up a plethora of opportunities for students looking to learn DevOps skills. One such opportunity is the exploration of best practices for adopting concepts such as Infrastructure as Code (IaC) for HPCC Systems deployments in the cloud.

In this context, the goal of this project is to develop a cloud-agnostic, opinionated HPCC Systems module using Terraform that runs on the major cloud providers, namely Azure, AWS, and GCP. During development, you should consider cloud costs and system performance as part of building these modules. System logging and metrics are also essential for users running ECL workloads, so that engineers can quickly diagnose problems.

Upon successful completion of this work, it is expected that you will be able to:

  • Have a good working understanding of Terraform and Terraform module concepts.

  • Have an excellent knowledge of the fundamental differences between the three major cloud providers.

  • Understand what goes into building an opinionated module that balances ease of use, performance, and logging.

  • Optimize performance by striking a balance between Virtual Machine instance types and cost.

  • Measure performance by running an ECL Terasort on a cluster.

  • Compare the storage types each cloud provider offers, along with their drivers, and use them effectively.

  • Have a working understanding of Kubernetes.

  • Have a working knowledge of Helm.

  • Have a working knowledge of Helm charts for Kubernetes.

  • Have a working knowledge of Docker registries, such as Docker Hub and JFrog.

  • Code in YAML.
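
The instance-type and Terasort outcomes above come down to simple arithmetic: the cost of one benchmark run is roughly node count × hourly price per node × runtime in hours. The following is a minimal sketch of that comparison, assuming on-demand hourly pricing and ignoring storage and network charges. The class name, instance labels, prices, and runtimes are placeholders chosen for illustration only; they are not measurements or real cloud prices.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    """One Terasort run on a given cluster shape (all values are placeholders)."""
    instance_type: str        # hypothetical label, not a real instance SKU
    nodes: int                # number of worker VMs in the cluster
    hourly_price_usd: float   # assumed on-demand price per node per hour
    runtime_seconds: float    # wall-clock time of the Terasort run

    @property
    def cost_usd(self) -> float:
        # Compute cost only: nodes * price per hour * hours used.
        return self.nodes * self.hourly_price_usd * self.runtime_seconds / 3600.0


# Illustrative inputs; replace with your own measured runtimes and current prices.
runs = [
    BenchmarkRun("small-vm", nodes=8, hourly_price_usd=0.20, runtime_seconds=1800.0),
    BenchmarkRun("large-vm", nodes=8, hourly_price_usd=0.40, runtime_seconds=720.0),
]

# List the runs from cheapest to most expensive per completed sort.
for run in sorted(runs, key=lambda r: r.cost_usd):
    print(f"{run.instance_type}: {run.runtime_seconds:.0f} s, ${run.cost_usd:.2f} per run")
```

With these placeholder inputs the larger instance costs twice as much per hour but finishes sooner, so it ends up cheaper per run. This is the kind of trade-off the benchmark is meant to expose.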

Completing this project involves meeting the following requirements:

  • Provide a simple and opinionated way to build a standard HPCC Systems cluster using a common set of services. This standard pattern should reduce the cognitive load on teams who need to run these clusters.

  • The module should support deploying in an existing network (VPC, VNET).

  • The module should support using an existing storage solution via an import feature.

  • The module should support multiple input values files via an argument.

  • Logging and monitoring are crucial to running clusters and diagnosing cluster-related issues, so the module should use Fluent, Loki, Prometheus, and Grafana to gather logs and metrics and to raise alerts.
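
To make the multiple values files requirement more concrete: Helm accepts several values files on one command line and gives the right-most file precedence, merging nested maps key by key. The sketch below imitates that layering in plain Python on dictionaries that have already been parsed from YAML. It is a simplified illustration, not Helm's implementation. The function names and the sample keys are hypothetical and do not describe the interface of any existing HPCC Systems module.

```python
from copy import deepcopy
from typing import Any, Dict, List


def _merge_into(base: Dict[str, Any], override: Dict[str, Any]) -> None:
    """Merge override into base in place; nested dicts merge, other values replace."""
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            _merge_into(base[key], value)
        else:
            # Scalars and lists from the later layer replace the earlier value.
            base[key] = deepcopy(value)


def merge_values(layers: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Combine parsed values files from left to right; later layers win."""
    merged: Dict[str, Any] = {}
    for layer in layers:
        _merge_into(merged, layer)
    return merged


# Hypothetical contents of two values files, already loaded into dictionaries.
defaults = {"thor": {"numWorkers": 2, "maxJobs": 4}, "storage": {"planes": ["data"]}}
overrides = {"thor": {"numWorkers": 8}, "storage": {"planes": ["data", "spill"]}}

merged = merge_values([defaults, overrides])
print(merged)
```

In this sketch the override layer changes only the keys it names, so a team can keep a shared defaults file and a small per-environment file. Lists are replaced as a whole rather than concatenated.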

Mentor

Wayne Carty
Wayne.Carty@lexisnexisrisk.com 

Backup Mentor: Godson Fortil
Godji.Fortil@lexisnexisrisk.com 

Skills needed
  • Ability to manage HPCC Systems deployments (guidance will be provided). Knowledge of ECL is not required, since it should be possible to reuse existing code with minimal changes for this purpose. Links are provided below to our ECL training documentation and online courses should you wish to become familiar with the ECL language.

  • Fundamental cloud concepts

  • Basic programming knowledge and shell scripting

  • Familiarity with GitHub as a user and, to a lesser extent, as a developer

Other resources

All pages in this wiki are subject to our site usage guidelines.