12 students joined our intern program in 2021. They presented their projects to the team during the year, and 9 of them entered our 2021 Poster Contest, held at our virtual HPCC Systems Community Day Summit in October 2021.
Due to COVID-19, all internships were completed remotely.
Meet the Class of 2021
RV College of Engineering, India
Improving conditional probability calculations using kernel methods in Reproducing Kernel Hilbert Space (RKHS) as a part of the Causality Project
Conditional Probability is a key enabling technology for Causal Inference. For real-valued variables, calculating conditional probabilities is particularly challenging because the variables can take on an infinite set of values. As the number of conditioning dimensions increases, the data becomes increasingly sparse, making it difficult to derive accurate results. After looking at various ways of modelling conditional probabilities, we found that RKHS kernel methods made it possible to estimate the density and cumulative density of conditional probabilities with a single conditioning variable.
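To make the idea of estimating a conditional density and cumulative density with a single conditioning variable more concrete, here is a minimal sketch in Python. It is not the project's RKHS implementation: it swaps in a simpler kernel-weighted (Nadaraya-Watson style) estimator with Gaussian kernels. The function names, the bandwidths `h` and `g`, and the toy samples are all assumptions made for illustration.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def normal_cdf(u):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def _weights(x, xs, h):
    """Normalised kernel weights: samples whose x-value is near x count more."""
    raw = [gaussian_kernel((x - xi) / h) for xi in xs]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no samples near x; try a larger bandwidth h")
    return [w / total for w in raw]


def conditional_density(y, x, xs, ys, h=0.5, g=0.5):
    """Estimate p(Y = y | X = x) from paired samples (xs, ys)."""
    w = _weights(x, xs, h)
    return sum(wi * gaussian_kernel((y - yi) / g) / g for wi, yi in zip(w, ys))


def conditional_cdf(y, x, xs, ys, h=0.5, g=0.5):
    """Estimate P(Y <= y | X = x) from paired samples (xs, ys)."""
    w = _weights(x, xs, h)
    return sum(wi * normal_cdf((y - yi) / g) for wi, yi in zip(w, ys))


# Hypothetical toy data: Y is near 1 when X is near 0, and near 5 when X is near 2.
sample_x = [0.0, 0.1, 0.2, 2.0, 2.1, 2.2]
sample_y = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]

print(conditional_cdf(3.0, 0.1, sample_x, sample_y))  # close to 1
print(conditional_cdf(3.0, 2.1, sample_x, sample_y))  # close to 0
```

With more than one conditioning variable, the weights would have to be computed over several dimensions at once, which is where the sparsity problem described above becomes visible: fewer and fewer samples receive a meaningful weight.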
Dr Shobha G
Prof Jyothi Shetty
Implement a PMML Processor
The aim of this project was to implement a Predictive Model Markup Language (PMML) Processor using ECL and to provide a user-friendly interface.
The converter works in both directions for simple (and multiple) Linear Regression machine learning models. The converter takes in a .pmml/.xml file and returns a .ecl file containing the code needed to make predictions. Conversely, the converter also takes in a .ecl file and compiles it, turning it into a PMML model in the process.
This work makes it easier for users to convert files and provides support for other algorithms, such as Logistic Regression, Random Forests and Neural Networks.
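As an illustration of what reading a linear regression model from PMML involves, the following Python sketch extracts the intercept and coefficients from a `RegressionTable` element and uses them to make a prediction. It is not the intern's converter and does not generate ECL. The embedded PMML fragment, the feature names `x1` and `x2`, and the helper names are hypothetical, and the sketch ignores optional PMML features such as predictor exponents and categorical predictors.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML fragment for a two-feature linear regression model.
PMML_TEXT = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def local_name(tag):
    """Strip the XML namespace so the lookup does not depend on the PMML version."""
    return tag.rsplit("}", 1)[-1]


def read_linear_model(pmml_text):
    """Return (intercept, {feature: coefficient}) from the first RegressionTable."""
    root = ET.fromstring(pmml_text)
    for elem in root.iter():
        if local_name(elem.tag) == "RegressionTable":
            intercept = float(elem.get("intercept", "0"))
            coefs = {
                child.get("name"): float(child.get("coefficient"))
                for child in elem
                if local_name(child.tag) == "NumericPredictor"
            }
            return intercept, coefs
    raise ValueError("no RegressionTable element found")


def predict(intercept, coefs, row):
    """Apply the linear model: intercept + sum(coefficient * feature value)."""
    return intercept + sum(c * row[name] for name, c in coefs.items())


intercept, coefs = read_linear_model(PMML_TEXT)
prediction = predict(intercept, coefs, {"x1": 3.0, "x2": 4.0})
print(prediction)  # 1.5 + 2.0 * 3.0 - 0.5 * 4.0 = 5.5
```

A converter in the PMML-to-ECL direction would emit code from the same intercept and coefficients instead of evaluating them directly.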
Marjory Stoneman Douglas High School, FL, USA
An Ingress is an object that allows access to Kubernetes services from outside the Kubernetes cluster. Ingress is made up of an Ingress object and the Ingress Controller. An Ingress Controller is the implementation of the Ingress. In this project, two Ingress implementations, HAProxy and Nginx, were examined in an Azure environment. Both Ingress controllers use in-cluster Ingress solutions, where load balancing is performed by pods within the cluster. My work explores the different setups used to configure Ingress features through annotations and Kubernetes Ingress specifications.
RV College of Engineering, India
Improvements on HSQL: A SQL-like language for HPCC Systems
Big Data has become an important field, and there is a steep learning curve to handling Big Data, especially in distributed systems. HSQL for HPCC Systems is a solution developed to help users become familiar with its architecture and with ECL (Enterprise Control Language), the language with which it primarily operates. HSQL aims to provide data science developers with a seamless interface for working with data. It is designed to work in conjunction with ECL, the primary programming language for HPCC Systems, and should prove to be easy to work with and robust for general purpose analysis.
American Heritage School, FL, USA
Processing Student Images with Kubernetes on HPCC Systems Cloud Native Platform
In order to foster a safe learning environment, measures to bolster campus security have emerged as a top priority around the world. The developments from my internship will be applied to a tangible security system at American Heritage High School (AHS). Processing student images on the HPCC Systems Cloud Native Platform and evaluating the HPCC Systems Generalized Neural Network (GNN) bundle in the cloud ultimately enabled a model to classify an individual as “AHS student” or “Not an AHS student”. This will allow a person to receive confirmation from the robot that they are in the student database and retrieve information as part of a larger, interactive security feature.
North Carolina State University
Ingestion and Analysis of Collegiate Women's Basketball GPS Data in HPCC Systems and RealBI
In the past, NC State Strength and Conditioning has worked with HPCC Systems to create solutions for taking different data streams and bringing them together for a comprehensive analysis to improve athlete wellbeing and performance. Here you will see some solutions using HPCC Systems and RealBI to provide insight from data collected with the NC State Women's basketball team. You will see some differences between working with a Bare Metal environment and a Kubernetes environment. See how these solutions can help our understanding of this data to provide better service to these student athletes.
Continue Novel COVID-19 Tracker and Global Map Using HPCC Systems ECL Watch
HPCC Systems contains an active COVID-19 portal as a part of our web footprint. Connecting the major COVID-19 databases together with Airport Data provides a number of possible applications of this tool, such as data analysis, public safety applications, educational resources and use as a travel tool. As a travel tool, it could provide the ability to view COVID-19 data and metrics alongside a user's input itinerary. The interactive map is color-coded by vaccine percentages and can be dynamically populated with other data, such as IATA codes, airport information, airport locations, confirmed cases, school closing data, contagion risk percentages, deaths, gathering restrictions and mask restrictions.
Not only was the creation of the internet the largest technological breakthrough of the 20th century, it also turned out to be a hidden double-edged sword. The internet has allowed us to access information and communicate at unprecedented levels, across the globe. Yet, this comes at an enormous cost. The human cost. Hidden behind computer screens, we enjoy a security blanket of anonymity, which emboldens some to say and do things that would be considered disturbing in a public setting. By creating a Toxicity Detection Platform, I aim to curb this harassment and provide a healthier web environment for everyone.
Northeastern University, MA, USA
Causal Inference in Machine Learning
The HPCC Systems platform is dedicated to research and development within the groundbreaking field of causal statistics, which seeks to understand and model the complex causal mechanisms of our everyday lives. This project focused on designing the interventional and counterfactual modules of this platform; these algorithms tease apart the structure of the input data to get to the how and why of the relationships which link them together. Moreover, this project demonstrates multiple use cases on synthetically generated data, identifies real-world datasets for exploration, and outlines areas for future extension of the platform extracted from cutting-edge causality research.
RV College of Engineering, India
Independence Testing with RCoT: Causal Validation and Discovery for HPCC Systems Causal Toolkit
The new science of Causality promises to open new frontiers in Data Science and Machine Learning, but requires an accurate model of the causal relationships between variables. This causal model takes the form of a Directed Acyclic Graph (DAG). Nature provides a few subtle cues to the structure of the causal model, the most important of which are the independencies or conditional independencies between variables. These independencies allow us to test a causal model to determine if it is consistent with the observed data, and in some cases to discover the causal model from data alone.
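The way conditional independencies constrain a causal model can be illustrated with a small synthetic example. The sketch below is not RCoT: it swaps in a plain linear partial-correlation check, which is only appropriate for roughly linear, Gaussian data. The chain X → Z → Y, the noise levels, the sample size and the function names are all assumptions chosen for illustration. In such a chain, X and Y are dependent, but become independent once Z is conditioned on.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic data generated from the hypothetical causal chain X -> Z -> Y.
rng = random.Random(0)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
z = [xi + rng.gauss(0, 0.5) for xi in x]
y = [zi + rng.gauss(0, 0.5) for zi in z]

marginal = pearson(x, y)                    # clearly non-zero: X and Y are dependent
conditional = partial_correlation(x, y, z)  # near zero: X is independent of Y given Z
print(f"corr(X, Y) = {marginal:.3f}, partial corr(X, Y | Z) = {conditional:.3f}")
```

A model that claimed a direct edge from X to Y with no role for Z would predict a non-zero dependence after conditioning, so the observed data would count against it.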
Northview High School, GA, USA
Apply Docker Image Build and Kubernetes Security Principles
With cybersecurity attacks becoming more prevalent in the United States every year, organizations are constantly looking for ways to improve the security outlook of their platforms. Recently, HPCC Systems has begun transitioning to a cloud-native platform that uses Docker containers managed by Kubernetes to store and manage data. With this change, it is of utmost importance that HPCC Systems has a secure cloud environment, since it is used to manage secure data from other companies.
Use Azure Spot Instance with HPCC Systems for Cost Optimization
Minimizing the cost of setting up cloud infrastructure is very important for all companies. Azure Spot Instances can provide great cost savings for cloud infrastructure setup. Azure Spot Instances are unused Azure computing resources (virtual machines), which Azure offers at a lower price than normal virtual machines. We found that the price of these instances can be as much as 90% below that of a normal instance. The price can vary based on region and size. In this project, we analyze different aspects of using Azure Spot Instances with HPCC Systems.
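The cost comparison can be sketched as simple arithmetic. The Python example below is an illustration under stated assumptions, not output from the project: the hourly prices, node count and the `rerun_fraction` parameter (extra work repeated after a Spot virtual machine is evicted) are all hypothetical, and real prices vary by region and size.

```python
def cluster_cost(hourly_price, nodes, hours):
    """Total cost of running `nodes` virtual machines for `hours` hours."""
    return hourly_price * nodes * hours


def spot_savings(regular_hourly, spot_hourly, nodes, hours, rerun_fraction=0.0):
    """Fraction of the regular cost saved by using Spot instances.

    rerun_fraction is a hypothetical allowance for work that has to be
    repeated after evictions, e.g. 0.2 means 20% more Spot hours are needed.
    """
    regular = cluster_cost(regular_hourly, nodes, hours)
    spot = cluster_cost(spot_hourly, nodes, hours * (1.0 + rerun_fraction))
    return 1.0 - spot / regular


# Hypothetical prices: 1.00 per hour regular, 0.10 per hour Spot, 4 nodes, 100 hours.
print(spot_savings(1.00, 0.10, nodes=4, hours=100))
print(spot_savings(1.00, 0.10, nodes=4, hours=100, rerun_fraction=0.2))
```

The second call shows why eviction behaviour matters: repeated work reduces the effective saving even when the hourly discount is unchanged.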
Profile of our intern program in 2021
- 12 students - 4 High School, 6 Undergraduates, 1 Masters and 1 Researcher
- Global and inclusive program, with 3 students located in Asia (India) and 1 international student studying in the USA.
- 2 returning students
- All remote working
- Spread of projects: 3 Cloud, 7 Machine Learning, 2 platform related
- 12 mentors involved including 2 academic mentors
HPCC Systems platform and cloud related projects
- Use Azure Spot Instance with HPCC Systems for Cost Optimization
- Apply Docker Image Build and Kubernetes Security Principles
- Improvements on HSQL: A SQL-like language for HPCC Systems *
- Ingress Configuration
- Continue Novel COVID-19 Tracker and Global Map Using HPCC Systems ECL Watch
Machine learning related projects
- Independence Testing with RCoT: Causal Validation and Discovery for HPCC Systems Causal Toolkit
- Toxicity Detection *
- Ingestion and Analysis of Collegiate Women's Basketball GPS Data in HPCC Systems and RealBI
- Processing Student Images with Kubernetes on HPCC Systems Cloud Native Platform *
- Improving conditional probability calculations using kernel methods in Reproducing Kernel Hilbert Space (RKHS) as a part of the Causality Project
- Implement a PMML Processor
- Causal Inference in Machine Learning
* Projects suggested by students themselves