HPCC Systems Intern Program - Class of 2018

Find out more about the HPCC Systems Summer Intern Program.

10 students joined our intern program in 2018. They presented their projects at our Tech Talk webcasts during the year, and some entered our 2018 Poster Contest at the HPCC Systems Community Day Summit in Atlanta in October 2018.

Meet the students

Nicole Navarro: MS in Data Science, New College of Florida
Mentors: Jo Prichard and team
Project: Healthcare: High Risk Geo-Social Analysis (Drug Socialization)
Description: Identifying geographical areas that have the highest opioid social cohesivity, which might make them more suitable for intervention or disruption.
Resources: Hear Nicole speak about her project at our Tech Talk held in November 2018. Nicole was the second place winner of our Technical Poster Contest 2018. See Nicole's poster.

Everett Matthew Upchurch Butler: BS in Information Technology, Kennesaw State University
Mentor: Roger Dev
Project: Provide a Standard HPCC Systems ECL Math Library
Description: Implement a number of functions which estimate one or more parameters for a specific distribution. 
Resources: See JIRA issue HPCC-986. Hear Matt speak at our Tech Talk held in April 2018 and see the poster he presented at our Technical Poster Contest 2018.

Saminda Wijeratne: MSc in Computational Science and Engineering, Georgia Institute of Technology
Mentors: Mark Kelly and Jake Smith
Project: MPI Proof of concept
Description: An evaluation of the potential performance benefits that might be gained by switching to use what is now the industry standard high performance cluster computing message passing software. 
JIRA issue HPCC-8706
Resources: Hear Saminda speaking about his project at our Tech Talk held in August 2018. Saminda was also the first place winner of our Technical Poster Contest 2018. See Saminda's poster.

Jayashree Ukkinagatti: MTech in Computer Science, RV College of Engineering, India 
Mentors: Anthony Fishbeck and Rodrigo Pastrana
Project: Continuous Integration of ROXIE query / data deployments using Jenkins
Description: Allow the automated building, deployment and testing of ECL code.
JIRA issue HPCC-19698
Resources: Hear Jayashree speak about her project at our Tech Talk in November 2018.

Aramis Tanelus: American Heritage School of Boca/Delray, Florida
Mentors: David DeHilster, Taiowa Donovan and Xiaoming Wang
Project: APIs for HPCC Systems Data Ingestion for Common Robot Sensors
Description: Create APIs for ingesting data into HPCC Systems from a number of commonly used robot sensors, which provide information about the position of motors, robot movement, and vision processing via cameras and rangefinders.
JIRA issue HPCC-19699
Resources: Hear Aramis speaking about his project at our Tech Talk held in August 2018, view his slides and see the poster he entered in our Technical Poster Contest 2018.

Robert Kennedy: Masters in Computer Science, Florida Atlantic University
Mentors: Tim Humphrey and Dr Taghi Khoshgoftaar, Florida Atlantic University
Project: Begin development of a software library (consisting of ECL and Python code) that would provide HPCC Systems distributed neural network training
Description: Training a deep learning neural network takes a very long time. Robert’s project attempts to distribute the deep learning training algorithm across an HPCC Systems cluster to speed up training.
JIRA issue ML-415
Resources: Hear Robert speaking about his project at our Tech Talk held in August 2018 and on the main stage at our Community Day Summit 2018 (Watch Recording / View Slides). Robert was also the third place winner of our Technical Poster Contest 2018 at the same event. See Robert's poster.

Lili Xu: PhD Computer Science, Clemson University
Mentors: Arjuna Chala and Professor Amy Apon, Clemson University
Project: Implementation of CLDA Topic Modeling Algorithm in ECL-ML 
Description: The application of clustering to the topic modeling problem, which gives Clustering LDA (CLDA). Topic modeling is the process of determining the topics covered by a collection of documents and the mix of topics in each document.
JIRA issue ML-416
Resources: Hear Lili speaking about her project at our Tech Talk held in September 2018 and see the poster she entered in our Technical Poster Contest 2018.

Farah Al-Shanik: PhD Computer Science, Clemson University
Mentors: Kevin Wilmoth, David Miller and Professor Amy Apon
Project: Implement equivalence terms for the Text Search Bundle
Description: How similar does a term in a search request need to be to a term in the document to be considered a term match? Farah’s project provides the ability to automatically create equivalents for initialisms and acronyms. Her project also provides a means of applying a table of equivalents and the attributes to build that table from an open source thesaurus such as Moby.
JIRA issue TS-9
Resources: Hear Farah speaking about her project at our Tech Talk held in September 2018 and see the poster she entered in our Technical Poster Contest 2018.

Soukaina Filali: PhD Computer Science, Georgia State University 
Mentor: Roger Dev
Project: Fraud Detection on Transactional Data using a Time Series Mining Approach
Description: The project consists of distinguishing fraudulent pre-paid cards from non-fraudulent ones using patterns mined from their historical bank transaction data. There are numerous types of card programs, each of which comes with a different level of fraud risk. Every fraud category has representative patterns that a human monitors manually on a daily basis. The goal here is to combine features engineered by domain experts with time series shapelet mining techniques to provide an automated fraud detection solution, which can potentially help in early fraud detection.
Resources: Hear Soukaina speaking about her project at our Tech Talk held in September 2018.

Shah Muhammad Hamdi: PhD Computer Science (Data Mining), Georgia State University
Mentor: Roger Dev
Project: Dimensionality Reduction and Feature Selection in ECL-ML
Description: Implement several supervised and unsupervised filter-based feature selection algorithms, along with a new filter-based feature selection algorithm that selects features using both supervised and unsupervised techniques. Hamdi's project began with the implementation of a parallelized version of Principal Component Analysis (PCA).
JIRA issue ML-419
Resources: Hear Hamdi speaking about his project at our Tech Talk held in August 2018 and see the poster he entered in our Technical Poster Contest 2018.

Profile of our intern program in 2018

  • 10 students - 1 high school student, 1 undergraduate, 4 Masters, 4 PhD
  • Global and inclusive program, with one student located in India and international students from Jordan, Sri Lanka and China, studying in the USA. 5 female and 5 male students.
  • 2 internships completed and 2 students hired by LexisNexis
  • 8 students completing projects over the summer
  • 5 remote workers (working from home), including one student located in India, and 5 based in LexisNexis offices in Alpharetta, GA and Boca Raton, FL
  • Spread of projects: 5 Machine Learning, 4 HPCC Systems platform, 1 business related project
  • 15 mentors involved - 5 new HPCC Systems mentors, 2 academic mentors

Business related project

  • Healthcare: High Risk Geo-Social Analysis (Drug Socialization) **

HPCC Systems platform related projects

Machine learning related projects

  • Begin development of a software library (consisting of ECL and Python code) that would provide HPCC Systems distributed neural network training  *
  • Implementation of CLDA Topic Modeling Algorithm in ECL-ML *
  • Implement equivalence terms for the Text Search Bundle 
  • Extension of KMeans clustering and KNN classifier to time series data model in ECL-ML  *
  • Dimensionality Reduction and Feature Selection in ECL-ML *

*   Projects suggested by students themselves
**  Projects completed earlier in 2018
