Academic publications supported by the HPCC Systems project

The following papers and publications have been produced over the years by professors and students who have collaborated with us as part of our academic program, and by a number of LexisNexis Risk Solutions Group employees. We are proud to have supported our academic partners and colleagues in their research.

Every year, we welcome a number of students onto the HPCC Systems summer intern program, and we are equally proud that some of them have contributed to published research. Some students have also presented their research as part of our Technical Poster Presentation Contest, held at our annual HPCC Systems Community Summit.

Have you contributed to a paper or publication supported by HPCC Systems in some way? Or do you have a publication coming soon? Please contact us and tell us more about your research.

Are you using HPCC Systems as part of your project? Share your story with us.


Year | Title | Author(s) | Accredited Organisation
2024



Synthesizing class labels for highly imbalanced credit card fraud detection data | Robert Kennedy, Flavio Villanustre, Taghi M. Khoshgoftaar, and Zahra Salekshahrezaee | Florida Atlantic University, USA

Analyzing Blockchain Data to Detect Bitcoin Addresses Involved in Illicit Activities Using Anomaly Detection | Sarthak Sharan, Divye Sancheti, Dr Shobha G, Jyoti Shetty, Arjuna Chala, Hugo Watanuki | Rashtreeya Vidyalaya College of Engineering, India
2023



Iterative cleaning and learning of big highly-imbalanced fraud data using unsupervised learning | Robert Kennedy, Zahra Salekshahrezaee, Flavio Villanustre, and Taghi M. Khoshgoftaar | Florida Atlantic University, USA

Emotions detection in social media posts | Pedro Lima Rodrigues, Renato de Oliveira Moraes, Hugo Watanuki, David de Hilster | University of São Paulo, Brazil

Auto detection of field level dependencies in data workflow on a Distributed Platform (in press) | Surya, Sumanth Hedge, Jyoti Shetty, Shobha G, Dan Camper | Rashtreeya Vidyalaya College of Engineering, India

Estimating the Number of Clusters for the K-Means Algorithm in a Big Data Context | Bruno Costa, Renato de Oliveira Moraes, Hugo Watanuki | University of São Paulo, Brazil

Causal Inference and Conditional Independence Testing with RCoT | Mayank Agarwal, Abhay H. Kashyap, G. Shobha, Jyothi Shetty, and Roger Dev | Rashtreeya Vidyalaya College of Engineering, India

Analysis of the Surface Water Quality in the State of Karnataka using Distributed Platform | Shravya Dasu, Shobha G, Jyothi Shetty | Rashtreeya Vidyalaya College of Engineering, India

HPCC Systems log monitoring in the cloud (in Brazilian Portuguese) | Nathália Ribas, Patricia Plentz, Alysson Oliveira, Hugo Watanuki | Universidade Federal de Santa Catarina, Campus Florianópolis, Brazil

Illicit Activity Detection in Bitcoin Transactions using Timeseries Analysis | Rohan Maheshwari, Sriram Praveen V A, Shobha G, Jyoti Shetty, Arjuna Chala, Hugo Watanuki | Rashtreeya Vidyalaya College of Engineering, India
2022



Performance Skew Prediction in HPCC Systems | Ambu Karthik, Harsh Mishra, S Jayanth, G Shobha, Jyoti Shetty | Rashtreeya Vidyalaya College of Engineering, India

Optimal lockdown policy for vaccination during COVID-19 pandemic | Yuting Fu, Hanqing Jin, Haitao Xiang, Ning Wang | Oxford University, UK
2021



Big Data and Logistic Regression Applied to Analysis of Loan Requests (in Brazilian Portuguese) | André Fontanez Bravo, Renato de Oliveira Moraes, Hugo Martinelli Watanuki | University of São Paulo, Brazil

VR Supermarket: a Virtual Reality Online Shopping Platform with a Dynamic Recommendation System | Deeksha Shravani, Prajwal Y R, Prajwal V Atreyas, Shobha G | Rashtreeya Vidyalaya College of Engineering, India

Design and Implementation of HSQL: A SQL-like language for Data Analysis in Distributed Systems | Anurag Singh Bhadauria, Atreya Bain, Jyoti Shetty, Shobha G, Arjuna Chala, Jeremy Clements | Rashtreeya Vidyalaya College of Engineering, India

Parallelizing filter-and-verification based exact set similarity joins on multicores | Fabian Fier, Johann-Christoph Freytag | Humboldt University of Berlin

Scaling Up Set Similarity Joins Using a Cost-Based Distributed-Parallel Framework | Fabian Fier, Johann-Christoph Freytag | Humboldt University of Berlin

Implementation of generative adversarial networks in HPCC systems using GNN bundle | Ambu Karthik, Jyoti Shetty, Shobha G., Roger Dev | Rashtreeya Vidyalaya College of Engineering, India

Hybrid Density-based Adaptive Clustering using Gaussian Kernel and Grid Search | Varsha R Jenni, Akhil Dua, G Shobha, Jyoti Shetty, Roger Dev | Rashtreeya Vidyalaya College of Engineering, India

Massively scalable density based clustering (DBSCAN) on the HPCC Systems big data platform | Yatish HR, Shubham Milind Phal, Tanmay Sanjay Hukkeri, Lili Xu, Shobha G, Jyoti Shetty, Arjuna Chala | Rashtreeya Vidyalaya College of Engineering, India

Modeling and tracking Covid-19 cases using Big Data analytics on HPCC Systems platform | Flavio Villanustre, Arjuna Chala, Roger Dev, Lili Xu, Jesse Shaw, Borko Furht, Taghi Khoshgoftaar | Florida Atlantic University

Orquestração de Aplicações de Computação de Alta Performance em Ambientes Cloud Conteinerizados (in Brazilian Portuguese) | Lucas Varella, Patricia Plentz, Hugo Watanuki, Artur Baruchi | Universidade Federal de Santa Catarina, Campus Florianópolis, Brazil

Análise massiva de dados na gestão pública: Uma proposta para identificação de outliers no cadastro de imóveis da prefeitura de São Paulo (in Brazilian Portuguese) | Luiz Fernando Cavalcante Silva, Renato de Oliveira Moraes, Hugo Martinelli Watanuki, Leandro Ramos da Silva | University of São Paulo, Brazil
2020

An evaluation of mathematical models for the outbreak of COVID-19 | Ning Wang, Yuting Fu, Hu Zhang, Huipeng Shi | Oxford University

Survey on RNN and CRF models for De-identification of Medical Free Text | Joffrey Leevy, Taghi Khoshgoftaar, Flavio Villanustre | Florida Atlantic University

Massively Scalable Image Processing on the HPCC Systems Big Data Platform | Tanmay Sanjay Hukkeri, Shobha G, Shubham Milind Phal, Jyothi Shetty, Yatish H R, Naweed Mohammed | Rashtreeya Vidyalaya College of Engineering, India

Parallelizing Filter-Verification Based Exact Set Similarity Joins on Multicores | Fabian Fier, Johann-Christoph Freytag | Humboldt University of Berlin
2019

Machine Learning Techniques to Detect Fraud in Credit Cards on the HPCC Systems Platform | Akshar Prasad, Roger Dev, G Shobha, Jyothi Shetty | Rashtreeya Vidyalaya College of Engineering, India

Machine Learning Techniques to Understand Partial and Implied Data Values for Conversion of Natural Language to SQL Queries on HPCC Systems | Akshar Prasad, G Shobha, N Deepamala, Sourabh S Badhya, YS Yashwanth, Shetty Rohan | Rashtreeya Vidyalaya College of Engineering, India

Design and Implementation of Octave Plugin for HPCC Systems | K R Sathvik, G Shobha, Jyoti Shetty, Dan Camper | Rashtreeya Vidyalaya College of Engineering, India

Design and implementation of Machine Learning Evaluation Metrics on HPCC Systems | A. Suryanarayanan, Arjuna Chala, Lili Xu, G Shobha, Jyoti Shetty, Roger Dev | Rashtreeya Vidyalaya College of Engineering, India

A Parallel and Distributed Stochastic Gradient Descent Implementation Using Commodity Clusters | Robert K.L. Kennedy, Taghi M. Khoshgoftaar, Flavio Villanustre, Tim Humphrey | Florida Atlantic University

Random Forest Implementation and Optimization for Big Data Analytics on LexisNexis’s High Performance Computing Cluster Platform | Victor Herrera Cordova, Taghi Khoshgoftaar, Borko Furht, Flavio Villanustre | Florida Atlantic University

A Survey of Machine Learning Algorithms Available on the HPCC Systems | Coming soon | Florida Atlantic University

Unsupervised annotation of phenotypic abnormalities via semantic latent representations on electronic health records | Jingqing Zhang, Xiaoyu Zhang, Kai Sun, Xian Yang, Chengliang Dai, Yike Guo | Imperial College, London

Integrating Semantic Knowledge to Tackle Zero-shot Text Classification | Jingqing Zhang, Piyawat Lertvittayakumjorn, Yike Guo | Imperial College, London

Design and Development of IoT Plugin for HPCC Systems | K.S. Amogh Vardhan, Manjunath Jakaraddi, Dr. Shobha G, Jyoti Shetty, Arjuna Chala, Dan Camper | Rashtreeya Vidyalaya College of Engineering, India

Data Skew Profiling using HPCC Systems | Harsh Mishra, Jayanth S, Jyoti Shetty, Shobha G, Arjuna Chala, Dan Camper | Rashtreeya Vidyalaya College of Engineering, India

Massively Scalable Parallel KMeans on the HPCC Systems Platform | Lili Xu, Amy Apon, Roger Dev, Flavio Villanustre, Arjuna Chala | Clemson University
2018

Dest-ResNet: a Deep Spatiotemporal Residual Network for Hotspot Traffic Speed Prediction | Binbing Liao, Jingqing Zhang | Imperial College, London

Deep Sequence Learning with Auxiliary Information for Traffic Prediction | Binbing Liao, Jingqing Zhang, Chao Wu, Douglas McIlwraith, Tong Chen, Shengwen Yang, Yike Guo, Fei Wu | Imperial College, London

HPCC Benchmarking | Rushikesh Ghatpande | North Carolina State University

Finding better active learners for faster literature reviews | Zhe Yu, Nicholas A. Kraft, Tim Menzies | North Carolina State University

Security Alerting and Event Management in the Era of Machine Learning: Our Experience in the Industry | Flavio Villanustre | LexisNexis Risk Solutions

Cervical Cancer Risk Factors: Exploratory Analysis using HPCC Systems | Omosalewa Itauma, Itauma Itauma | Southern New Hampshire University/Wayne State University
2017

Representativeness of latent dirichlet allocation topics estimated from data samples with application to common crawl | Yuheng Du, Alexander Herzog, Andre Luckow, Ramu Nerella, Christopher Gropp, Amy Apon | Clemson University

ECL-watch: A big data application performance tuning tool in the HPCC systems platform | Lili Xu, Edin Muharemagic, Amy Apon | Clemson University

Large-scale distributed L-BFGS | Maryam M. Najafabadi, Taghi M. Khoshgoftaar, Flavio Villanustre, John Holt | Florida Atlantic University

Learning Text to Image Synthesis with Textual Data Augmentation | Hao Dong, Jingqing Zhang, Douglas McIlwraith, Yike Guo | Imperial College, London

TensorDB: Database Infrastructure for Continuous Machine Learning | F. Liu, A. Oehmichen, J. Zhang, K. Sun, H. Dong, Y. Mo, Y. Guo | Imperial College, London

The Deep Poincare Map: A Novel Approach for Left Ventricle Segmentation | Yuanhan Mo, Fangde Liu, Douglas McIlwraith, Guang Yang, Jingqing Zhang, Taigang He, and Yike Guo | Imperial College, London

Unsupervised deep kernel for high dimensional data | Ying Xie, Linh Le, Jie Hao | Kennesaw State University

A sentiment-change-driven event discovery system | Lili Zhang, Ying Xie, Guoliang Liu | Kennesaw State University

Trilogy: Data placement to improve performance and robustness of cloud computing | Chin-Jung Hsu, Vincent W. Freeh, Flavio Villanustre | North Carolina State University
2016

Scalable Dynamic Topic Modeling with Clustered Latent Dirichlet Allocation (CLDA) | Christopher Gropp | Clemson University

Automated cluster provisioning and workflow management for parallel scientific applications in the cloud | Brandon Posey, Christopher Gropp, Alexander Herzog, Amy Apon | Clemson University

Big Data Technologies and Applications | Borko Furht, Flavio Villanustre | Florida Atlantic University

Introduction to Big Data | Borko Furht, Flavio Villanustre | Florida Atlantic University

Social Network Analytics: Hidden and Complex Fraud Schemes | Borko Furht, Flavio Villanustre | Florida Atlantic University

Modeling Ebola Spread and Using HPCC/KEL System | Jesse Shaw, Flavio Villanustre, Borko Furht, Ankur Agarwal, Abhishek Jain | Florida Atlantic University

The HPCC/ECL Platform for Big Data | Anthony M. Middleton, David Alan Bayliss, Gavin Halliday, Arjuna Chala, Borko Furht | Florida Atlantic University

Deep Learning Techniques in Big Data Analytics | Maryam M. Najafabadi, Flavio Villanustre, Taghi M. Khoshgoftaar, Naeem Seliya, Randall Wald, Edin Muharemagic | Florida Atlantic University

Visualization of big high dimensional data in a three dimensional space | Ying Xie, Jing He, Pooja Chenna, Linh Le | Kennesaw State University

Graph Processing with Massive Datasets: A Kel Primer | David Bayliss, Flavio Villanustre | LexisNexis Risk Solutions

HPCC Systems for Cyber Security Analytics | Flavio Villanustre, Mauricio Renzi | LexisNexis Risk Solutions

Unsupervised Learning and Image Classification in High Performance Computing Cluster | Itauma Itauma, Melih S. Aslan, Flavio Villanustre, Xue-wen Chen | Wayne State University

A convex framework for high-dimensional sparse Cholesky based covariance estimation | Kshitij Khare, Syed Rahman, Sang Oh, Bala Rajaratnam | University of Florida
2015

Dynamic Provisioning of Data Intensive Computing Middleware Frameworks: A Case Study | Linh Bao Ngo, Flavio Villanustre, Michael E. Payne, Richard Taylor | Clemson University

Assessing the effect of high performance computing capabilities on academic research output | Amy W. Apon, Linh B. Ngo, Michael E. Payne, Paul W. Wilson | Clemson University

Deep learning applications and challenges in big data analytics | Maryam M Najafabadi, Flavio Villanustre, Taghi M Khoshgoftaar, Naeem Seliya, Randall Wald, Edin Muharemagic | Florida Atlantic University

Industrial big data analytics: lessons from the trenches | Flavio Villanustre | LexisNexis Risk Solutions Group

Commercial Big Data Workloads: Lessons from the Industry | Flavio Villanustre | LexisNexis Risk Solutions Group
2014

Return on Investment from Academic Supercomputing | Greg Newby, Amy Apon, Nick Berente, Rudolph Eigenmann, Susan Fratkin, David Lifka, Craig A. Stewart | Clemson University

Managing the academic data lifecycle: A case study of HPCC | Michael E. Payne, Linh B. Ngo, Flavio Villanustre, Amy W. Apon | Clemson University

Using feature selection and classification to build effective and efficient firewalls | Randall Wald, Flavio Villanustre, Taghi M. Khoshgoftaar, Richard Zuech, Jarvis Robinson, Edin Muharemagic | Florida Atlantic University

Large-scale entity extraction and probabilistic record linkage | Flavio Villanustre | LexisNexis Risk Solutions Group

Big data trends and evolution: a human perspective | Flavio Villanustre | LexisNexis Risk Solutions Group
2013

Academic publishing as a social media paradigm | Michael E. Payne, Linh B. Ngo, Amy W. Apon | Clemson University

Efficiency as a Measure of Knowledge Production of Research Universities | Amy W. Apon, Michael E. Payne, Linh Bao Ngo, Paul W. Wilson | Clemson University
2011

Handbook of Data Intensive Computing | Borko Furht, Armando Escalante | LexisNexis Risk Solutions Group

Parallel Processing, Multiprocessors and Virtualization in Data-Intensive Computing | Jonathan Burger, Richard Chapman, Flavio Villanustre | LexisNexis Risk Solutions Group





