

Arun Gaonkar is a Master's student studying Computer Science at North Carolina State University.

Arun joined the 2022 HPCC Systems Intern Program to test how our new Causality Toolkit performs on real-world datasets, with a view to verifying that the results match expectations. The Causality Project was new in 2021. The leader of this project, Roger Dev, has written the following blogs to support work on the project and provide information for those who would like to try out our new Causality Toolkit:

As well as the resources included here, read Arun's intern blog journal, which includes a more in-depth look at his work.

Poster Abstract

Once confined to philosophical and theoretical discussion, causality and its application to real-world scenarios are now attracting a lot of attention. Pioneering research in causal analysis has grown into a field of its own. The current state of the art in this field is the use of statistical methods to analyze the causation of events.

This project focuses on the analysis of causality and causation-based inference. The main aim of the research is to understand the causal relationships between the factors involved in a real-world dataset by applying the Causality Toolkit developed by HPCC Systems.

Using a CDC dataset containing details from a health survey, I analyzed diabetes and the effect of a few variables on the probability of diabetes. Using the Because module developed by HPCC Systems, I observed patterns and inferred the causes of diabetes. One hypothesis is that the Body Mass Index (BMI), which is determined by a person's Height and Weight, in turn affects Diabetes. That Age affects Height and Weight is expected, but that it also appears to affect Income is an interesting inference. Diabetes also appears to have a causal effect on a person's Income.

In conclusion, by applying the Because module to real datasets, we can observe and analyze the cause and effect of each variable in the data.
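
The relationships hypothesized in the abstract can be written down as a small directed graph. The sketch below is illustrative only: it uses plain Python rather than the Because module, the edge list is read directly from the abstract's wording, and the names `CAUSAL_EDGES` and `ancestors` are hypothetical helpers introduced here, not part of the Causality Toolkit.

```python
# Hypothesized causal edges (cause -> list of effects), as stated in the abstract.
# This is an assumption-laden illustration, not output from the Because module.
CAUSAL_EDGES = {
    "Age": ["Height", "Weight", "Income"],
    "Height": ["BMI"],
    "Weight": ["BMI"],
    "BMI": ["Diabetes"],
    "Diabetes": ["Income"],
    "Income": [],
}


def ancestors(graph, target):
    """Return every variable that has a directed path leading into target."""
    # Invert the graph so each node maps to its direct causes.
    parents = {node: set() for node in graph}
    for cause, effects in graph.items():
        for effect in effects:
            parents[effect].add(cause)

    # Walk backwards from the target, collecting direct and indirect causes.
    found, stack = set(), [target]
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


if __name__ == "__main__":
    print(sorted(ancestors(CAUSAL_EDGES, "Diabetes")))
    # prints ['Age', 'BMI', 'Height', 'Weight']
```

Listing the ancestors of a variable in this way shows which factors would be candidate direct or indirect causes under the hypothesized graph; whether the data supports each edge is what the toolkit is used to test.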


In this Video Recording, Arun provides a tour and explanation of his poster content.

Applying Causality Toolkit to Real World Dataset

