Varsha R Jenni is a second-year undergraduate student at RV College of Engineering.
Varsha is interested in machine learning and distributed computing. She has found HPCC Systems to be a great open source platform that makes data processing and analysis easier and faster.
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a popular clustering algorithm that groups closely packed data points using two parameters: eps, the radius of the neighborhood around each point, and MinPts, the minimum number of points that must fall within that radius for the region to count as dense. However, DBSCAN performs poorly on datasets whose clusters vary in density, because a single pair of parameter values cannot suit every cluster. The poster proposes a novel distributed and adaptive DBSCAN (ADBSCAN) algorithm implemented on the HPCC Systems platform. The proposed approach uses techniques such as grid search and a Gaussian kernel to find optimized values for the threshold density of clusters, eliminating the need for users to specify the parameters. Experimental results suggest that the proposed ADBSCAN performs better than existing ADBSCAN implementations that use k-dist and Gaussian kernels.
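To make the roles of eps and MinPts concrete, here is a minimal single-machine sketch of classic DBSCAN in plain Python. It is not the distributed ADBSCAN implementation from the poster; the function names (`region_query`, `dbscan`) and the brute-force neighbor search are assumptions chosen for readability.

```python
from math import dist

NOISE = -1  # label used for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: expand
    return labels
```

With two tight groups of four points and one distant point, `dbscan(points, eps=1.5, min_pts=3)` assigns the groups to two clusters and marks the distant point as noise. Because eps and MinPts are fixed for the whole dataset, a sparser cluster would need a larger eps than a dense one, which is the limitation the adaptive approach targets.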
This study aims to implement an efficient, distributed, and adaptive DBSCAN (ADBSCAN) algorithm on HPCC Systems that first determines the threshold density for any given dataset, including datasets with clusters of varying density, and then uses it for clustering, eliminating the need for users to specify the parameter values. The manuscript also discusses other ADBSCAN implementations and compares the proposed approach with them on various open datasets.
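For context on what estimating a density threshold from the data can look like, the following is a hedged sketch of the k-dist heuristic, one of the existing approaches the abstract mentions as a comparison baseline. It is not the grid search and Gaussian kernel method proposed in the poster. The function names and the knee rule (the curve point farthest from the chord joining the first and last points) are assumptions of this sketch.

```python
from math import dist


def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted ascending."""
    result = []
    for i, p in enumerate(points):
        others = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return sorted(result)


def estimate_eps(points, k):
    """Pick eps at the knee of the sorted k-distance curve.

    Knee rule (an assumption of this sketch): choose the curve point that
    lies farthest from the straight line joining the first and last points.
    """
    kd = k_distances(points, k)
    n = len(kd)
    if n < 3 or kd[-1] == kd[0]:
        return kd[-1]  # curve too short or flat: no knee to find
    dx, dy = n - 1, kd[-1] - kd[0]  # direction of the chord
    # Unnormalized distance of (i, kd[i]) from the chord; the normalizing
    # constant is the same for every i, so it does not affect the argmax.
    best = max(range(n), key=lambda i: abs(dy * i - dx * (kd[i] - kd[0])))
    return kd[best]
```

The estimated value can then be passed as eps to a DBSCAN implementation, typically with MinPts set to k + 1. A single global estimate like this still assumes one density level for the whole dataset.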
In this video recording, Varsha gives a tour of her poster and explains its content.
The original PDF version of the poster is available for download.