Sarthak Sharan is a student at RV College of Engineering in Bengaluru, India, with a keen interest in Data Science, Machine Learning, and Blockchain Technology and their applications. He is working with the LexisNexis® Risk Solutions HPCC Systems® team and the RV College of Engineering Centre of Excellence on Cognitive Intelligent Systems for Sustainable Solutions to investigate Bitcoin blockchain data, gain insight into the nature of transactions, and build relationships between transactions that can help identify potentially criminal transactions. He has also worked on a project on Cosmic Ray Segmentation using Sharp U-Net on Hubble Space Telescope (HST) data. Sarthak is pursuing a Bachelor of Computer Science and Engineering at RV College of Engineering.
As financial markets become more digital and new cryptocurrencies keep appearing, Bitcoin remains the most popular cryptocurrency. While the technology prevents fraud on the network, there are no checks to track how bitcoins are being used and for what purpose. We investigate the block data stored on the Bitcoin blockchain to gain insight into transactions and to build relationships between them. By using these relationships to detect anomalies in the blockchain network, we hope to give investigators a way to detect criminal activity. Blockchain technology cannot detect fraud on its own, so anomaly detection is important for identifying potential scams.
The first task is to collect the entire blockchain transaction data. This is achieved with Python and ECL. The Python code parses the raw hexadecimal block data stored by the Bitcoin Core node and converts it into human-readable data. However, Bitcoin does not store the input address of a transaction. Instead, each input stores the hash of a previous transaction and the index of the output in that transaction that is being spent. To find the input address, we locate that previous transaction and read the address of the referenced output. To process the data on HPCC Systems, we use ECL.
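The input-address lookup described above can be sketched in Python as follows. This is a minimal illustration, not the project's parser: the record layout, the field names (`prev_txid`, `prev_index`, `address`), and the function `resolve_input_addresses` are all assumptions made for the example, and the transactions are toy data rather than real blockchain records.

```python
from typing import Dict, List, Optional

# Hypothetical, already-parsed transactions keyed by transaction hash.
# Field names are illustrative only; a real parser may use a different layout.
transactions: Dict[str, dict] = {
    "tx_a": {  # coinbase-like transaction with no spendable inputs
        "inputs": [],
        "outputs": [{"address": "addr_1", "value": 50}],
    },
    "tx_b": {
        "inputs": [{"prev_txid": "tx_a", "prev_index": 0}],
        "outputs": [
            {"address": "addr_2", "value": 30},
            {"address": "addr_3", "value": 20},
        ],
    },
    "tx_c": {
        "inputs": [
            {"prev_txid": "tx_b", "prev_index": 1},
            {"prev_txid": "tx_unknown", "prev_index": 0},  # not in our data
        ],
        "outputs": [{"address": "addr_4", "value": 20}],
    },
}


def resolve_input_addresses(txid: str, txs: Dict[str, dict]) -> List[Optional[str]]:
    """Return the address behind each input of `txid`.

    Each input only names a previous transaction and an output index, so the
    address is read from that previous transaction's output. Inputs whose
    previous transaction or output cannot be found resolve to None.
    """
    resolved: List[Optional[str]] = []
    for tx_in in txs[txid]["inputs"]:
        prev_tx = txs.get(tx_in["prev_txid"])
        if prev_tx is None or tx_in["prev_index"] >= len(prev_tx["outputs"]):
            resolved.append(None)
        else:
            resolved.append(prev_tx["outputs"][tx_in["prev_index"]]["address"])
    return resolved


print(resolve_input_addresses("tx_c", transactions))  # ['addr_3', None]
```

On the full blockchain this lookup is effectively a join of every input against every earlier output, which is the kind of large-scale matching the text assigns to ECL on HPCC Systems.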
Next, instead of detecting anomalies in individual addresses and wallets, we examine anomalies at the user level. Since users carrying out illicit activities typically use multiple wallet addresses, it is more effective to examine a user's behavior than a single wallet address. We do this by clustering with the trimmed K-means algorithm.
Trimmed K-means discards a fixed fraction of the observations that lie farthest from their cluster centers before the centers are updated, which makes it more robust to outliers than classical K-means clustering. For this approach we chose ten specialized features derived from our parsed dataset. The features were extracted by running ECL programs, whose outputs served as the input to our trimmed K-means algorithm.
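To make the trimming idea concrete, here is a small pure-Python sketch of trimmed K-means. It is an illustration under stated assumptions, not the implementation used for the poster: the function name `trimmed_kmeans`, the `trim_fraction` parameter, and the naive choice of the first `k` points as initial centers are all hypothetical, and the 2-D toy points stand in for the ten features, which are not listed here. Treating the trimmed points (label `-1`) as anomaly candidates is likewise an assumption of this example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def trimmed_kmeans(points, k, trim_fraction=0.1, max_iter=100):
    """Cluster `points` into k groups, ignoring the farthest trim_fraction.

    Returns (centers, labels); trimmed points get the label -1.
    """
    # Naive deterministic initialization (an assumption of this sketch);
    # practical use would typically try several random starts.
    centers = [list(p) for p in points[:k]]
    n_trim = int(trim_fraction * len(points))
    labels = []
    for _ in range(max_iter):
        # Step 1: find the nearest center and its distance for every point.
        nearest = []
        for p in points:
            dists = [squared_distance(p, c) for c in centers]
            j = min(range(k), key=dists.__getitem__)
            nearest.append((dists[j], j))
        # Step 2: trim the n_trim points farthest from their nearest center.
        by_distance = sorted(range(len(points)),
                             key=lambda i: nearest[i][0], reverse=True)
        trimmed = set(by_distance[:n_trim])
        new_labels = [-1 if i in trimmed else nearest[i][1]
                      for i in range(len(points))]
        # Step 3: recompute each center from its untrimmed members only.
        for j in range(k):
            members = [points[i] for i in range(len(points))
                       if new_labels[i] == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centers, labels


# Toy data: two tight groups plus one far-away point.
points = [(0, 0), (10, 10), (0, 1), (1, 0), (1, 1),
          (10, 11), (11, 10), (11, 11), (100, 100)]
centers, labels = trimmed_kmeans(points, k=2, trim_fraction=0.15)
print(labels)  # [0, 1, 0, 0, 0, 1, 1, 1, -1]
```

Because the far-away point is trimmed before the centers are recomputed, it does not pull either center toward itself, which is what classical K-means would allow.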
This approach detects anomalies among potentially suspicious users with multiple wallet addresses by dividing addresses into multiple clusters. It can be inferred that people who commit fraud and other malicious activities on the Bitcoin network spread their activity across several addresses, so that each address looks almost like that of a normal user.
In this Video Recording, Sarthak provides a tour and explanation of his poster content.
Using Anomaly detection to detect fraudulent behavior on the Bitcoin network