Sriram Praveen is a third-year student pursuing a Bachelor's degree in Computer Science and Engineering at RV College of Engineering in Bengaluru, India, with a keen interest in deep learning, image processing, machine learning, and blockchain technology, as well as their applications. He is working with the LexisNexis® Risk Solutions HPCC Systems® team and the RV College of Engineering Centre of Excellence on Cognitive Intelligent Systems for Sustainable Solutions to investigate Bitcoin blockchain data and gain insight into the nature of transactions, which can help identify potentially criminal transactions. He has worked with SCII to create an invoice data extraction system, and with the Centre of CCTV Research, RVCE, to develop a deep-learning system for person re-identification.
With almost $14.08B passing through illicit crypto addresses, there is a need to investigate and build rich relationships between Bitcoin transactions in order to deanonymize users and track parties who use the network for illegal purposes such as racketeering, trafficking, and money laundering. The challenge is that the blockchain usually does not carry adequate information about the user behind a transaction. To overcome this, a user's behavior over long periods of time can be used, acting almost like a virtual fingerprint of the person. The aim of this project is to take the transaction time series generated by a given address, compare it to a pre-existing anomaly sample, and classify it. The end goal is to detect users who use a different set of addresses for personal purposes but show similar patterns of behavior, or to find different users with comparable patterns of behavior.
The first task was to collect the Bitcoin transaction data, for which Python and ECL were employed. The Python code parses the raw block data that a Bitcoin Core node stores in blk00xxx.dat files. Each file contains multiple blocks and is generally limited to 128 MiB. The parsed data is then stored in a structured form.
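As a rough illustration of the parsing step, the sketch below walks a blk*.dat stream and decodes each 80-byte block header. It is not the project's parser: the names `BlockHeader` and `iter_block_headers` are hypothetical, it assumes the mainnet layout (4 magic bytes, a 4-byte little-endian block size, then the block), and it stops at the header instead of decoding transactions.

```python
import hashlib
import io
import struct
from typing import BinaryIO, Iterator, NamedTuple

# Mainnet network magic that precedes every block in a blk*.dat file.
MAINNET_MAGIC = bytes.fromhex("f9beb4d9")


class BlockHeader(NamedTuple):
    """Decoded 80-byte block header (hypothetical container for this sketch)."""
    block_hash: str
    version: int
    prev_hash: str
    merkle_root: str
    timestamp: int
    bits: int
    nonce: int
    size: int  # size in bytes of the whole serialized block


def iter_block_headers(stream: BinaryIO) -> Iterator[BlockHeader]:
    """Yield the header of every block found in an open blk*.dat stream."""
    while True:
        preamble = stream.read(8)
        if len(preamble) < 8:
            return  # end of file
        if preamble[:4] != MAINNET_MAGIC:
            return  # zero padding or data this sketch does not understand
        (size,) = struct.unpack("<I", preamble[4:])
        block = stream.read(size)
        if len(block) < size or size < 80:
            return  # truncated or malformed block
        header = block[:80]
        version, prev, merkle, timestamp, bits, nonce = struct.unpack(
            "<i32s32sIII", header
        )
        # The block hash is the double SHA-256 of the header, shown byte-reversed.
        digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
        yield BlockHeader(
            block_hash=digest[::-1].hex(),
            version=version,
            prev_hash=prev[::-1].hex(),
            merkle_root=merkle[::-1].hex(),
            timestamp=timestamp,
            bits=bits,
            nonce=nonce,
            size=size,
        )
```

In practice the stream would come from something like `open(path, "rb")` on one of the node's block files; the transactions that follow each header would need a further decoder for variable-length integers, inputs, and outputs.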
The Python code converts the raw block data into human-readable form. However, Bitcoin does not store the input address of a transaction. Instead, each input references a previous transaction and the specific output of that transaction that is being spent. This reference pair is not needed in itself. To find the input address, we locate the referenced previous transaction and take the address of the matching output. This processing is done in ECL on HPCC Systems because of its efficient indexing.
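The lookup described above is essentially a join between inputs and earlier outputs. The project performs it in ECL; the Python sketch below only illustrates the logic on in-memory records. The record types `TxOutput` and `TxInput`, their field names, and the function `resolve_input_addresses` are assumptions made for this example.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class TxOutput(NamedTuple):
    txid: str      # transaction that created this output
    index: int     # position of the output within that transaction
    address: str   # address the output pays to
    value: int     # amount in satoshis


class TxInput(NamedTuple):
    spending_txid: str  # transaction that contains this input
    prev_txid: str      # transaction whose output is being spent
    prev_index: int     # which output of prev_txid is being spent


def resolve_input_addresses(
    inputs: Iterable[TxInput], outputs: Iterable[TxOutput]
) -> List[Tuple[str, Optional[str], Optional[int]]]:
    """Attach the sending address and value to each input.

    Returns (spending_txid, address, value) tuples; address and value are
    None when the referenced output is not in the data set (for example,
    a coinbase input or a block that has not been parsed yet).
    """
    # Index outputs by (txid, index) so each input is resolved in O(1).
    lookup: Dict[Tuple[str, int], TxOutput] = {
        (out.txid, out.index): out for out in outputs
    }
    resolved = []
    for tx_in in inputs:
        source = lookup.get((tx_in.prev_txid, tx_in.prev_index))
        if source is None:
            resolved.append((tx_in.spending_txid, None, None))
        else:
            resolved.append((tx_in.spending_txid, source.address, source.value))
    return resolved
```

An in-memory dictionary like this would not scale to the full chain, which is presumably where an indexed join on a cluster becomes attractive.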
After this, the Kolmogorov–Smirnov (KS) test, a nonparametric test of the equality of continuous (or discontinuous) one-dimensional probability distributions, is performed. The two-sample KS test, like other such tests, lets us compare any two samples and check whether they are consistent with having come from the same distribution. If they are, we take this as strong evidence that the two samples originate either from the same user or from people engaging in similar activities, in this case criminal activities.
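To make the comparison concrete, here is a minimal pure-Python sketch of a two-sample KS test. It is not the project's implementation: the function name `ks_two_sample` is hypothetical, the p-value uses the standard large-sample (asymptotic) Kolmogorov approximation, and which per-address feature is fed in (for example, gaps between transaction timestamps) is an assumption left to the caller.

```python
import math
from bisect import bisect_right
from typing import Sequence, Tuple


def ks_two_sample(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (D, approximate p-value) for the two-sample KS test."""
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)

    # D is the largest gap between the two empirical CDFs. Both are step
    # functions, so evaluating them at every observed value is sufficient.
    d = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / n
        cdf_b = bisect_right(b_sorted, x) / m
        d = max(d, abs(cdf_a - cdf_b))

    # Asymptotic p-value: Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The series converges poorly here and Q(lam) is essentially 1.
        return d, 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    p_value = min(1.0, max(0.0, 2.0 * total))
    return d, p_value
```

A small p-value suggests the two samples come from different distributions. A large one only means the test found no evidence of a difference, so a match is best read as a lead to investigate further, not as proof.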
The dataset of fraudulent addresses to test against is obtained from bitcoinabuse.com, a publicly available portal for checking community-reported addresses. The dataset is cleaned, and the tests are then run to identify fraudulent addresses from their transaction time series.
In this Video Recording, V A Sriram provides a tour and explanation of his poster content.
Identifying fraudulent Bitcoin users by transaction behavior