Guest Speakers and subjects:
- Farah Alshanik, Clemson University - Equivalence Terms for Text Search Bundle
The Text Search Bundle (TSB) is an open-source project for searching XML text documents. It comprises many subtasks, one of which is equivalence terms, which can be thought of as strong synonyms within TSB. Term equivalence takes several forms: initialisms, abbreviations, synonyms, and context-based similarity.
We used HPCC Systems to develop a text search tool that uses the Moby thesaurus to return a set of synonyms and the word2vec algorithm to return similar words. We then built a dataset of state names and their abbreviations to return the set of related documents, and improved initialism handling in TSB so that it finds strings with or without punctuation.
Farah Alshanik is a Ph.D. student in computer science at Clemson University. She received her B.S. from Jordan University of Science and Technology. She works with Dr. Amy Apon as a Research Assistant in the Data Intensive Computing Ecosystems (DICE) Lab. Her research focuses on applying high performance computing to machine learning problems.
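The initialism and state-abbreviation parts of this work can be illustrated with a small sketch. It is written in Python rather than ECL, the language of HPCC Systems, and it is not the TSB implementation: the function names and the three-entry state table are hypothetical. It treats two initialisms as equivalent when they match after periods and spaces are removed, and expands a state name or abbreviation into its equivalent forms.

```python
import re

# Hypothetical excerpt of a state-name/abbreviation dataset (the real TSB data is not shown).
STATE_ABBREVIATIONS = {
    "South Carolina": "SC",
    "North Carolina": "NC",
    "New York": "NY",
}

def strip_initialism(term: str) -> str:
    """Remove periods and spaces and upper-case, so 'U.S.A.' and 'usa' normalize alike."""
    return re.sub(r"[.\s]", "", term).upper()

def initialisms_match(a: str, b: str) -> bool:
    """True when two strings are the same initialism, with or without punctuation."""
    return strip_initialism(a) == strip_initialism(b)

def state_equivalents(term: str) -> set[str]:
    """Expand a state name or abbreviation into its equivalent search terms (illustrative only)."""
    for name, abbr in STATE_ABBREVIATIONS.items():
        if term.strip().lower() == name.lower() or initialisms_match(term, abbr):
            # Also include the dotted form of the abbreviation, e.g. "S.C.".
            return {name, abbr, ".".join(abbr) + "."}
    # Unknown terms map only to themselves.
    return {term}
```

For example, `state_equivalents("S.C.")` returns `{"South Carolina", "SC", "S.C."}`, so a query on any one of these forms could be widened to the others.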
- Soukaina Filali, Georgia State University - Fraud Detection on Transactional Data using a Time Series Mining Approach
The project consists of distinguishing fraudulent pre-paid cards from non-fraudulent ones using patterns mined from each card's historical bank transaction data. There are numerous types of card programs, each with a different level of fraud risk. Every fraud category has representative patterns that a human monitors manually on a daily basis. The goal is to combine features engineered by domain experts with time series shapelet mining techniques to provide an automated fraud detection solution that could help detect fraud earlier.
Soukaina Filali Boubrahimi is a third-year Ph.D. student in the Department of Computer Science at Georgia State University and a research assistant in the GSU Data Mining Lab under the supervision of Dr. Rafal Angryk. In this talk, Soukaina will present the fraud detection project, based on a time series data mining approach, that she worked on during her internship. Her research interests are in time series data mining (classification, clustering, and related tasks). She has publications on the topic in IEEE Big Data, IEEE DSAA, DEXA, and the applied APJs journal. Soukaina received a Master of Science in Software Engineering and a Bachelor's in Computer Science from Al Akhawayn University (AUI) in Ifrane, Morocco.
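A central operation in shapelet-based time series mining is the distance between a shapelet (a short, discriminative subsequence) and a longer series, commonly defined as the minimum Euclidean distance over all equal-length windows of the series. The Python sketch below computes that distance and appends one such feature per shapelet to a vector of expert-engineered features. It is an illustration under stated assumptions, not the method used in the project: the function names are hypothetical, no z-normalization is applied, and the step that discovers the shapelets is omitted.

```python
import math

def subsequence_distance(shapelet: list[float], series: list[float]) -> float:
    """Minimum Euclidean distance between `shapelet` and any equal-length window of `series`."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    # Slide the shapelet across the series and keep the closest match.
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(shapelet, window)))
        best = min(best, d)
    return best

def shapelet_feature_vector(shapelets: list[list[float]], series: list[float]) -> list[float]:
    """One distance feature per shapelet for a single card's transaction series."""
    return [subsequence_distance(s, series) for s in shapelets]

def combined_features(expert_features: list[float],
                      shapelets: list[list[float]],
                      series: list[float]) -> list[float]:
    """Hypothetical combination step: expert-engineered features followed by shapelet distances."""
    return list(expert_features) + shapelet_feature_vector(shapelets, series)
```

The combined vector could then be passed to a standard classifier; which classifier, if any, the project used for this step is not stated in the abstract.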
- Lili Xu, Clemson University & Gus Reyna, LexisNexis Risk Solutions - Using HPCC Systems ML to Map Thousands of Violation Descriptions to Standard Violation Codes
Incorporating public records data into business processes is challenging because states use disparate descriptions for similar events, and a standard is needed that gives those descriptions one consistent meaning. This session tells the story of how HPCC Systems Machine Learning addressed the problem of mapping thousands of disparate public record descriptions to a corresponding set of standard codes, and discusses the future direction for this approach.
Lili Xu is a Ph.D. candidate in the DICE Lab, directed by Dr. Apon, in the School of Computing at Clemson University. This is her third internship with the HPCC Systems team, working on machine learning applications. Her research areas are machine learning, natural language processing, and high performance computing. She speaks only three languages, but she can program in more than three.
Gus Reyna is a Director with LexisNexis Risk Solutions, where he leads the engineering team for the Motor Vehicle Report (MVR) data products. He has been working at LexisNexis for 9 years, building data solutions on the HPCC Systems platform.