Abrar Fahim

Data Scientist | Former KTP Associate


Highlights

- Data Scientist with experience in the healthcare and FinTech industries.
- Endorsed by UK Research and Innovation for significant research contributions and granted the prestigious Global Talent Visa.
- Specialized in applying Artificial Intelligence to emotion identification. Developed FinLex, a financial keyword dictionary for emotion identification that outperforms generic emotion dictionaries.
- Working with Large Language Models, leveraging their capabilities to enhance machines’ natural language understanding.
- Founded Project Verbix, which demonstrates applications of LLMs to a range of problem-solving tasks.
Education
MSc in Data Science and Analytics
BSc in Computer Science and Engineering
Email
abrar-personal@outlook.com
Location
United Kingdom

Professional Skills

Data Science
Natural Language Processing
Large Language Model
Python
TensorFlow
Statistics

Work Experience

Founder, Project Verbix
Sept, 2023 - Present
- Fine-tuned the flan-t5 Large Language Model (LLM) for legal document summarization and question answering.
- Created a movie search engine based on semantic search, enabling users to receive precise movie suggestions based on narrative input.
- Developed a “Talk to Your PDF” tool using vector databases and an LLM to enable natural language interaction with PDF documents.
- Used Google’s gemini-pro LLM to develop a CV analyzer that supports skills match analysis, interview question generation, and personalized cover letter writing based on CV and job posting input.
Jan, 2023 - Present
- Leveraged topic modeling to extract meaningful themes from healthcare papers.
- Built custom metrics and visualization tools so that the efficiency of different techniques can be measured and visualized on the same scale and in the same space.
- Created a fact induction tool that searches documents for fact-like sentences, useful for monitoring fact updates in the healthcare industry.
- Developed a tool to identify writing patterns in academic papers, aiming to enhance the understanding of the healthcare industry.
Dec, 2021 - Jan, 2023
Emotional AI for Trading on Financial Market
Knowledge Transfer Partnership between Brunel University London and Advanced Logic Analytics
- Led the development of a stock market behavior observation tool by analyzing seven million financial news articles.
- Significantly improved correlations between stock prices and emotions through the development of an observation tool.
- Used a GNN to find previously unidentified market behavior in financial data.
- Proved mathematically that different companies have different levels of tolerance against negative news and established a threshold of tolerance.
June, 2021 - Sept, 2021
- Built an image captioning model that generates captions for people with disabilities.
- Used transfer learning to extract features from the images and pre-trained embeddings to prepare the text data.
- Some captions generated by the best model are shown below:

Research and Projects

Artificial Neural Network and Machine Learning Based Methods for Population Estimation of Rohingya Refugees Comparing Data-Driven and Satellite Image-Driven Approaches
2019
- Used data provided by NGOs and satellite data from Google Earth Engine.
- Compared machine learning and neural network based methods for predicting population.
- Published the work as an article in a renowned journal.

Find the full article here
Airplane Demand-Drop During Covid, using crowdsourced data
2021
- Applied MapReduce, Python Dask, and Pig Latin for batch processing (1.6M instances).
- Found the relationship between passenger movement and the spread of Covid.
- Visualized the final output with Tableau and Power BI.

Find the full project here
Face Landmark Detection, using a pretrained deep learning model
2021

Detected 468 landmark points on a person’s face using a pretrained MediaPipe model.

Find the full project here
Automatic Number Plate Recognition System for Bengali-Style Number Plates, industry-academia collaboration project
2019

- Extracted the number plate from the car image using the YOLO algorithm.
- Segmented each character and fed it into a CNN for classification.
- Developed a webpage using Django.
- The model recognizes number plates with ~80 percent accuracy.
Visualizing A-level School Data, using UK government data
2021

- Collected data from the UK government’s website.
- Applied DBSCAN clustering to group similar schools.
- Analyzed the diversity of schools in terms of language and location.
- Identified the most popular subject choices among male and female students.
- Visualized the data with Tableau and Power BI.

Find the full project here
Predicting Rating from Food Recipe data, using big data
2021
- Analyzed sentiment in customer reviews to understand customer preferences.
- Applied PCA for feature extraction and several regression algorithms for prediction.
- Achieved an MSE of 0.27.

Find the full project here
Master Dataset for Understanding the Pandemic, using heterogeneous big data
2020
- Collected data from different sources (e.g. Microsoft Bing Covid data, airplane movement data, World Bank data).
- Cleaned and merged the data on a primary key (date).
- Created a master dataset for further analysis.

Find the full project here
Web Scraping of Restaurant Data, freelance project
2019
- Scraped restaurant names, phone numbers, and opening/closing hours from several websites.
- Used proxy servers and randomized timing for each scrape to avoid being blocked by the websites.
- Developed a webpage with a “search by region” feature using Flask.
Spam Classification Using Machine Learning and Deep Learning, using deep learning
2021
Email services have several spam filters, but a cellphone can be hacked through a single short message. This project identifies spam in short cellphone messages. It uses both Machine Learning and Deep Learning based architectures to build the model and compares them.
- Cleaned and tokenized the data.
- Used TF-IDF vectorization and word embedding techniques.
- Used both Machine Learning and Deep Learning based methods for classification.
- The highest accuracy obtained was 98%.

Find the full project here
Bar Race for Coronavirus Propagation, using the Flourish app
2020
- Cleaned and converted the data so that it could be fed directly into the Flourish app.

Find the full project here
Java Game, undergrad course project
2015

A game built for an undergraduate Java course, in which a running player must avoid eating junk food found along the running path. The more junk food the player avoids, the higher the score.

Find the full project here


Contact

Email: abrar-personal@outlook.com
Connect on LinkedIn