
PROJECTS

Sentiment Analysis - Supervised Learning Classification Models

Classified comments collected from Wikipedia talk pages as toxic or non-toxic:

• Created Word2vec models with Gensim and visualized them with t-SNE (see the embedding sketch after this list)

• Implemented feature engineering with Bag of Words, TF-IDF, and Word2vec

• Implemented traditional machine learning models such as Logistic Regression, Naïve Bayes, Random Forest, and XGBoost, and tuned the hyper-parameters of each model via RandomizedSearchCV/GridSearchCV to obtain the best-performing individual models

• Created a pipeline of the optimized models with SMOTE (Synthetic Minority Over-sampling Technique) to achieve better predictors for the imbalanced dataset (see the pipeline sketch after this list)

• Stacked the optimized models to create a final predictor with the fewest False Negatives; the stacked model improved performance by 15%

• Built a deep neural network model and trained it on Word2vec embeddings to classify comments into the correct classes

• Visually compared the performance of the different models using Recall values and AUC
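
A minimal sketch of the Word2vec and t-SNE step described above, assuming the comments have already been tokenized; the tiny toy corpus and parameter values are illustrative, not the project's actual data or settings:

from gensim.models import Word2Vec
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Assumed input: a list of token lists, one per Wikipedia talk-page comment.
tokenized_comments = [
    ["this", "comment", "is", "perfectly", "fine"],
    ["another", "harmless", "example", "comment"],
    ["a", "third", "short", "comment"],
]

# Train Word2vec embeddings with Gensim (gensim >= 4 API).
w2v = Word2Vec(sentences=tokenized_comments, vector_size=100,
               window=5, min_count=1, workers=4, seed=42)

words = list(w2v.wv.index_to_key)
vectors = w2v.wv[words]

# Project the embeddings to 2-D with t-SNE; perplexity must stay below
# the number of points, so it is tiny for this toy vocabulary.
coords = TSNE(n_components=2, perplexity=3, random_state=42).fit_transform(vectors)

plt.scatter(coords[:, 0], coords[:, 1])
for word, (x, y) in zip(words, coords):
    plt.annotate(word, (x, y))
plt.title("t-SNE projection of Word2vec embeddings")
plt.show()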
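
And a hedged sketch of the SMOTE pipeline and hyper-parameter search mentioned above, using imbalanced-learn's pipeline so oversampling happens only inside each cross-validation fold; the synthetic data, the Logistic Regression stand-in, and the parameter grid are assumptions for illustration:

from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.datasets import make_classification

# Synthetic imbalanced data standing in for the TF-IDF comment features.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

pipe = Pipeline([
    ("smote", SMOTE(random_state=42)),           # oversample only inside each CV fold
    ("clf", LogisticRegression(max_iter=1000)),  # any of the tuned models could go here
])

search = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1, 10]}, scoring="recall", cv=5)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print("held-out recall:", search.score(X_test, y_test))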

Detecting Fraudulent Mobile Money Transactions

Analyzed six million mobile money transactions recorded over one month to evaluate the performance of fraud detection methods:

• Managed a highly imbalanced dataset for building machine learning models by upsampling the minority class

• Optimized Logistic Regression, Naïve Bayes, Random Forest, and XGBoost via RandomizedSearchCV/GridSearchCV

• Created a pipeline of the optimized models with SMOTE (Synthetic Minority Over-sampling Technique) to achieve better predictors for the imbalanced dataset

• Stacked the top three optimized models, assigning the highest weight to the best model, to create the final predictor and decrease the False Negative rate; the stacked model raised Recall to 92% (see the weighted-ensemble sketch after this list)

• Visually compared the performance of the different models using Recall values and AUC
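
A hedged sketch of the weighting idea above, shown here as a weighted soft-voting ensemble standing in for the stacking step; the chosen models, the weights, and the synthetic data are illustrative assumptions, not the project's actual configuration:

from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

# Synthetic, heavily imbalanced data standing in for the transaction features.
X, y = make_classification(n_samples=5000, weights=[0.97, 0.03], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
    ],
    voting="soft",
    weights=[3, 1, 1],  # highest weight on the strongest individual model
)
ensemble.fit(X_train, y_train)
print("fraud-class recall:", recall_score(y_test, ensemble.predict(X_test)))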

Genetic Variant Classifications

Predicted conflicting classifications for ClinVar variants:

• Explored the features of the ClinVar genetic variant dataset with Python (Matplotlib, Seaborn)

• Standardized numerical features with StandardScaler to improve the performance of machine learning algorithms such as Logistic Regression and SVM (see the scaling sketch after this list)

• Mitigated overfitting and tuned hyper-parameters for Stochastic Gradient Descent and Random Forest models via GridSearchCV

• Created a pipeline of the optimized Logistic Regression, Stochastic Gradient Descent, and AdaBoostClassifier models with SMOTE (Synthetic Minority Over-sampling Technique) to achieve better predictors for the imbalanced dataset

• Stacked the top three optimized models, assigning the highest weight to the best model, to create the final predictor and decrease the False Negative rate; the final model improved the Recall value by 36%
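
A minimal sketch of the scaling and tuning steps above: StandardScaler feeding a Stochastic Gradient Descent classifier, with GridSearchCV over the regularization strength to help curb overfitting; the synthetic features and grid values are assumptions, not the ClinVar project's:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

# Synthetic numeric features standing in for the ClinVar variant features.
X, y = make_classification(n_samples=3000, n_features=20, random_state=1)

pipe = Pipeline([
    ("scale", StandardScaler()),             # standardize numerical features
    ("sgd", SGDClassifier(random_state=1)),  # Stochastic Gradient Descent classifier
])

# A larger alpha means stronger regularization, one way to mitigate overfitting.
grid = GridSearchCV(pipe, {"sgd__alpha": [1e-4, 1e-3, 1e-2, 1e-1]},
                    scoring="recall", cv=5)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))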

Fatal Police Shooting

Analyzed fatal shootings by on-duty officers across the country between 2013 and 2018:

• Trained Random Forest and Naïve Bayes classifiers to build a predictive model for the likelihood that the deceased was armed (see the classifier sketch after this list)

• Trained Logistic Regression and Random Forest classifier models to predict the ethnicity of the deceased

• Built explanatory Linear Regression and Random Forest Regressor models to predict the Median Income and Poverty Rate of cities based on their rate of shootings
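
A hedged sketch of the "likelihood of being armed" classifier: a Random Forest trained on a few placeholder features. The column names and the tiny synthetic table are assumptions for illustration, not the original dataset's schema:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Tiny placeholder table standing in for the 2013-2018 shootings data.
df = pd.DataFrame({
    "age":       [23, 35, 41, 29, 54, 31, 62, 19],
    "gender":    [1, 1, 0, 1, 1, 0, 1, 1],          # encoded categorical feature
    "city_rate": [0.8, 0.3, 0.5, 0.9, 0.2, 0.4, 0.6, 0.7],
    "armed":     [1, 0, 1, 1, 0, 0, 1, 0],          # target: was the deceased armed
})

X = df.drop(columns="armed")
y = df["armed"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
# Estimated probability (likelihood) that each held-out case was armed.
print(model.predict_proba(X_test)[:, 1])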