2+ years of experience developing creative and critical solutions that use large datasets. Has 5+ years of international and remote collaborative experience. Possesses a Bachelor's in Computer Science from Asia Pacific University, Malaysia.
Currently garnering diverse experience in Data Science and Machine Learning through independent projects.
Find the code at this GitHub repository
Support Vector Machine (SVM), Deep Learning, Random Forest Classifier, Data Cleaning, Feature Engineering, scikit-learn, TensorFlow, Keras
This project shows Shahiryar’s commitment, creativity, and originality in the ideation and implementation of a project that addresses a pressing real-world problem.
The objective of this project is to develop a predictive model to determine the likelihood of conflicts, civil unrest, or political violence in a country. By analyzing historical data and various socio-economic, political, and demographic factors, the model aims to provide early warning indicators and insights that can help policymakers and organizations proactively address potential conflicts.
The costs associated with conflicts are substantial. Therefore, it is important to look at conflicts in creative ways, not just to understand them but to model them, address their root causes, and control them before they become violent. My country has been under constant pressure from political conflict and violence. This model is an attempt to model conflict generally and uncover the underlying socio-economic factors.
The Random Forest Classifier gave the best results on three evaluation metrics: accuracy, recall, and F1 score. The confusion matrix below shows its predictions on the test dataset:
Three types of machine learning algorithms were tried, namely:
sklearn.svm.SVC
sklearn.ensemble.RandomForestClassifier
tensorflow.keras.Sequential
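As a rough illustration of how such a comparison can be set up, the sketch below fits the two scikit-learn candidates on the same split and collects the same three metrics for each. The data is a synthetic, imbalanced stand-in generated with `make_classification`, and the default hyperparameters are assumptions; neither reflects the project's actual dataset or settings. A Keras `Sequential` model could be scored in the same loop once its outputs are thresholded to class labels.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the real conflict dataset (assumption).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.85, 0.15], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Default hyperparameters; the project's actual settings are not stated.
candidates = {
    "SVC": SVC(),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
}

results = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```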
SVM gave an accuracy close to 85%; however, the recall score was dismally low, at around 50%.
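To see how high accuracy and low recall can coexist on an imbalanced dataset, consider the toy example below. The label counts are hypothetical and chosen only to land near the figures quoted above; they are not the project's actual predictions.

```python
# Hypothetical ground truth: 20 positive (conflict) and 80 negative samples.
y_true = [1] * 20 + [0] * 80
# Hypothetical predictions: 10 positives found, 10 missed, 5 false alarms.
y_pred = [1] * 10 + [0] * 10 + [1] * 5 + [0] * 75

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
```

Because negatives dominate, the 75 correctly rejected negatives carry the accuracy, while half of the positives are still missed.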
Meanwhile, the Deep Learning Sequential model could not produce a good recall score, as its confusion matrix showed a low number of True Positives (TP).
The Sequential model was, however, good at classifying Negatives (a high number of True Negatives (TN)). This could be because of the class distribution of the dataset, which has more datapoints in the Negative class. Improving the distribution could alleviate this issue.
Finally, the model with the best accuracy and recall is the Random Forest model. The evaluation metrics are shown below.
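One common way to compensate for a skewed class distribution, without collecting more data, is to weight each class inversely to its frequency. The sketch below computes such weights with the "balanced" heuristic, `n_samples / (n_classes * class_count)`. The label counts are hypothetical, and class weighting is offered here as one possible remedy, not as something the project is known to have used.

```python
from collections import Counter

# Hypothetical label vector: 85 negative and 15 positive samples.
labels = [0] * 85 + [1] * 15

counts = Counter(labels)
n_samples = len(labels)
n_classes = len(counts)

# "Balanced" heuristic: rarer classes receive proportionally larger weights.
class_weight = {
    cls: n_samples / (n_classes * count) for cls, count in counts.items()
}

for cls in sorted(class_weight):
    print(f"class {cls}: weight {class_weight[cls]:.3f}")
```

A dictionary like this can be passed as `class_weight` to `SVC` or `RandomForestClassifier`, or to the `class_weight` argument of a Keras model's `fit` method; the scikit-learn estimators also accept `class_weight="balanced"` directly.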
from sklearn.metrics import accuracy_score, recall_score, f1_score
print("For Base Random Forest Classification model")
print("Accuracy score: ",accuracy_score(y_test, y_pred))
print("Recall : ",recall_score(y_test, y_pred))
print("f1 score : ", f1_score(y_test, y_pred))
For Base Random Forest Classification model
Accuracy score: 0.9579776756401839
Recall : 0.7954545454545454
f1 score : 0.8677685950413222
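Precision is not printed above, but it follows from the reported recall and F1 score, since F1 is the harmonic mean of the two: F1 = 2PR / (P + R), so P = F1 * R / (2R - F1). The short sketch below applies that rearrangement to the two reported values.

```python
# Values copied from the printed output above.
recall = 0.7954545454545454
f1 = 0.8677685950413222

# Rearranging F1 = 2PR / (P + R) for P gives P = F1 * R / (2R - F1).
precision = f1 * recall / (2 * recall - f1)

print(f"Implied precision: {precision:.4f}")
```

The implied precision is about 0.95, so the model's errors are mostly missed positives and not false alarms.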
Features required are as follows:
Socio-Economic Indicators
- Year
- GDP
- Gini coefficient
- Literacy Rate
- health indicators
- infrastructure development
- employment level
- percent of labor force by total population
- dependency ratio
- income level
- per capita income
- unemployment rate
Political Indicators
- Worldwide Governance Indicators
- Democracy Index
- Political Instability Task Force
Demographic
- Ethnic/Religion make up
- population density
- urbanisation rate
- size of middle class
- migration patterns
- age structure (middle quantile)
Environmental
- per capita water availability
- deforestation rate
- Climate change impact
- natural disaster
Conflict Related
- Type of Conflict/unrest
- Casualties
- number of conflicts (in the year under review)