Methods, Models And Experiments For Crisis Classification In Arabic Language

Posted on: 2021-10-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Adel Ghadah Mohammed Abdullah
Full Text: PDF
GTID: 1485306050464274
Subject: Intelligent information processing
Abstract/Summary:
In 2010, a wave of anti-government protests known as the "Arab Spring" took place in Arab countries. It led to civil war and the worst humanitarian crises in the world in Yemen and Syria. Since 2016, 80% of the population in Yemen has been dying from hunger, and 3,886 people have died from cholera. Since 2011, 65% of the Syrian population has become refugees. During these crises, people from both countries turned to social media platforms to convey crisis-related messages, and the number of Twitter users in Yemen and Syria grew explosively. Twitter lets them communicate, interact, and post crisis emergency messages as tweets. These tweets carry a variety of crisis-related information, such as updates on the current status of the crisis, advice, warnings, requests for help, donations, and expressions of emotion or support. Their content has led humanitarian organizations to recognize the value of gathering and analyzing them: it provides information about the crisis and helps improve crisis rescue plans.

However, these organizations face many difficulties in gathering, annotating, and preprocessing crisis tweets, extracting features from them, and classifying their content. These difficulties arise because no corpora, lexicons, or other Arabic-language resources exist for the humanitarian crises in Arab countries. It is also challenging to define the right method for collecting and annotating crisis-related tweets in Arabic, and to detect and preprocess the most common Arabic crisis terms that indicate different crisis topics. Moreover, it is unclear which features and classification techniques should be deployed for crisis tweets in Arabic. As a result, humanitarian organizations were delayed in responding to the famine, cholera, and refugee crises, which caused loss of lives and property.

Our study therefore aims to propose methods, models, and experiments for classifying crisis-related messages in the Arabic language. First, we examine the crises in Arab countries, the platforms used to convey crisis-related messages in Arabic, and the methods used to classify them. Next, we define a mechanism to collect Arabic crisis tweets for classification. Then, we determine annotation criteria that categorize tweets according to the most frequently used Arabic crisis terms. After that, we define the procedures required to preprocess Arabic tweets and produce an Arabic corpus for crisis classification. Afterward, we integrate topic, sentence, and word features to enhance classifier performance. Lastly, we compare our methods and models with different classification models to evaluate their effectiveness.

We followed a framework that consists of six phases. Phase one combines the collection of crisis tweets through the Twitter API with the construction of an LDA model: we generate a list of topics associated with crisis keywords, then use them to query the Twitter API again. This collection mechanism retrieves more crisis-related tweets and extends the tweet dataset. In phase two, we create a list of the top 10 crisis terms together with their similar terms; the list is then filtered, ranked, and combined to guide the annotation process. These annotation criteria increase quality, reduce bias, and provide guidelines for assigning crisis tags according to the dominant crisis terms used in the tweets. In the third phase, we exclude a set of entity names from the Arabic crisis tweets and apply normalization to produce a general crisis corpus and avoid spelling mistakes. We then combine the properties of two stemmers and guide them toward the morphological forms most used in crisis tweets, which reduces infix problems. In the fourth phase, we form sentence features by averaging word vectors with TF-IDF weighting, then merge them with topic and word features for classification. Finally, we implement SVM, Naive Bayes, and Random Forest classifiers and evaluate each with accuracy, sensitivity, and specificity metrics.

To test our framework, we conducted six experiments. In the first, we build an LDA model and evaluate it with coherence and perplexity metrics; the model shows poor ability and low quality in predicting crisis classes. In the second experiment, we use the TF-IDF model, but its class assignments are inaccurate, and the rates for the positive and negative classes are low. In the third, fourth, and fifth experiments, we build CBOW, Skip-gram, and AraVec word embedding models. The Skip-gram model scores slightly higher than CBOW and AraVec, but the classification is inaccurate, and the specificity rate is much higher than the rate for the positive class. In the last experiment, we implement our proposed framework for crisis classification. The results show improved accuracy and closer agreement between the sensitivity and specificity rates for all classifiers, which reflects the effectiveness of our methodologies and models for crisis classification in Arabic.

This study provides humanitarian organizations with the first Arabic Twitter corpus, methods, and models for classifying humanitarian crises in Arab countries, which will help improve Arab countries' crisis plans and shorten emergency response times. It also serves as a baseline for classifying Arabic crisis messages in social media and opens up prospects for future studies in this field. We plan to extend our corpus with more crises and countries so that deep learning techniques can be applied to crisis classification, and to consider colloquial Arabic and real-time processing in further studies.
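
The sentence features of the fourth phase (word vectors averaged with TF-IDF weighting) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names `idf_weights` and `sentence_vector`, the smoothed IDF formula, the English placeholder tokens, and the two-dimensional toy vectors are all assumptions made for illustration. In practice the vectors would come from a trained embedding such as CBOW, Skip-gram, or AraVec.

```python
import math
from collections import Counter


def idf_weights(tokenized_docs):
    """Smoothed inverse document frequency for every term in the corpus.

    Assumption: idf = ln((1 + N) / (1 + df)) + 1, a common smoothed variant.
    """
    n_docs = len(tokenized_docs)
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def sentence_vector(tokens, word_vectors, idf, dim):
    """TF-IDF-weighted average of the word vectors of one tokenized tweet."""
    # Term frequency, counting only tokens that have both a vector and an IDF.
    tf = Counter(t for t in tokens if t in word_vectors and t in idf)
    total = [0.0] * dim
    weight_sum = 0.0
    for term, count in tf.items():
        weight = count * idf[term]
        weight_sum += weight
        for i, value in enumerate(word_vectors[term]):
            total[i] += weight * value
    if weight_sum == 0.0:
        return total  # no known tokens: fall back to the zero vector
    return [value / weight_sum for value in total]


# Toy data (hypothetical): placeholder tokens stand in for Arabic crisis terms.
toy_vectors = {
    "cholera": [1.0, 0.0],
    "help": [0.0, 1.0],
    "water": [1.0, 1.0],
}
toy_tweets = [["cholera", "help"], ["help", "water"], ["help"]]

toy_idf = idf_weights(toy_tweets)
first_tweet_vector = sentence_vector(toy_tweets[0], toy_vectors, toy_idf, dim=2)
```

Because "help" occurs in every toy tweet, its IDF is the lowest, so the rarer term "cholera" dominates the first tweet's vector. Merging with topic and word features, as the abstract describes, could then be as simple as concatenating this vector with the other feature vectors before training a classifier; that concatenation step is likewise an assumption here.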
Keywords/Search Tags:Natural Language Processing, Arabic Text Classification, Crisis Management, Twitter, Disaster and Crisis Informatics