
Detecting Abusive Arabic Language Twitter Accounts Using a Multidimensional Analysis Model

Posted on: 2018-09-12
Degree: Ph.D.
Type: Dissertation
University: George Mason University
Candidate: Abozinadah, Ehab
GTID: 1445390005958250
Subject: Information Technology
Abstract/Summary:
Twitter is one of the most popular social media platforms for disseminating news and propaganda in the Middle East. The growing use of social media has motivated spammers to post malicious content on social media sites, and some Arabic-language spammers use adult content to spread their malicious activities further. The large number of users posting adult content also degrades the experience of other users, for whom such content is unwanted or inappropriate. Because Twitter prohibits adult content in images, videos, and text, these accounts are suspended or terminated whenever Twitter users report them. Some countries have also attempted to detect these accounts, but have failed because the accounts use informal Arabic and misspelled words that blacklisted keywords cannot catch.

In this research, I built a model to detect abusive Arabic-language Twitter accounts that use obscenity, profanity, or inappropriate words in their tweets. The model follows a multidimensional analysis approach that combines independent lexical analysis, social graph analysis, and statistical analysis. Independent lexical analysis overcomes the limitations of existing Arabic language analysis tools: it corrects misspelled words in tweets, finds words related to abusive and non-abusive content, and finds the concept related to each word. Social graph analysis identifies users' connectivity relationships on Twitter. Statistical analysis identifies each user's tweeting characteristics.

My analysis was based on real data collected from Twitter. The data was manually labeled to support a supervised machine learning technique, the Support Vector Machine (SVM). The constructed model uses 31 distinct features derived from profile information, social graph centrality measures, counts of tweet elements, and lexical analysis measures of the tweets. The model was evaluated on a previously unseen subset of the collected data and achieved an average accuracy of 90%.
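The abstract does not spell out its lexical-analysis algorithms. As a minimal sketch only, the snippet below shows one common way to tolerate informal spelling before matching words against a lexicon: normalize Arabic letter variants, strip diacritics and elongation, then accept the nearest lexicon entry within a small Levenshtein edit distance. The normalization table, `normalize`, `edit_distance`, `closest_lexicon_entry`, and the `max_distance=1` threshold are hypothetical choices for illustration, not the author's method.

```python
import re

# Hypothetical normalization table (a common Arabic preprocessing choice,
# not necessarily the rules used in the dissertation).
_LETTER_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda above -> bare alef
    "\u0649": "\u064a",  # alef maksura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
})
_DIACRITICS = re.compile("[\u064b-\u0652\u0640]")  # diacritics U+064B..U+0652 and tatweel U+0640
_ELONGATION = re.compile(r"(.)\1{2,}")  # any character repeated 3+ times in a row


def normalize(word):
    """Map spelling variants of a word to one canonical form."""
    word = _DIACRITICS.sub("", word)
    word = word.translate(_LETTER_MAP)
    return _ELONGATION.sub(r"\1", word)  # collapse elongated runs to a single character


def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions), one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def closest_lexicon_entry(word, lexicon, max_distance=1):
    """Return (entry, distance) for the nearest lexicon entry within max_distance, else None."""
    target = normalize(word)
    best = None
    for entry in lexicon:
        d = edit_distance(target, normalize(entry))
        if d <= max_distance and (best is None or d < best[1]):
            best = (entry, d)
    return best
```

Normalizing both the tweet word and the lexicon entry before measuring distance lets common variant spellings collapse to the same key, so the edit-distance budget is spent only on genuine misspellings.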
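The abstract lists social graph centrality measures and counts of tweet elements among the 31 features but does not name them. Assuming normalized degree centrality and per-tweet averages as stand-ins (the actual feature set may differ), the sketch below computes such values from a follow-edge list and raw tweet texts. `degree_centrality`, `tweet_stats`, and the regular expressions are hypothetical; recognizing retweets by an `RT @` prefix is a simplifying assumption.

```python
import re
from collections import defaultdict

# Hypothetical patterns for tweet elements; Python's \w is Unicode-aware,
# so Arabic hashtags and user names are matched too.
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")


def degree_centrality(edges):
    """Normalized in-/out-degree for a directed follow graph.

    edges: iterable of (follower, followee) pairs; duplicate pairs are ignored.
    Returns {user: (in_centrality, out_centrality)}, each degree divided by n - 1.
    """
    indeg, outdeg, nodes = defaultdict(int), defaultdict(int), set()
    for src, dst in set(edges):
        outdeg[src] += 1
        indeg[dst] += 1
        nodes.update((src, dst))
    scale = 1.0 / (len(nodes) - 1) if len(nodes) > 1 else 0.0
    return {u: (indeg[u] * scale, outdeg[u] * scale) for u in nodes}


def tweet_stats(tweets):
    """Per-tweet averages of hashtags, mentions, and URLs, plus the retweet ratio."""
    n = len(tweets)
    if n == 0:
        return (0.0, 0.0, 0.0, 0.0)
    hashtags = sum(len(HASHTAG.findall(t)) for t in tweets)
    mentions = sum(len(MENTION.findall(t)) for t in tweets)
    urls = sum(len(URL.findall(t)) for t in tweets)
    # Assumption: retweets are recognized by the "RT @" text prefix.
    retweets = sum(t.startswith("RT @") for t in tweets)
    return (hashtags / n, mentions / n, urls / n, retweets / n)
```

A per-account feature vector could then be assembled as `list(centrality[user]) + list(tweet_stats(user_tweets))`, alongside profile and lexical features.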
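To illustrate the supervised step, here is a minimal linear SVM trained with the Pegasos stochastic sub-gradient method, followed by accuracy on a held-out split. This is a stand-in under stated assumptions: the abstract names SVM but not the solver, kernel, or hyperparameters, so `train_linear_svm`, `lam`, `epochs`, and the toy two-feature data are hypothetical. A real pipeline would use the 31-feature vectors and a tested library implementation.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    X: list of feature lists; y: labels in {-1, +1}.
    A constant 1.0 feature is appended as the bias (regularized with the weights,
    a simplification).
    """
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    idx = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink: regularization step
            if margin < 1:  # hinge loss active: step toward the example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def accuracy(w, X, y):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(predict(w, x) == yi for x, yi in zip(X, y)) / len(y)


if __name__ == "__main__":
    X_train = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
    y_train = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(X_train, y_train)
    print(accuracy(w, [[4, 4], [-4, -4]], [1, -1]))  # prints 1.0
```

Evaluating only on examples withheld from training mirrors the abstract's use of a previously unseen subset; the toy data here is linearly separable, so it says nothing about real-world accuracy.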
Keywords/Search Tags: Twitter, Arabic language, Social media, Accounts, Adult content, Abusive, Model