Detecting Abusive Arabic Language Twitter Accounts Using a Multidimensional Analysis Model

Posted on:2018-09-12

Degree:Ph.D

Type:Dissertation

University:George Mason University

Candidate:Abozinadah, Ehab

Full Text:PDF

GTID:1445390005958250

Subject:Information Technology

Abstract/Summary:

Twitter is one of the most popular social media sources for disseminating news and propaganda in the Middle East. The increased use of social media has motivated spammers to post malicious content on social media sites. Some of these Arabic language spammers use adult content to further the distribution of their malicious activities. The large number of users posting adult content on social media degrades the experience for other users for whom adult content is not desired or appropriate. Twitter prohibits adult content in images, videos, and text, so these accounts are suspended or terminated whenever Twitter's users report them. Moreover, some countries have attempted to detect these accounts but have failed, because the accounts use informal Arabic and misspelled words that cannot be detected using blacklisted keywords.

In this research, I built a model to detect abusive Arabic language Twitter accounts that use obscenity, profanity, or inappropriate words in tweet content. The model is based on a multi-dimensional analysis approach that combines independent lexical analysis, social graph analysis, and statistical analysis. Independent lexical analysis approaches are used to overcome the limitations of Arabic language analysis tools: they correct misspelled words in a tweet, find abusive and non-abusive related words, and find the concept related to a word. Social graph analysis is used to identify user connectivity relationships on Twitter. Statistical analysis is used to identify a user's tweeting characteristics.

My analysis was based on real data collected from Twitter. The data was manually labeled to support a supervised machine learning technique, the Support Vector Machine (SVM). The constructed model contains 31 distinct features that are formed from profile information, social graph centrality measures, tweet element counts, and tweet lexical analysis measures. The model was evaluated against a previously unseen subset of the collected data and achieved 90% average accuracy.
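As a rough illustration of how features from the three analysis dimensions could be combined into one vector per account, the following Python sketch may help. It is not the dissertation's implementation: the function names, the five features, the toy graph, and the English placeholder lexicon are all assumptions made for this example, and the actual model uses 31 features that the abstract does not enumerate. A vector like this would then be passed to a supervised classifier such as an SVM.

```python
import re
from collections import defaultdict


def tweet_element_counts(tweets):
    """Statistical dimension: average hashtags, mentions, and URLs per tweet."""
    n = max(len(tweets), 1)
    hashtags = sum(len(re.findall(r"#\w+", t)) for t in tweets)
    mentions = sum(len(re.findall(r"@\w+", t)) for t in tweets)
    urls = sum(len(re.findall(r"https?://\S+", t)) for t in tweets)
    return [hashtags / n, mentions / n, urls / n]


def degree_centrality(edges, user):
    """Social graph dimension: degree of `user` divided by (nodes - 1)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n <= 1 or user not in neighbors:
        return 0.0
    return len(neighbors[user]) / (n - 1)


def lexicon_ratio(tweets, lexicon):
    """Lexical dimension: fraction of whitespace tokens found in `lexicon`."""
    tokens = [tok for t in tweets for tok in t.lower().split()]
    if not tokens:
        return 0.0
    return sum(tok in lexicon for tok in tokens) / len(tokens)


def build_feature_vector(tweets, edges, user, lexicon):
    """Concatenate the per-dimension measures into one feature vector."""
    return (
        tweet_element_counts(tweets)
        + [degree_centrality(edges, user)]
        + [lexicon_ratio(tweets, lexicon)]
    )


# Hypothetical toy data; "badword" stands in for an abusive-term lexicon entry.
tweets = ["hello #news @bob http://t.co/x", "badword here"]
edges = [("alice", "bob"), ("alice", "carol")]
features = build_feature_vector(tweets, edges, "alice", {"badword"})
```

Exact lexicon matching, as used here, is the kind of blacklist approach the abstract says fails on misspelled and informal words, which is why the dissertation adds spelling correction and related-word analysis on top of it.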

Keywords/Search Tags:

Twitter, Arabic language, Social media, Accounts, Adult content, Abusive, Using, Model

Related items

1	City Branding On Global Social Media: A Multimodal Discourse Analysis Of Tweets From Official Olympic Twitter Accounts @Beijing2022 and @Tokyo2020
2	Multilingual use of Twitter: Language choice and language bridges in a social network
3	Study On Chinese And Vietnamese Abusive Language
4	Methods, Models And Experiments For Crisis Classification In Arabic Language
5	Influence Mechanism Of Abusive Management On Employee Innovation Behavior: A Moderated Chain Mediation Model
6	The effect of dynamic assessment on adult learners of Arabic: A mixed-method study at the Defense Language Institute Foreign Language Center
7	Discovering and examining Arabic young adult literature trade books: A content analysis of cultural authenticity
8	Building Dynamic Ontological Models for Place Names Using Social Media Data from Twitter and Sina Weibo
9	New Chinese Abusive Language In The Mainstream Media
10	A Report On E-C Translation Of Ten Arguments For Deleting Your Social Media Accounts Right Now (Excerpts)