
Research And Implementation Of Microblogging Bot Detection Technology

Posted on: 2015-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: M Qu
Full Text: PDF
GTID: 2348330509460760
Subject: Software engineering
Abstract/Summary:
Driven by the rapid development and popularization of the Internet in recent years, microblogging, with its openness and convenience, has become an important form of social media closely tied to people's daily lives. However, while people follow news stories and share knowledge on microblogging networks, the number of microblogging bots is growing as well. These bots disrupt the normal order of microblogging platforms and have triggered a crisis of confidence. Microblogging bot detection has therefore become a hotspot of current research.

Research on microblogging bot detection also faces considerable challenges. On the one hand, microblogging platforms, with their distinctive patterns of information dissemination, relatively noisy data and small-sample characteristics, call for analysis methods that differ greatly from those used for traditional social networks. On the other hand, microblogging bots have become increasingly covert through continuous evolution, and their behavior more and more closely resembles that of real microblogging users.

Data classification based on machine learning is now widely used in social network analysis because it makes predictions at low cost and with high accuracy. As traditional bot detection methods become less and less effective on emerging microblogging platforms, bot detection based on machine learning is the current trend in academic research. Building on related work, this thesis presents a microblogging bot detection strategy based on incremental machine learning, and designs and implements a microblogging bot detection system for both Twitter and Sina Weibo. The main contents and contributions of this thesis are as follows.

1) Starting from the characteristics of microblogging social networks, this thesis analyzes information dissemination on microblogging platforms. It studies traditional bot detection methods as well as microblogging bot detection methods based on machine learning, including Naive Bayes, Logistic Regression and Support Vector Machine.
2) Based on the characteristics of microblogging users, the features of microblogging bots are examined from both the content perspective and the user behavior perspective. Formal definitions for computing these features are presented for use in machine learning classification of microblogging users. In addition, some specific user features are analyzed and evaluated in experiments on a real dataset.
3) To extract a similarity feature from a user's statuses, Chinese short text clustering algorithms are studied. A clustering algorithm for abnormal microblogging short texts is then proposed to handle the variety of bot statuses. It discusses microblogging text normalization methods and generates a similarity feature from the clustered content for the subsequent machine learning classification.
4) From the perspective of practical application of the system, incremental machine learning algorithms are studied in order to better adapt the model to change. This work proposes an improved incremental support vector machine algorithm and a practical microblogging bot detection framework based on incremental machine learning. A microblogging bot detection system for both Twitter and Sina Weibo is also designed and implemented.
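
The similarity feature described in contribution 3 can be illustrated with a minimal sketch. The abstract does not specify the thesis's normalization rules or clustering algorithm, so everything below is an assumption made for illustration: the normalization steps, the character-bigram Jaccard similarity, the greedy single-pass clustering, the 0.5 threshold, and all function names are hypothetical stand-ins rather than the author's method.

```python
import re
import unicodedata

# Hypothetical patterns for noise that bots often vary between near-duplicate posts.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def normalize(status: str) -> str:
    """Assumed normalization: fold character width and case, drop URLs,
    @mentions, punctuation and whitespace."""
    text = unicodedata.normalize("NFKC", status).lower()
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    # Keep letters and digits only; str.isalnum() is also true for CJK characters.
    return "".join(ch for ch in text if ch.isalnum())


def bigrams(text: str) -> set:
    """Character bigrams, which need no word segmentation for Chinese text."""
    grams = {text[i:i + 2] for i in range(len(text) - 1)}
    return grams or {text}  # fall back to the whole string for very short texts


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(statuses, threshold=0.5):
    """Greedy single-pass clustering: each status joins the first cluster whose
    representative (its first member) is at least `threshold` similar."""
    clusters = []  # list of (representative bigram set, member indices)
    for index, status in enumerate(statuses):
        grams = bigrams(normalize(status))
        for representative, members in clusters:
            if jaccard(grams, representative) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((grams, [index]))
    return [members for _, members in clusters]


def similarity_feature(statuses, threshold=0.5) -> float:
    """Share of a user's statuses that fall into the largest cluster."""
    if not statuses:
        return 0.0
    largest = max(len(members) for members in cluster(statuses, threshold))
    return largest / len(statuses)
```

A user whose timeline consists mostly of lightly varied copies of one message scores close to 1.0, while a user with varied statuses scores close to 1 divided by the number of statuses. The resulting value could then be passed to a classifier alongside the content and behavior features mentioned in contribution 2.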
Keywords/Search Tags: Microblogging, Bot Detection, Incremental Machine Learning, Short Text Clustering, Support Vector Machine