
Research on Spam Detection Methods in Social Networks

Posted on: 2018-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: Z Zhang
GTID: 2417330518981992
Subject: Applied statistics
Abstract/Summary:
Social networks have become one of the main ways people communicate with each other on the Internet. Users spend a great deal of time on popular social networking sites such as Facebook, Twitter, and Sina Weibo, where they also store and share a large amount of private information. However, as sharing, exchange, and interaction have grown, spam has spread along with them. Although the major social networking sites have recognized the importance of spam detection research, the problem has not yet been well resolved, and spam filtering in social networks has therefore become a hot topic among researchers. This thesis first gives a detailed introduction to spam in current social networks and describes the characteristics and harms of social network spamming. It then reviews spam detection methods from several perspectives, including statistical methods, rule-based methods, and pattern recognition methods, and introduces classification algorithms such as naive Bayes, support vector machines, and random forests in detail. Next, drawing on characteristics shared by most social networking sites, it extracts raw features and, by analyzing user profile characteristics, user behavior, and content features, processes them into effective new features that are quantified as input to the classifiers.

Taking Twitter, the most popular social networking site, as the main example, the thesis reviews research on Twitter spam detection in detail and proposes a spam detection mechanism based on user profile information and the content of published tweets. To collect experimental data, we crawled 25,469 accounts and about 500,000 tweets using the official Twitter API. Because of limited experimental resources, 1,000 of these accounts were randomly selected and manually labeled as spam accounts or normal accounts. Based on an analysis of this data set, the thesis compares spam accounts with normal accounts and then applies traditional classifiers (random forest, naive Bayes, support vector machine, and k-nearest neighbors) to classify the accounts. The experimental results show that the random forest classifier performs best.
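The abstract describes turning raw user profile, user behavior, and content attributes into quantified classifier input, but it does not list the exact features. The following is a minimal sketch of that step under stated assumptions: the `Account` fields (`followers`, `friends`, `account_age_days`, `tweets`) and the six features are hypothetical, chosen because follower ratios, posting rate, link and mention frequency, and duplicated content are often used as spam signals on Twitter. They are not the thesis's actual feature set.

```python
import re
from dataclasses import dataclass, field

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


@dataclass
class Account:
    """Hypothetical raw account record (field names are illustrative assumptions)."""
    followers: int         # accounts following this user
    friends: int           # accounts this user follows
    account_age_days: int  # days since the account was created
    tweets: list = field(default_factory=list)  # raw tweet texts


def extract_features(acc: Account) -> list:
    """Quantify user, behavior, and content attributes as a numeric vector."""
    n = len(acc.tweets) or 1  # avoid dividing by zero for empty timelines
    total = acc.followers + acc.friends

    # User profile feature: a low ratio (many followees, few followers)
    # is a commonly used spam signal.
    follower_ratio = acc.followers / total if total else 0.0
    # Behavior feature: posting rate over the account's lifetime.
    tweets_per_day = len(acc.tweets) / max(acc.account_age_days, 1)
    # Content features: share of tweets with links or mentions, hashtag density.
    url_frac = sum(bool(URL_RE.search(t)) for t in acc.tweets) / n
    mention_frac = sum(bool(MENTION_RE.search(t)) for t in acc.tweets) / n
    hashtags_per_tweet = sum(len(HASHTAG_RE.findall(t)) for t in acc.tweets) / n
    # Share of tweets that exactly repeat an earlier tweet.
    dup_ratio = (len(acc.tweets) - len(set(acc.tweets))) / n

    return [follower_ratio, tweets_per_day, url_frac,
            mention_frac, hashtags_per_tweet, dup_ratio]
```

Stacking one such vector per manually labeled account (for example, 1 = spam, 0 = normal) gives the feature matrix and label list that the classifiers take as input.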
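The classifier comparison can be sketched in a similarly hypothetical setting. To keep the example self-contained, only k-nearest neighbors is implemented here, using the standard library. Random forest, naive Bayes, and SVM would usually come from a library (for example scikit-learn's `RandomForestClassifier`, `GaussianNB`, and `SVC`) wrapped in the same `fit_predict(train_X, train_y, test_X)` interface. The abstract does not say how performance was measured, so several choices here are assumptions: out-of-fold predictions, precision/recall/F1 on the spam class, `k=3`, five folds, and the fixed shuffle seed.

```python
import math
import random
from collections import Counter


def min_max_scale(train_X, test_X):
    """Scale features to roughly [0, 1] using ranges learned from the training fold only."""
    cols = list(zip(*train_X))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]

    def scale(row):
        return [(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(row, lo, hi)]

    return [scale(r) for r in train_X], [scale(r) for r in test_X]


def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k training points nearest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(x, p[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def knn_fit_predict(train_X, train_y, test_X, k=3):
    """k-NN in the fit_predict(train_X, train_y, test_X) form shared by all classifiers."""
    train_s, test_s = min_max_scale(train_X, test_X)
    return [knn_predict(train_s, train_y, x, k) for x in test_s]


def cross_val_predict(X, y, fit_predict, n_folds=5, seed=0):
    """Out-of-fold predictions: each account is labeled by a model that never saw it."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    preds = [None] * len(X)
    for f in range(n_folds):
        test = idx[f::n_folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        out = fit_predict([X[i] for i in train], [y[i] for i in train],
                          [X[i] for i in test])
        for i, p in zip(test, out):
            preds[i] = p
    return preds


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the spam class (label `positive`)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Given a labeled feature matrix `X` and labels `y`, the call `precision_recall_f1(y, cross_val_predict(X, y, knn_fit_predict))` yields one row of a comparison table. Repeating it with each classifier's `fit_predict` on the same folds gives a side-by-side comparison of the kind the abstract reports.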
Keywords/Search Tags: social network security, spam detection, machine learning, classification