Research On The Recognition Of High-Forwarding Microblog Rumor Based On Deep Learning

Posted on: 2020-07-19 | Degree: Master | Type: Thesis
Country: China | Candidate: Y X Cai | Full Text: PDF
GTID: 2416330596981768 | Subject: Master of Applied Statistics
Abstract/Summary:
Sina Weibo is China's largest microblogging platform, and the spread of rumors on Weibo has a real social impact. Under relevant laws and regulations, a Weibo rumor that is forwarded more than 500 times can be treated as libel on the part of the user, yet users often forward posts without knowing that the content is a high-forward rumor. This paper mines the patterns of high-forward rumors from microblog text features and user features, so that users can be warned in advance and a healthy network environment maintained. It divides Weibo rumors into high-forward and low-forward rumors according to whether the forwarding count exceeds 500, and uses deep learning and machine learning methods to classify them.

The overall process is as follows: train word-embedding vectors; build TextCNN and LSTM models to identify high-forward rumors; then introduce user features and build the Concat-TextCNN, Concat-LSTM, TextCNN-GBDT, LSTM-GBDT, Wide&TextCNN and Wide&LSTM models to improve the recognition rate. In detail: first, we train the word vectors and train TextCNN and LSTM on text features only, as the baselines of this paper. Next, we propose the first method of adding user features to improve the recognition rate: the Concat-TextCNN and Concat-LSTM models, which are based on feature concatenation. Building on this, we use the middle-layer output of Concat-TextCNN and Concat-LSTM as new features to build the TextCNN-GBDT and LSTM-GBDT classifiers. Finally, we propose the second method of adding user features: following the idea of the Wide&Deep model, we build the Wide&TextCNN and Wide&LSTM models. The cross-feature [...]

We run our experiments on the public Sina Weibo rumor data set. We analyze word frequency and the main types of high-forward rumors, and compare length statistics between high-forward and low-forward rumors. The conclusions are: Sina Weibo users are most concerned about national events; the main type of high-forward rumor is the social life rumor; and the forwarding count of a rumor is essentially independent of its length. For word embeddings, experiments show that Word2vec vectors that continue to be trained together with the task are the most suitable word-vector mode for this task. Five user features, obtained through the Sina Weibo API, were selected to improve the classification models: the number of fans, the number of followers, the number of mutual follows, the Sina Weibo credit score, and the number of Weibo statuses.

TextCNN and LSTM achieve F1 scores of 79% and 89%, respectively; Concat-TextCNN and Concat-LSTM achieve 91% and 90%; TextCNN-GBDT and LSTM-GBDT reach 94.7% and 94.2%; and Wide&TextCNN and Wide&LSTM achieve 89% and 92%.

The contribution and innovation of this paper lie in using deep learning techniques such as TextCNN, LSTM, and Wide&Deep models, together with added user features, to identify high-forward Sina Weibo rumors with good results, and in proposing guidance on microblog user behavior. Based on these conclusions, Weibo users should be extra cautious when forwarding posts about national events and social life events, and posts from celebrities and other VIP accounts, to avoid the legal consequences of subsequent large-scale forwarding.
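
The labeling rule, the feature-concatenation idea, and the F1 metric mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the names `UserFeatures`, `label_rumor`, `concat_features`, and `f1_score` are hypothetical, the text vector stands in for whatever a TextCNN or LSTM would output, and only the 500-forward threshold and the five user features come from the abstract.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Threshold taken from the abstract: more than 500 forwards = high-forward rumor.
FORWARD_THRESHOLD = 500


@dataclass
class UserFeatures:
    """The five user features named in the abstract (field names are assumptions)."""
    fans: int
    followers: int
    mutual_follows: int
    credit_score: float
    statuses: int

    def as_vector(self) -> List[float]:
        return [
            float(self.fans),
            float(self.followers),
            float(self.mutual_follows),
            float(self.credit_score),
            float(self.statuses),
        ]


def label_rumor(forward_count: int) -> int:
    """Return 1 for a high-forward rumor (count exceeds 500), else 0."""
    return 1 if forward_count > FORWARD_THRESHOLD else 0


def concat_features(text_vector: Sequence[float], user: UserFeatures) -> List[float]:
    """Append the user features to a text representation (feature concatenation)."""
    return list(text_vector) + user.as_vector()


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    user = UserFeatures(fans=1200, followers=300, mutual_follows=150,
                        credit_score=80.0, statuses=4521)
    # A made-up 4-dimensional text vector; a real model would produce a longer one.
    combined = concat_features([0.1, -0.3, 0.7, 0.2], user)
    print(len(combined), label_rumor(501))
```

In a real model the concatenated vector would feed a dense classification layer (or, for the GBDT variants, a gradient-boosted tree classifier), and the numeric user features would normally be scaled first; both steps are omitted here.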
Keywords/Search Tags:Sina Weibo, Rumor Forward, Text Classification, Deep Learning, Multi-source Feature