Research On Online Public Opinion Based On Topic Modeling And Sentiment Analysis

Posted on: 2024-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Zhang
GTID: 2557307052493564
Subject: Applied statistics
Abstract/Summary:
With the rapid development of the Internet, social networking platforms have proliferated. Users express their views on the day's trending events there, and online public opinion is generated as a result. Sina Weibo, as a platform for instant interaction, provides a relaxed environment for the formation and spread of public opinion and has become an important venue for guiding it. NIO's eagerness to distance itself from this unfortunate incident caused discontent among netizens, and the controversy over the incident began.

This paper crawls comments from Sina Weibo and studies them from two angles: LDA topic modeling and text sentiment analysis. First, the text data are preprocessed through regular-expression cleaning, manual labeling, word segmentation, and stop-word removal, with a custom segmentation dictionary and a custom stop-word list added for this data set.

For topic modeling, a Latent Dirichlet Allocation (LDA) model is fitted. The number of topics is determined by calculating perplexity, the keywords of each topic and the weight of each keyword are output, and pyLDAvis is used to render the LDA model as an interactive web-based visualization.

Sentiment analysis follows two approaches, one based on a sentiment lexicon and one based on machine learning. In the lexicon approach, TF-IDF is used to find document keywords, from which positive and negative sentiment seed words are selected. A domain sentiment lexicon is then constructed with the SO-PMI (semantic orientation pointwise mutual information) algorithm and combined with a basic sentiment lexicon, a degree-adverb lexicon, and a list of negation words. Each comment receives a sentiment score according to the scoring rules, and the accuracy of the lexicon's sentiment judgments is tested against the labeled comments. In the machine learning approach, the text is converted into bag-of-words vectors, the TF-IDF weight of each word is calculated to form a feature matrix, and these vectors are used as features to train classifiers with logistic regression, Naive Bayes, and support vector machine (SVM), three algorithms commonly used in text classification. Grid search with cross-validation is used to find the optimal hyperparameters.

The results show that the LDA topic model divides the comments on this incident into four major themes: "NIO's official response highlights the cold-bloodedness of capital", "Worried about the decline in sales and shirking responsibility", "Lamenting the death of the test driver and hoping for the truth about the accident", and "Questioning the safety of the test site". The domain sentiment lexicon method judges sentiment with an accuracy of 96.47% and a recall of 77.21%. In the machine learning classification, the model built with the Naive Bayes algorithm scores above 0.8 on every evaluation metric, and its prediction of sentiment tendency improves further on the lexicon-based analysis.
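
The SO-PMI step mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `so_pmi`, the toy English tokens, the comment-level co-occurrence window, and the choice to let never-co-occurring pairs contribute zero are all assumptions made for illustration. A word's score is the sum of its pointwise mutual information with the positive seed words minus the sum with the negative seed words, so a positive score suggests a positive domain word.

```python
import math
from collections import Counter
from itertools import combinations


def so_pmi(docs, pos_seeds, neg_seeds):
    """Return {word: SO-PMI score} for every non-seed word in docs.

    docs is a list of token lists (assumed already segmented and
    stripped of stop words). Co-occurrence is counted per comment.
    """
    n_docs = len(docs)
    doc_freq = Counter()   # number of comments containing each word
    pair_freq = Counter()  # number of comments containing both words
    for tokens in docs:
        vocab = sorted(set(tokens))
        doc_freq.update(vocab)
        pair_freq.update(combinations(vocab, 2))

    def pmi(a, b):
        key = (a, b) if a < b else (b, a)
        joint = pair_freq[key]
        if joint == 0:
            return 0.0  # assumption: pairs that never co-occur add nothing
        # log2( P(a, b) / (P(a) * P(b)) ) with document-level estimates
        return math.log2(joint * n_docs / (doc_freq[a] * doc_freq[b]))

    seeds = set(pos_seeds) | set(neg_seeds)
    scores = {}
    for word in doc_freq:
        if word in seeds:
            continue
        pos = sum(pmi(word, s) for s in pos_seeds)
        neg = sum(pmi(word, s) for s in neg_seeds)
        scores[word] = pos - neg
    return scores


# Hypothetical toy corpus; real input would be segmented Weibo comments.
toy_docs = [
    ["response", "cold", "disappointed"],
    ["response", "cold", "angry"],
    ["driver", "mourn", "respect"],
    ["driver", "respect", "truth"],
    ["response", "driver"],
]
toy_scores = so_pmi(toy_docs, pos_seeds=["respect"], neg_seeds=["cold"])
# "mourn" scores positive and "disappointed" scores negative here.
```

Words whose score exceeds a chosen threshold would be added to the positive side of the domain lexicon and those below its negative counterpart to the negative side; the threshold itself is a tuning decision that the abstract does not specify.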
Keywords/Search Tags: Opinion Analysis, LDA Topic Modeling, Domain Sentiment Dictionary, Machine Learning