
Research On The Evolution And Early Warning Model Of Internet Public Opinion

Posted on: 2024-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: X H Tang
Full Text: PDF
GTID: 2568307103473314
Subject: Applied Statistics
Abstract/Summary:
Most traditional topic evolution models use the latent Dirichlet allocation (LDA) model to mine topics from text, but LDA requires the number of clusters to be specified in advance, which makes the choice relatively subjective. The hierarchical Dirichlet process (HDP) model can cluster and infer topics while also determining the number of clusters from the data, which greatly improves the robustness of the algorithm. This paper analyzes topic evolution based on the HDP model. The first part evaluates the effectiveness and rationality of the HDP model's topic clustering performance. First, Python crawlers were used to collect about 350 thousand microblog texts related to "COVID-19 Vaccination" from 2021, which were divided into twelve text subsets in chronological order. After data cleaning, about 200 thousand texts remained, and about seven thousand of them were manually labeled. Appropriate clustering evaluation indicators were then selected to evaluate the clustering of the HDP model and to compare it with the LDA model. The overall performance of the HDP model is better than that of LDA in F value, normalized mutual information and adjusted Rand index. Next, the topics in the topic distributions produced by the models were compared with the labeled data set. The topics found by HDP are more consistent with the labeled topics than those found by LDA, the topic quality is better, and the topic words are clearer. Finally, comparing the number of topics of HDP and LDA on the twelve text subsets shows that the optimal number of topics of HDP is generally slightly higher than that of LDA. This is because HDP clusters adaptively, so the sample data has more clusters to choose from during clustering. Compared with the LDA model, the HDP model is more flexible and can find more subtopics. Therefore, the HDP model is effective for topic clustering of short microblog texts and is superior to LDA.

The second part is an experimental analysis that combines the HDP model with topic evolution analysis. First, the HDP model was applied to each of the twelve cleaned text subsets to extract the ten main topics of each subset, and the relationship between each extracted topic and the actual topics was summarized and analyzed. Second, topic similarity was calculated with cosine similarity to explore the topic evolution relationships between text subsets from adjacent time periods, and a topic evolution map was constructed to briefly analyze how topics continue and evolve. Then, the topics generated by HDP clustering were summarized and divided into four main categories: "COVID-19 epidemic", "epidemic prevention and control", "COVID-19 vaccine" and "other topics". Finally, related topics were combined to study how the content and intensity of some topics change over time.
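The comparison of HDP and LDA in the first part relies on normalized mutual information and the adjusted Rand index, both of which score a predicted clustering against manually labeled data. The following is a minimal pure-Python sketch of the two measures, not the thesis's own code. The function names are illustrative, the NMI normalization by the arithmetic mean of the two entropies is an assumption (the abstract does not say which variant was used), and the F value is omitted.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index of two labelings (assumes at least 2 samples)."""
    n = len(labels_true)
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    # Number of sample pairs that are together in both / in each labeling.
    sum_both = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    expected = sum_true * sum_pred / comb(n, 2)
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (sum_both - expected) / (maximum - expected)


def normalized_mutual_information(labels_true, labels_pred):
    """NMI, normalized by the arithmetic mean of the two entropies."""
    n = len(labels_true)
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    mutual_info = sum(
        c / n * log(c * n / (true_counts[t] * pred_counts[p]))
        for (t, p), c in pair_counts.items()
    )
    h_true = -sum(c / n * log(c / n) for c in true_counts.values())
    h_pred = -sum(c / n * log(c / n) for c in pred_counts.values())
    if h_true + h_pred == 0:  # both labelings consist of one cluster
        return 1.0
    return 2 * mutual_info / (h_true + h_pred)
```

Both measures ignore the names of the cluster labels, so a clustering that matches the manual labels up to renaming scores 1. That property is what makes them usable for comparing models such as HDP and LDA whose topic indices are arbitrary.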
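The topic evolution step links topics in adjacent time periods by the cosine similarity of their word distributions. The following is a minimal sketch of that idea under stated assumptions: each topic is represented as a dictionary mapping words to weights, and two topics are linked when their similarity reaches a threshold. The function names, the dictionary representation and the threshold value of 0.5 are all hypothetical; the abstract does not state how the thesis chose which similarities count as an evolution link.

```python
from math import sqrt


def cosine_similarity(topic_a, topic_b):
    """Cosine similarity of two topics given as {word: weight} dicts."""
    dot = sum(w * topic_b.get(word, 0.0) for word, w in topic_a.items())
    norm_a = sqrt(sum(w * w for w in topic_a.values()))
    norm_b = sqrt(sum(w * w for w in topic_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_topics(earlier_topics, later_topics, threshold=0.5):
    """Return (earlier_index, later_index, similarity) for each topic pair
    from two adjacent time periods whose similarity reaches the threshold.

    The threshold is an illustrative assumption, not a value from the thesis.
    """
    links = []
    for i, earlier in enumerate(earlier_topics):
        for j, later in enumerate(later_topics):
            similarity = cosine_similarity(earlier, later)
            if similarity >= threshold:
                links.append((i, j, similarity))
    return links
```

Applying `link_topics` to each pair of consecutive text subsets yields the edges of a topic evolution map: a topic with a link into the next period continues, and a topic with no link has faded or been replaced.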
Keywords/Search Tags:hierarchical Dirichlet process model, topic evolution, latent Dirichlet allocation model, COVID-19 vaccine