Research and Implementation of Hot Topic Discovery Technology for Microblogging Networks

Posted on: 2019-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: M Zhang
GTID: 2348330542491049
Subject: Communication and Information System
Abstract/Summary:
In recent years, microblogging has rapidly accumulated a large number of users owing to its features and good user experience. Compared with other media, microblogging platforms tend to offer higher timeliness and deeper user engagement in the spread of social topics. A hot topic is often first released on a microblogging platform and attracts a great deal of attention within a short time. Because hot topics matter for journalism, finance, and national public opinion security, extracting them from microblogging platforms is of great significance. How to extract hot topics from microblogging platforms quickly and efficiently is the main subject of this thesis. The thesis begins with a review of existing research and summarizes the results of previous work. It also systematically introduces the theoretical basis of hot topic extraction on microblogging platforms. After weighing the advantages and disadvantages of clustering algorithms and topic model algorithms, the thesis proposes a hot topic extraction scheme based on the LDA topic model. The main results are as follows:

(1) To address the problem that the LDA topic model performs poorly on short microblog texts, the thesis proposes a scheme that expands short microblog messages using microblog comments and Baidu Encyclopedia entries. Considering the characteristics of microblog text, we design a mechanism that filters microblog comments with a word co-occurrence model and filters Baidu Encyclopedia entries by word coincidence probability. The experimental results show that the average length of the expanded microblog text increases by nearly 50% and that the perplexity of LDA is lower regardless of the number of topics.

(2) A hot topic extraction scheme for microblogs is proposed, based on time series segmentation and topic clustering over the output of the LDA topic model. Following the life cycle theory of information, the thesis proposes dividing the expanded microblog posts by release time to form unit corpora. We apply the LDA topic model to each unit corpus, then use a hierarchical clustering algorithm to cluster the topics and calculate topic heat. The results show that this method extracts hot topics on microblogging platforms effectively.

Finally, the thesis summarizes the research work and outlines directions for future research.
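The second scheme (time segmentation, per-unit topics, hierarchical clustering, topic heat) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The LDA step is omitted, and the topic-word distributions are assumed to be given as plain dictionaries. The window width, the cosine similarity measure, the average-linkage rule, the merge threshold, and the definition of heat as a sum of per-topic post counts are all assumptions chosen for illustration, and every function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from math import sqrt


def slice_by_time(posts, hours=24):
    """Group (timestamp, text) posts into fixed-width windows (unit corpora).

    The 24-hour default is an assumption; the abstract gives no width.
    """
    start = min(ts for ts, _ in posts)
    units = defaultdict(list)
    for ts, text in posts:
        index = int((ts - start).total_seconds() // (hours * 3600))
        units[index].append(text)
    return dict(units)


def cosine(a, b):
    """Cosine similarity between two sparse word -> weight dictionaries."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_topics(topics, threshold=0.5):
    """Average-linkage agglomerative clustering of topic distributions.

    Repeatedly merges the most similar pair of clusters until the best
    similarity falls below `threshold` (an assumed stopping rule).
    Returns clusters as lists of topic indices.
    """
    clusters = [[i] for i in range(len(topics))]

    def linkage(c1, c2):
        total = sum(cosine(topics[i], topics[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = linkage(clusters[x], clusters[y])
                if score > best:
                    best, pair = score, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def topic_heat(clusters, post_counts):
    """Assumed heat measure: total posts assigned to a cluster's topics."""
    return [sum(post_counts[i] for i in cluster) for cluster in clusters]


# Toy data: three posts falling into two 24-hour windows.
posts = [
    (datetime(2019, 3, 1, 8), "a"),
    (datetime(2019, 3, 1, 20), "b"),
    (datetime(2019, 3, 2, 9), "c"),
]
units = slice_by_time(posts)

# Stand-ins for per-unit LDA output (topic-word weights) and post counts.
topics = [
    {"earthquake": 0.6, "rescue": 0.4},
    {"earthquake": 0.5, "relief": 0.5},
    {"concert": 0.7, "ticket": 0.3},
]
post_counts = [8, 6, 4]

clusters = cluster_topics(topics)
heat = topic_heat(clusters, post_counts)
```

With this toy input, the two earthquake topics from different windows merge into one cluster, while the concert topic stays separate. In a real pipeline the dictionaries would come from an LDA model fitted on each unit corpus, and the heat measure would follow whatever definition the full thesis gives.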
Keywords/Search Tags: Microblogging Short Text, LDA Topic Model, Clustering Algorithm, Hot Topics