Research On Dynamic Clustering Method For Short Text

Posted on: 2020-09-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Zhu
GTID: 2428330596473192
Subject: Computer Science and Technology
Abstract/Summary:
With the increasing popularity of online social media platforms and mobile Internet devices, social software such as Sina Weibo and Twitter has become widespread in people's daily lives. Internet users generate billions of pieces of text data every day through these platforms. These texts are short, and their data characteristics change over time; such data are known as short text dynamic data streams. Techniques for mining and analyzing massive short text data streams are of great significance for public opinion analysis, tracking of hot news topics, and personalized mining of user interests.

Because the content of a short text is limited in length, its features are sparse. In addition, the characteristics of a time-varying text data stream change over time, which leads to poor performance of current dynamic clustering methods. Effectively improving dynamic clustering for short texts is therefore a focus of text mining and analysis.

This thesis studies the dynamic clustering problem for short text data. From the perspective of topic transfer, it discusses how topic inheritance affects dynamic clustering and the emergence of new topics during clustering. Because the appropriate strength of topic inheritance differs across types of text data, the thesis adjusts the strength of topic inheritance to improve the dynamic clustering of short texts. The main research work and results are:

(1) A Dynamic Dirichlet Multinomial Mixture (DDMM) model with a bias toward new topics. The model considers topic inheritance between time windows. Introducing a discount parameter in the prior weakens topic inheritance to some extent and increases the probability of new topics, so that the model can effectively generate new topics and improve the dynamic clustering of short texts. Experiments show that the DDMM model captures how the characteristics of short text data streams change over time and gives more reasonable results for the dynamic change in the number of clusters.

(2) A short text dynamic clustering topic model (DCTM) with adjustable topic inheritance. Because the strength of topic inheritance differs across data scenarios in dynamic clustering, a topic inheritance adjustment strategy is formulated for each scenario, making topic inheritance adjustable in short text dynamic clustering. The dynamic change of short text topics is captured from the clustering result estimated in the previous time period and the short text features in the current time period, and an adjustment factor is introduced in the prior to adjust the strength of topic inheritance. The experimental results show that the proposed method effectively improves the dynamic clustering of short texts by adjusting topic inheritance.
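The abstract does not give the form of the prior, so the following is only a minimal sketch of one way a discounted inherited prior could behave. It assumes a Chinese-restaurant-style weighting; the function name `topic_prior` and the parameters `alpha` and `discount` are hypothetical and are not taken from the thesis. Each existing topic is weighted by its document count in the current window plus a discounted count inherited from the previous window, while a new topic gets a fixed weight `alpha`.

```python
def topic_prior(prev_counts, curr_counts, alpha, discount):
    """Illustrative prior over topics for one document in the current window.

    prev_counts: dict topic -> document count in the previous time window
    curr_counts: dict topic -> document count so far in the current window
    alpha:       weight reserved for opening a new topic (hypothetical)
    discount:    factor in [0, 1] that scales inherited counts (hypothetical)
    """
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must lie in [0, 1]")
    if alpha <= 0:
        raise ValueError("alpha must be positive")

    topics = sorted(set(prev_counts) | set(curr_counts))
    # Inherited evidence is scaled down; current-window evidence counts fully.
    weights = {
        k: discount * prev_counts.get(k, 0) + curr_counts.get(k, 0)
        for k in topics
    }
    weights["<new>"] = alpha

    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


if __name__ == "__main__":
    prev = {"a": 8, "b": 2}
    for d in (1.0, 0.5, 0.0):
        prior = topic_prior(prev, {}, alpha=1.0, discount=d)
        print(d, round(prior["<new>"], 3))
```

Under this assumed form, lowering `discount` shrinks the mass inherited from earlier windows, so the relative probability of a new topic rises, which is the qualitative effect the abstract attributes to the discount parameter. The adjustment factor of the DCTM could be read as making a comparable strength depend on the data scenario, but the abstract does not specify how.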
Keywords/Search Tags: topic models in short texts, dynamic clustering, topic inheritance, DDMM model, DCTM model