
Research And Implementation Of Topic Detection And Tracking Techniques For Microblog Event Streams

Posted on: 2014-12-31
Degree: Master
Type: Thesis
Country: China
Candidate: W L Xu
Full Text: PDF
GTID: 2348330473451273
Subject: Computer software and theory
Abstract/Summary:
As an emerging Web 2.0 application, the microblog has become an important platform for sharing and disseminating information. Users can conveniently publish, repost, and share information through fixed and mobile clients. Because posts are short and spread quickly, microblogs form a fast stream of short texts; they are not only a space where information spreads rapidly but also the place where many important social events first surface. Both government departments and microblog users need to follow online public opinion through microblogs. This need covers not only collecting scattered microblog information but also monitoring hot topics and tracking their subsequent development and evolution. To meet this need, this thesis studies topic detection and tracking techniques for microblog event streams. The work comprises hot topic detection and hot topic tracking in microblog event streams.

In the part on hot topic detection, given the complexity of the data structure and content of microblog event streams, this thesis first presents a method for filtering microblog streams. The filtering yields clean microblog data and improves the time and space efficiency of the topic detection algorithm. Based on the characteristics of microblog data and microblog hot topics, the thesis then presents a keyword extraction method based on word frequency and timeliness. It further proposes a method for detecting hot topics by mining frequent patterns among the extracted topic keywords, which strengthens the cohesion of the keywords within each group. Finally, a topic merging method based on short text clustering is proposed.
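
The abstract does not give the formulas behind the detection steps, so the following is only a minimal Python sketch of the general idea: rank words by how frequent they are in the current time window and how much they have grown since the previous window, then keep keyword groups that co-occur in enough posts. The function names, the scoring formula, and the parameters `top_k`, `min_support`, and `max_size` are all assumptions made for illustration, and the co-occurrence step is plain brute-force counting rather than any specific frequent pattern mining algorithm used in the thesis.

```python
from collections import Counter
from itertools import combinations


def score_keywords(current, previous, top_k=3):
    """Rank words by frequency in the current window, boosted by growth.

    `current` and `previous` are lists of tokenized posts (lists of words).
    The score n * (n + 1) / (n_prev + 1) is an assumed stand-in for a
    frequency-and-timeliness measure, not the formula from the thesis.
    """
    now = Counter(w for post in current for w in set(post))
    before = Counter(w for post in previous for w in set(post))
    scores = {w: n * (n + 1) / (before[w] + 1) for w, n in now.items()}
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


def frequent_keyword_sets(posts, keywords, min_support=2, max_size=3):
    """Return keyword sets that co-occur in at least `min_support` posts."""
    keyset = set(keywords)
    counts = Counter()
    for post in posts:
        present = sorted(keyset & set(post))
        for size in range(2, max_size + 1):
            for combo in combinations(present, size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}
```

With a previous window of `[["weather", "rain"], ["weather", "sun"]]` and a current window of `[["earthquake", "sichuan", "rescue"], ["earthquake", "sichuan"], ["earthquake", "rescue", "weather"], ["weather"]]`, `score_keywords` returns `["earthquake", "rescue", "sichuan"]`: the word "weather" is frequent in both windows, so it is ranked below the newly emerging words. Passing those keywords to `frequent_keyword_sets` keeps the two groups ("earthquake", "rescue") and ("earthquake", "sichuan"), each supported by two posts.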
Experiments show that the keyword extraction method in this thesis is better suited to microblogs than the traditional method. Experiments on a real dataset verify the efficiency and effectiveness of the proposed methods.

In the part on hot topic tracking, the thesis observes that hot topics in the microblog space spread quickly and keep evolving. It therefore proposes a topic tracking method based on a dynamically updated topic model. First, the topic and the microblog data are modeled as time sequences, and the similarity between the two kinds of models is computed in order to select the related microblogs to track. Then topic features are extracted using the DTM (dynamic topic model), an LDA model with a time factor, and the topic model is updated as the stream passes by. Finally, because the topic model is abstract, the thesis proposes a method for extracting representative microblogs, which depict the dynamic development of hot topics and are presented to users in a straightforward form. Experiments show that the topic feature extraction method in this thesis presents the dynamic changes of topic development more effectively than the other methods and can effectively discover the content evolution of a topic.
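
The tracking loop described above (select related microblogs by similarity, update the topic model, pick a representative microblog) can be illustrated with a small Python sketch. This is not the method of the thesis: in place of the DTM it uses an exponentially decayed bag-of-words vector, and the names `cosine` and `track` as well as the `threshold` and `decay` parameters are hypothetical choices made only for this example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse word-weight dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def track(topic, window, threshold=0.3, decay=0.5):
    """Process one time window of tokenized posts against a topic vector.

    Returns (updated_topic, related_posts, representative_post).
    The decayed word-count update is an assumed simplification that
    stands in for the dynamic topic model (DTM).
    """
    scored = [(cosine(topic, Counter(post)), post) for post in window]
    related = [post for sim, post in scored if sim >= threshold]
    # Old evidence fades, so the topic vector follows the topic's evolution.
    updated = {word: weight * decay for word, weight in topic.items()}
    for post in related:
        for word in post:
            updated[word] = updated.get(word, 0.0) + 1.0
    # The most similar related post serves as the representative one.
    representative = max(scored, key=lambda s: s[0])[1] if related else None
    return updated, related, representative
```

For a topic vector `{"earthquake": 2.0, "rescue": 1.0}` and a window containing `["earthquake", "rescue", "team"]`, `["football", "match"]`, and `["earthquake", "aftershock"]`, the football post is filtered out, the first post is chosen as representative, and the updated topic gains the new words "team" and "aftershock", which is how the sketch reflects content evolution.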
Keywords/Search Tags: microblog event streams, frequent pattern mining, topic tracking, dynamic topic model