The Research Of Microblog Topic Tracking And Realtime Retrieval

Posted on: 2012-02-29
Degree: Master
Type: Thesis
Country: China
Candidate: C H Shi
GTID: 2218330368987859
Subject: Computer application technology
Abstract/Summary:
With the rapid spread of SNS (Social Networking Services), more and more Internet applications have become interactive and social, which makes the Internet even more diverse. Since 2006, the Twitter website has provided a service that combines the characteristics of IM and SNS. This service, known as Microblog, is a brand-new Internet application. Unlike traditional blogging, it only allows users to write text-based posts of up to 140 characters. In other words, Microblogs let users share small elements of content such as sentences, images and video links. With its real-time and open nature, Microblog is fast becoming one of the most popular applications in the world and is triggering an Internet revolution.

Through the available developer APIs, users can post micro-blogs from the Web, mobile phones and other devices. A user can freely choose a number of people to follow and thereby receive what these people are doing or posting. A number of studies have shown that people usually use micro-blogs to update what they are doing, communicate with friends, share information and report news.

Because Microblog is both real-time and social, users have a growing need for real-time search, which mainly covers two aspects: keeping watch on topics of interest and catching up on the latest relevant news. Accordingly, the main work of this thesis can be summarized as follows.

First, because Microblog responds in real time, people like to use it as a tool for publishing and receiving news events (topics) that occur around them or around the world, especially the top news. As many fresh topics are created on Twitter-like websites, there is an increasing need for systems that track the development of these topics. In this thesis, we formally define the problem of topic tracking in the context of micro-blogs. To deal with topic drift and noise in micro-blogs, we propose an algorithm named Streaming Dynamic Topic Model, which improves the Dynamic Topic Model with MEntropy, to track additional events on topics. Our algorithm addresses the topic drift problem and further reduces the noise in the results. In particular, MEntropy can be used to evaluate the importance of a micro-blog for tracking a topic and to discriminate between Event micro-blogs and Neutral micro-blogs. In the experiments, we evaluate the proposed algorithm on a collection of 12 million micro-blogs from more than 170 thousand users, and show that it is more efficient and reduces noise better than the traditional Dynamic Topic Model.

Second, since a huge number of posts are created in a very short time, there is a growing need to retrieve relevant information in real time. Unlike traditional retrieval, real-time retrieval emphasizes fresh content. To address this problem, we extend traditional IR methods by treating the time factor as the prior probability. We propose a query expansion algorithm that takes the posting time into account. Furthermore, we use a strategy that considers posts' quality factors, including entropy and short-link features, to re-rank the initial retrieval runs. Experiments on the Twitter corpus show that our algorithm effectively improves retrieval performance and that the retrieval results better meet users' real-time retrieval needs.
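As an illustration of treating the time factor as a prior probability, the following is a minimal sketch and not the thesis's actual model. It assumes a Dirichlet-smoothed query-likelihood score combined with an exponential recency prior; the function names, the `rate` and `mu` parameters, and the add-one collection smoothing are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def query_log_likelihood(query_terms, doc_terms, collection_tf, collection_len, mu=100.0):
    """Dirichlet-smoothed log P(q | d). `mu` is an assumed smoothing parameter."""
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    vocab_size = len(collection_tf) + 1  # +1 reserves mass for unseen terms
    score = 0.0
    for term in query_terms:
        # Add-one smoothed collection probability, so unseen terms never give log(0).
        p_coll = (collection_tf.get(term, 0) + 1) / (collection_len + vocab_size)
        score += math.log((tf[term] + mu * p_coll) / (doc_len + mu))
    return score


def log_recency_prior(post_time, query_time, rate=0.01):
    """Log of an assumed exponential prior P(d) = rate * exp(-rate * age)."""
    age = max(query_time - post_time, 0.0)
    return math.log(rate) - rate * age


def rank(query_terms, posts, query_time, rate=0.01):
    """Rank posts by log P(q | d) + log P(d); each post is (id, terms, timestamp)."""
    collection_tf = Counter(t for _, terms, _ in posts for t in terms)
    collection_len = sum(collection_tf.values())
    scored = []
    for post_id, terms, timestamp in posts:
        score = query_log_likelihood(query_terms, terms, collection_tf, collection_len)
        score += log_recency_prior(timestamp, query_time, rate)
        scored.append((score, post_id))
    return [post_id for _, post_id in sorted(scored, reverse=True)]


if __name__ == "__main__":
    posts = [
        ("old", ["earthquake", "news"], 0.0),
        ("new", ["earthquake", "news"], 90.0),
        ("off", ["lunch", "photo"], 90.0),
    ]
    print(rank(["earthquake"], posts, query_time=100.0))
```

With identical text, the more recent post receives the higher score because only the prior differs. A larger `rate` favors freshness more strongly, while a `rate` near zero reduces the ranking to plain query likelihood.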
Keywords/Search Tags: Microblog, Topic Tracking, Realtime Retrieval, Dynamic Topic Model, Relevance Feedback