
Research On Key Techniques Of Hot Topic Detection In Technology News

Posted on: 2013-11-11
Degree: Master
Type: Thesis
Country: China
Candidate: B You
Full Text: PDF
GTID: 2298330392467948
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, the volume of information is becoming extremely large. As a traditional application since the Web 1.0 era, online news is still widely used. It is of great importance to organize news reports according to their corresponding events. The goal of topic detection is to group together documents discussing the same event. When detecting hot topics in technology news, several specific properties have to be taken into consideration: (1) the topics fall within a relatively narrow range; (2) events are more closely related than in other fields. Besides, in order for users to get a comprehensive understanding of the current hot topics and to access information of interest more efficiently, an intuitive description of the events is required. To this end, this thesis focused mainly on the following three aspects:

First, this thesis proposed a method to model news stories using the topic model PAM (Pachinko Allocation Model), because on the one hand keywords need to be extracted to generate event descriptions, and on the other hand similarities between documents are needed for further processing. Topic models are capable of extracting keywords from a set of documents and generating document-topic distribution vectors, which can then be used to calculate similarities between documents. The effectiveness and superiority of the model were validated through comparative experiments with TFIDF and HHMM, and some analysis was given based on the experiments.

Second, through a comparative analysis of three typical clustering algorithms, i.e., K-means, K-means++ and Affinity Propagation, this thesis selected the Affinity Propagation algorithm, as it is more suitable for our specific application. Clustering is essential for two reasons: further processing can take advantage of related news stories being grouped together, and the frequent pattern mining algorithm has a high computational complexity, so clustering greatly reduces computation. Finally, results using different similarity measurements were analyzed to demonstrate the effectiveness of the topic model from a different perspective.

Third, this thesis proposed a method to generate event descriptions based on frequent item set mining algorithms. The clustering results need to be further processed because the recall is satisfactory while the precision and purity are relatively low. At the same time, the number of keywords in each individual cluster is reasonably small, making frequent item set mining algorithms applicable. The effectiveness of the method was validated through experiments on a collection of news stories from a time period. Besides, we show how the results can be adapted to topic tracking by comparing the different aspects covered at different time points.
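
The abstract does not say which frequent item set mining algorithm was used, so the following is only a minimal Apriori-style sketch of the general idea: treat the keyword set of each news story in one cluster as a transaction, and keep the keyword combinations that occur in at least `min_support` stories. The function names, the sample keywords and the threshold are all hypothetical and are not taken from the thesis.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count how many stories contain each single keyword.
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    result = dict(current)
    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Count step: keep only candidates that reach the support threshold.
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                current[c] = n
        result.update(current)
        k += 1
    return result


def maximal_itemsets(itemsets):
    """Keep only itemsets that are not a proper subset of another frequent itemset."""
    return [s for s in itemsets if not any(s < other for other in itemsets)]


# Hypothetical keyword sets for four stories that ended up in the same cluster.
cluster = [
    {"apple", "iphone", "launch"},
    {"apple", "iphone", "launch", "price"},
    {"apple", "iphone", "sales"},
    {"samsung", "galaxy", "launch"},
]
freq = frequent_itemsets(cluster, min_support=2)
description = maximal_itemsets(freq)
```

In this sketch the maximal frequent keyword sets stand in for an event description. Because the mining runs inside a single cluster, the keyword vocabulary stays small, which is the property the abstract relies on to keep the computation tractable.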
Keywords/Search Tags: hot topic detection, technology news, topic model, document cluster, frequent item set mining