Discovery Of Texts' Hot Topics Based On Improved TF-IDF

Posted on: 2010-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z Xue
Full Text: PDF
GTID: 2178360305498712
Subject: Communication and Information System
Abstract/Summary:
Network information is vast, comes from many different sources, and is updated continuously, which makes it difficult for people to find the information that interests them. This dissertation extracts keywords using an improved TF-IDF function and then clusters network news, helping users quickly find hot information in massive collections of electronic texts.

Taking into account a document category factor, a location weight factor, and a named entity weight factor, we improve the traditional TF-IDF function and design a keyword extraction workflow for documents based on the improved function. Experimental results show that, compared with the traditional TF-IDF function, the improved function based on category, location weight, and named entities raises the precision of keyword extraction by about 13.3% and the recall by about 13.1%.

The improved function is then used to extract keywords from the background corpus, and hot topics in the test corpus are discovered through text clustering. The difference in effect between the traditional and the improved TF-IDF functions is marked: experimental analysis shows an improvement of about 10% in both the precision and the recall of hot topic discovery when the improved TF-IDF function is used for feature extraction. This work is expected to be widely applicable to hot topic tracking.
Keywords/Search Tags: Keywords Extraction, Hot Topic, TF-IDF, Location Weight, Named Entities Weight
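
The abstract does not give the improved TF-IDF formula, so the following is only a minimal sketch of the general idea: a standard smoothed TF-IDF score scaled by a location weight (terms that appear in the title) and a named entity weight. The function name `extract_keywords`, the multipliers `title_weight` and `entity_weight`, the input layout, and the sample data are all assumptions made for illustration and are not taken from the thesis. The document category factor is omitted because its definition is not described here.

```python
import math
from collections import Counter


def extract_keywords(docs, entities, top_k=3, title_weight=2.0, entity_weight=1.5):
    """Rank each document's terms by TF-IDF scaled with location and entity weights.

    docs: list of dicts with tokenized "title" and "body" lists (assumed layout).
    entities: set of terms treated as named entities (assumed to be given).
    """
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc["title"]) | set(doc["body"]))

    results = []
    for doc in docs:
        tokens = doc["title"] + doc["body"]
        tf = Counter(tokens)
        title_terms = set(doc["title"])
        scores = {}
        for term, count in tf.items():
            # Smoothed IDF, so terms found in every document keep a positive weight.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            score = (count / len(tokens)) * idf
            if term in title_terms:   # location weight (hypothetical multiplier)
                score *= title_weight
            if term in entities:      # named entity weight (hypothetical multiplier)
                score *= entity_weight
            scores[term] = score
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


if __name__ == "__main__":
    sample_docs = [
        {"title": ["earthquake", "haiti"],
         "body": ["rescue", "teams", "reach", "haiti", "after", "the", "earthquake"]},
        {"title": ["market", "update"],
         "body": ["the", "market", "fell", "after", "the", "report"]},
    ]
    print(extract_keywords(sample_docs, entities={"haiti"}, top_k=2))
```

Setting both multipliers to 1.0 reduces the score to plain TF-IDF, which gives a simple baseline for comparing the two rankings on the same documents.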