The Hot Topic Of Network Public Opinion Research And Realization Of The Auto-discovery Technology

Posted on: 2013-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: X W Li
GTID: 2218330374465426
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid growth of online information, users want to obtain the information they need quickly from vast network resources, which is often very difficult. News websites are numerous and varied, and the same event is reported from different angles and positions; a user who reads only isolated reports finds it difficult to gain a comprehensive understanding of the event. Users therefore urgently need a tool that automatically discovers hot social news topics and their related information and presents them in a timely manner.

This thesis starts from the functional design of a system for automatically discovering Internet hot topics, then analyzes the system's development environment, data flow, and database. It studies in depth the technologies the system relies on, namely information collection, information preprocessing, and hot topic discovery, and then describes the design and implementation of the three system modules.
Finally, a group of experiments demonstrates the efficiency and accuracy of the OICKM and OICKMSP algorithms.

The main work of this thesis is as follows:
(1) Existing systems at home and abroad are studied in order to design the functional structure of a system for automatically discovering Internet hot topics.
(2) Information collection and information processing technologies are researched in order to design and implement the information collection module and the processing module.
(3) The thesis focuses on hot topic discovery technologies: text modeling, feature selection, text clustering, and hot topic assessment. After analyzing the characteristics of the classic K-Means algorithm, and taking into account the importance of news headlines, the author proposes counting the frequency of terms in news headlines to find the center of each topic among the hot topics (the center of a topic is its most representative document). These topic centers serve as the initial cluster centers for K-Means. The Single-Pass algorithm is used to assign newly arriving reports to topics, which reduces the number of K-Means iterations. Finally, based on the news report and news topic representation models and the improved clustering algorithm, the author designs the topic discovery workflow and the hot topic evaluation model.
(4) A group of experiments demonstrates the efficiency and accuracy of the OICKM and OICKMSP algorithms.
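The clustering idea in item (3) can be illustrated with a minimal Python sketch. It is not the OICKM or OICKMSP implementation, whose details the abstract does not give. Everything named here is hypothetical: the function names, whitespace tokenization as a stand-in for real word segmentation, the rule of scoring each headline by the corpus-wide frequency of its terms, and the `max_sim` and `threshold` values of 0.3.

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    """Bag-of-words term counts (whitespace tokenization is an assumption)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = sqrt(sum(w * w for w in a.values())) * sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def mean_vector(vectors):
    """Centroid of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}

def pick_initial_centers(headlines, doc_vectors, k, max_sim=0.3):
    """Pick up to k seed documents using headline term frequencies.

    Assumed rule: a headline scores the sum of the corpus-wide headline
    frequencies of its terms; seeds are taken in descending score order,
    skipping documents too similar to an already chosen seed.
    """
    headline_tf = Counter(t for h in headlines for t in set(h.lower().split()))
    scores = [sum(headline_tf[t] for t in set(h.lower().split())) for h in headlines]
    order = sorted(range(len(headlines)), key=lambda i: -scores[i])
    seeds = []
    for i in order:
        if all(cosine(doc_vectors[i], doc_vectors[j]) < max_sim for j in seeds):
            seeds.append(i)
        if len(seeds) == k:
            break
    return seeds

def kmeans(doc_vectors, seeds, max_iter=20):
    """K-Means with cosine similarity, started from the given seed documents."""
    centers = [dict(doc_vectors[i]) for i in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(len(centers)), key=lambda c: cosine(v, centers[c]))
                      for v in doc_vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [v for v, l in zip(doc_vectors, labels) if l == c]
            if members:
                centers[c] = mean_vector(members)
    return labels, centers

def single_pass_assign(vector, centers, threshold=0.3):
    """Single-Pass step: join the most similar topic or open a new one."""
    sims = [cosine(vector, c) for c in centers]
    best = max(range(len(centers)), key=lambda c: sims[c]) if centers else -1
    if best >= 0 and sims[best] >= threshold:
        return best
    centers.append(dict(vector))  # no topic is close enough: start a new topic
    return len(centers) - 1
```

A caller would vectorize each report (headline plus body), call `pick_initial_centers` and then `kmeans` on the existing collection, and route each newly arriving report through `single_pass_assign` instead of re-clustering everything.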
Keywords/Search Tags: Hot topic, Vector space model, Cluster analysis, K-Means algorithm, Hot topic assessment