Research On Methods Of Evolution Analysis Of Network News Events

Posted on: 2014-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: J P Jiang
GTID: 2248330395499662
Subject: Signal and Information Processing
Abstract/Summary:
With the continuous development of computer technology and the Internet, the main way people obtain news and information is gradually shifting from newspapers, radio, television, and other traditional media to the Internet. A growing number of users follow the news online, and the Internet has become the main platform for disseminating information. However, online news contains a large amount of redundant reporting, including repeated reports of one event by a single outlet and duplicate reports of the same event by different outlets. Faced with so many redundant reports, readers often do not know where to start, so helping readers understand Internet news quickly is a problem to be solved. Starting from the reader’s point of view, this paper builds an offline corpus from Baidu searches for news events and further analyzes the relationships among these events. The main research is as follows:

(1) This paper studies evolution analysis methods for Internet news events. The elements of a news event are what users care about most, and the degree to which each element participates changes as the event develops. Evolution analysis aims to find the relationships between events and present them to readers through visualization software. This paper introduces multi-document summarization, entity recognition, and association mining, the techniques applied in the evolution analysis methods.

(2) This paper presents a duplicate-page removal method based on character statistics. It extracts statistically high-frequency features from a web page and computes digital fingerprints of the feature strings. The size of the intersection between two pages' fingerprint arrays determines whether the pages are duplicates. Experiments verify the effectiveness of the proposed method: the F-measure for duplicate removal reaches 94.91%. The Internet news corpus is constructed using this algorithm.

(3) This paper presents a timeline summarization method based on element extraction. Because a news event has basic elements, the method extracts and weights the element phrases in news reports and builds a transition probability matrix from sentence similarities. It then extracts sentences at the important nodes to produce the timeline summary. Experiments on the Internet news corpus verify the method's effectiveness: recall, precision, and F-measure reach about 45%, 35%, and 40% on average, respectively.

(4) This paper studies evolution analysis methods for Internet news events based on the timeline summary and news element extraction. By applying social network analysis and constructing an event-element matrix, it visualizes how the participation of elements in news events changes.
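
The fingerprint-intersection idea behind the duplicate-page removal in item (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify how feature strings are chosen, which hash is used, or what threshold applies. The function names `fingerprints`, `overlap`, and `is_duplicate`, the parameters `top_k`, `window`, and `threshold`, the use of MD5, and the normalization by the smaller set are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def fingerprints(text, top_k=5, window=4):
    """Return the set of fingerprints for the feature strings of `text`.

    Assumed rule: the `top_k` most frequent non-space characters act as
    anchors, and each anchor occurrence plus up to `window` characters
    on either side forms one feature string.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return set()
    anchors = {c for c, _ in Counter(chars).most_common(top_k)}
    prints = set()
    for i, c in enumerate(chars):
        if c in anchors:
            feature = "".join(chars[max(0, i - window): i + window + 1])
            # MD5 stands in for the unspecified "digital fingerprint".
            prints.add(hashlib.md5(feature.encode("utf-8")).hexdigest())
    return prints


def overlap(a, b):
    """Share of the smaller fingerprint set that also occurs in the other."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def is_duplicate(text_a, text_b, threshold=0.6):
    """Flag two pages as duplicates when fingerprint overlap is high enough."""
    return overlap(fingerprints(text_a), fingerprints(text_b)) >= threshold
```

In practice the threshold and window size would need tuning on labeled page pairs; the 94.91% F-measure reported above refers to the thesis's own method and data, not to this sketch.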
Keywords/Search Tags:Evolution Analysis, Duplicated Webpages Removing, Element Recognition, Timeline Summary, Visualization