Design And Implementation Of News Topic Detection System Based On Incremental Clustering

Posted on: 2019-09-04  Degree: Master  Type: Thesis
Country: China  Candidate: R Y Zheng  Full Text: PDF
GTID: 2438330545493142  Subject: Engineering
Abstract/Summary:
With the development of information technology, the volume of news on the Internet has grown explosively. Unlike traditional print news, Internet news is free of the timeliness and reach limitations of paper. It breaks the limits of time and space and covers all aspects of society and daily life, including social affairs, finance, the economy, and sports. However, every reader is interested in only certain topics and reports, and news on topics a reader does not care about is effectively noise. To save readers time and let them browse the online news that interests them, this paper designs and implements a news topic detection system that groups related online news reports into topics, so that users can grasp a topic quickly, which saves time and improves the user experience. Against this research background, we design and implement a news topic detection system based on incremental clustering.

(1) First, the system requirements were determined during the preparation stage. The functional requirements include news gathering, data processing, topic detection, and data storage. The non-functional requirements include availability, stability, ease of use, security, and extensibility. After identifying the requirements, we reviewed the relevant literature to understand the current state of research on news topic detection in China and abroad, as well as the strengths and shortcomings of each research method. Based on this study, we designed and implemented the topic detection system. The resulting system framework consists of three parts: news information collection, news data processing, and news topic detection.

(2) Next comes the implementation. Many technologies are used in the system: web crawling, web page parsing, and information extraction for news information collection, and word segmentation for splitting news text into words. For topic detection, the system builds on the traditional single-pass clustering algorithm and takes the characteristics of news itself into account. Texts are represented with the vector space model, and an incrementally updated feature weighting scheme is proposed that gives more weight to words that are important for distinguishing topics, so as to improve the accuracy of topic clustering.

(3) The system uses a B/S (browser/server) architecture with a three-layer structure: a data access layer, a presentation layer (the interface design), and a business logic layer. This architecture has the following advantages: only a browser is needed to run the system, so clients save installation time and the user workflow is simplified; users can carry out business processing in real time, simply and quickly; the B/S architecture makes users and the system more interactive at lower cost; updates are applied on the server, with no need to update multiple clients; and improvements to the system can usually be made by improving the web pages. The final system is developed in the C# programming language with a SQL Server 2008 database.
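
As an illustration of the topic detection step described in the abstract (single-pass clustering over a vector space model with incrementally updated feature weights), the following is a minimal sketch only. It is written in Python for brevity rather than the C# used in the thesis, and it is not the thesis's actual method: the class name `SinglePassClusterer`, the smoothed IDF formula, the cosine similarity measure, and the threshold of 0.3 are all assumptions chosen for the example. Input documents are assumed to be already segmented into word tokens.

```python
import math
from collections import Counter


class SinglePassClusterer:
    """Hypothetical single-pass clusterer with incrementally updated IDF weights."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed similarity cut-off, not from the thesis
        self.doc_count = 0          # number of documents seen so far
        self.doc_freq = Counter()   # term -> number of documents containing it
        self.topics = []            # one Counter of summed term counts per topic

    def _weights(self, counts):
        # TF-IDF using a smoothed IDF computed from the statistics seen so far.
        # Terms that occur in few documents get larger weights.
        return {
            term: tf * (math.log((1 + self.doc_count) / (1 + self.doc_freq[term])) + 1.0)
            for term, tf in counts.items()
        }

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(term, 0.0) for term, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def add(self, tokens):
        """Assign one tokenized document to a topic and return the topic index."""
        counts = Counter(tokens)
        # Update corpus statistics first, so weights reflect the new document.
        self.doc_count += 1
        self.doc_freq.update(counts.keys())

        doc_vec = self._weights(counts)
        best_id, best_sim = None, 0.0
        for topic_id, topic_counts in enumerate(self.topics):
            # Topic vectors are re-weighted with the current IDF on every pass.
            sim = self._cosine(doc_vec, self._weights(topic_counts))
            if sim > best_sim:
                best_id, best_sim = topic_id, sim

        if best_id is not None and best_sim >= self.threshold:
            self.topics[best_id].update(counts)  # merge into the existing topic
            return best_id
        self.topics.append(counts)               # otherwise start a new topic
        return len(self.topics) - 1
```

Each document is processed once, in arrival order: it either joins the most similar existing topic or starts a new one. Storing raw term counts per topic and re-weighting them at comparison time is one simple way to let weights change as more news arrives; a production system would likely cache or approximate these weights instead.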
Keywords/Search Tags: news, news gathering, incremental clustering, topic detection