
Design And Implementation Of Core Word Extraction System In Search Engine

Posted on: 2012-03-14; Degree: Master; Type: Thesis
Country: China; Candidate: W Li; Full Text: PDF
GTID: 2178330335950523; Subject: Software engineering
Abstract/Summary:
This thesis originates from the Core Word Extraction System project for the search engine of the wireless iAsk team at Sina Network Co., Ltd. As one of a search engine's core systems, the Index System must process a huge amount of Internet information. However, much of that information is duplicated, which wastes considerable system resources. To solve this problem, a Detecting Duplicate Index Information System needs to be added to the Index System. Detection is implemented by comparing the core words of informative texts.

In this thesis, the author designs and implements the Core Word Extraction System, a subsystem of the Detecting Duplicate Index Information System. The subsystem extracts core words from informative text rapidly and accurately so that the Detecting Duplicate Index Information System can use them to detect duplicate information.

The author first analyzes the value of a Core Word Extraction System in a search engine for enterprise applications. The related technologies are then introduced, including search engine technology, Chinese word segmentation, pattern matching, and development under Linux. Building on this, the author analyzes the system requirements and presents a core word extraction scheme based on Chinese word segmentation and pattern matching. Finally, the author designs the system architecture and functional modules and implements the system.

The Core Word Extraction System described in this thesis is used mainly by the Detecting Duplicate Index Information System. The extraction system can also be extended to article similarity calculation, web page similarity calculation, extraction of news-related words, and so on.

Currently, the Core Word Extraction System is in use in the Sina wireless iAsk search engine. It has improved the user experience of iAsk search in two main ways: the content of the information is richer, and duplicate information in search results is significantly reduced.
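
The abstract says that duplicates are detected by comparing the core words of texts, but it does not give the extraction or comparison method. The following is only a minimal sketch of that general idea, not the thesis's implementation. Everything in it is an assumption made for illustration: whitespace splitting stands in for Chinese word segmentation, plain term frequency stands in for the weight calculation, Jaccard overlap stands in for the comparison, and the names `segment`, `extract_core_words`, `is_duplicate`, `STOP_WORDS`, and the threshold value are hypothetical.

```python
from collections import Counter

# Hypothetical stop-word list (assumption); a real system would use a larger one.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is"}


def segment(text):
    """Stand-in for Chinese word segmentation: lowercase, split on whitespace,
    and strip surrounding punctuation from each token."""
    return [w.strip(".,;:!?\"'()") for w in text.lower().split()]


def extract_core_words(text, k=5):
    """Return the k highest-weighted words as a set.
    The weight here is plain term frequency (an assumption)."""
    words = [w for w in segment(text) if w and w not in STOP_WORDS]
    counts = Counter(words)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {word for word, _ in ranked[:k]}


def is_duplicate(text_a, text_b, k=5, threshold=0.6):
    """Treat two texts as duplicates when the Jaccard overlap of their
    core-word sets reaches the (assumed) threshold."""
    a = extract_core_words(text_a, k)
    b = extract_core_words(text_b, k)
    union = a | b
    if not union:
        return False  # no core words on either side, nothing to compare
    return len(a & b) / len(union) >= threshold
```

Comparing small core-word sets instead of full texts keeps the per-document cost low, which fits the stated goal of fast extraction; the choice of `k` and `threshold` would have to be tuned on real data.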
Keywords/Search Tags: search engine, core word extraction, Chinese word segmentation, pattern matching, weight calculation