
Research On Techniques Of Specialized Search Engine System

Posted on: 2008-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Zhao
GTID: 2178360248451917
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet and the explosive growth of Web information, general-purpose search engines face unprecedented scaling challenges. Moreover, because general search engines serve all users, their results are overly broad: thousands of irrelevant results clearly do not meet the needs of precise searches. Specialized search engines, which serve a single domain, emerged in response. Rather than collecting and indexing every accessible Web document in order to answer every possible query, a specialized search engine gathers only the Web documents relevant to its domain. Because only related pages are gathered, specialized search engines are markedly more accurate and efficient.
This thesis studies the key techniques of a specialized search engine and implements the corresponding modules. For information collection, the spider starts from a set of topic-relevant URLs as seeds, crawls the Web depth-first, and estimates the relevance of each page: pages judged relevant are downloaded, and pages judged irrelevant are discarded. Because specialized search engines differ from general ones, the thesis designs a relevance measure for information retrieval based on a vector space model of page content. Since most of the information is in Chinese, Chinese word segmentation is required; after analyzing existing segmentation techniques, the thesis proposes an improved maximum matching method combined with word-frequency statistics.
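The abstract does not say how the maximum matching method was improved or exactly how word frequencies are combined with it. As a minimal sketch under that uncertainty, the Python below assumes one common variant: run forward and backward maximum matching, and when the two disagree, prefer the segmentation with fewer words, then fewer single-character words, then the higher summed log-frequency. The dictionary `WORD_FREQ` and its counts are hypothetical toy values, and all function names are illustrative rather than taken from the thesis.

```python
import math

# Hypothetical toy dictionary: word -> corpus frequency (illustrative counts only).
WORD_FREQ = {
    "计算机": 50, "计算": 30, "硬件": 40, "产品": 60, "信息": 70,
    "研究": 80, "研究生": 10, "生命": 40, "起源": 15, "命": 5,
}
MAX_LEN = max(len(w) for w in WORD_FREQ)


def forward_mm(text):
    """Forward maximum matching: repeatedly take the longest dictionary word at the left edge."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan never stalls.
            if size == 1 or piece in WORD_FREQ:
                words.append(piece)
                i += size
                break
    return words


def backward_mm(text):
    """Backward maximum matching: the same greedy rule, scanning from the right edge."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(MAX_LEN, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in WORD_FREQ:
                words.append(piece)
                j -= size
                break
    words.reverse()  # words were collected right to left
    return words


def freq_score(words):
    """Summed log-frequency of the words; unknown words contribute log(1) = 0."""
    return sum(math.log(WORD_FREQ.get(w, 0) + 1) for w in words)


def segment(text):
    """Bidirectional maximum matching, with word frequency used to resolve disagreements."""
    fwd, bwd = forward_mm(text), backward_mm(text)
    if fwd == bwd:
        return fwd

    def rank(words):
        singles = sum(1 for w in words if len(w) == 1)
        # Fewer words, then fewer single characters, then higher frequency.
        return (len(words), singles, -freq_score(words))

    return min((fwd, bwd), key=rank)
```

On the classic ambiguous string 研究生命起源 ("studying the origin of life"), forward matching yields 研究生/命/起源 while backward matching yields 研究/生命/起源; the ranking rule picks the latter because it contains no stranded single character. On an unambiguous string such as 计算机硬件产品信息, both directions agree and that result is returned directly.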
Experiments showed that this segmentation method gives excellent results. Finally, the thesis implements several modules of a specialized search engine for computer hardware product information, including the spider, Chinese word segmentation, and information filtering, and uses them to collect the relevant information. Experimental results show that the system performs well.
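The spider described above starts from topic-relevant seed URLs, crawls depth-first, keeps pages judged relevant, and discards the rest, with relevance computed by a content-based vector space model. The sketch below shows that loop under stated assumptions, none of which come from the thesis: the Web is a hypothetical in-memory dictionary `WEB` instead of real HTTP fetches; pages are pre-tokenized English text split on whitespace (in the real system the Chinese word segmenter would supply the terms); terms are weighted by raw frequency; relevance is cosine similarity to a topic vector with a hypothetical threshold of 0.2; and links found on discarded pages are not followed.

```python
import math
from collections import Counter

# Hypothetical in-memory Web: URL -> (page text, outgoing links).
WEB = {
    "seed": ("computer hardware cpu motherboard", ["a", "b"]),
    "a": ("cpu benchmark motherboard chipset review", ["c"]),
    "b": ("celebrity gossip movie news", ["d"]),
    "c": ("graphics card gpu hardware prices", []),
    "d": ("gpu hardware deals", []),
}
# Hypothetical topic description for the computer hardware domain.
TOPIC = "computer hardware cpu gpu motherboard graphics card"


def tf_vector(text):
    """Vector space model: map each term to its raw frequency in the text."""
    return Counter(text.split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def focused_crawl(seeds, topic, threshold=0.2, max_depth=3):
    """Depth-first focused crawl that keeps only pages similar enough to the topic."""
    topic_vec = tf_vector(topic)
    visited, relevant = set(), []
    # A LIFO stack gives depth-first order; seeds are reversed so the first seed is popped first.
    stack = [(url, 0) for url in reversed(seeds)]
    while stack:
        url, depth = stack.pop()
        if url in visited or url not in WEB:
            continue
        visited.add(url)
        text, links = WEB[url]
        if cosine(tf_vector(text), topic_vec) < threshold:
            continue  # judged irrelevant: discard the page and ignore its links
        relevant.append(url)  # judged relevant: a real spider would store the page here
        if depth < max_depth:
            stack.extend((link, depth + 1) for link in reversed(links))
    return relevant
```

With these toy pages, `focused_crawl(["seed"], TOPIC)` visits seed, a, c, then b, and keeps `["seed", "a", "c"]`: page b has no terms in common with the topic and is discarded, so d is never reached even though it is on-topic. Whether to follow links from discarded pages trades coverage against crawl efficiency, and the abstract does not settle that choice.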
Keywords/Search Tags: Specialized Search Engine, Spider, Chinese Word Segmentation, Correlation Calculation