Research And Implementation Of Focused Crawler

Posted on:2009-10-31

Degree:Master

Type:Thesis

Country:China

Candidate:C H Kou

Full Text:PDF

GTID:2178360308977855

Subject:Computer system architecture

Abstract/Summary:

General-purpose search engines have, to a large extent, solved the problem of finding information on the web. However, as information becomes more diverse, they show many shortcomings: low precision, low recall, outdated content and unbalanced coverage. A focused search engine provides useful information and related services for a particular field, user group or need. The focused crawler is the information-collection component of a focused search engine. It fetches web pages on the topics users are interested in. For this reason, focused crawlers have attracted growing attention from researchers.

This thesis analyzes the working principles and key difficulties of focused crawling and designs an architecture for a focused crawler. Based on an in-depth study of several classic focused-collection strategies, I propose a new strategy that consists of page topic judgment and URL topic forecasting.

- Page topic judgment uses classification techniques to compute the similarity between the topic and a page that has already been fetched. It then decides whether to save the page and its hyperlinks.
- URL topic forecasting predicts which candidate URLs are most promising for the next crawl.

The strategy is applied in the focused crawler, and all of its components are implemented: seed injection, fetching, parsing, text training, page topic judgment, URL updating and URL topic forecasting.

Experimental results show that the system runs stably and achieves a higher harvest rate than a conventional crawler. The focused crawler greatly reduces crawling time and storage space, and the time savings help keep the collected pages up to date. Because the collection is restricted to a single topic, users also receive little redundant or useless information when searching.
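The strategy described above can be illustrated with a minimal sketch. It combines page topic judgment, URL topic forecasting, a prioritized URL frontier and the harvest-rate metric. The abstract does not give the thesis's actual classifier, term weighting or forecasting formula, so the following are all assumptions made for illustration:

- raw term-frequency vectors compared by cosine similarity (a basic vector space model);
- a fixed relevance threshold;
- URL priority computed as a linear blend of the parent page's score and the anchor-text similarity.

All names (`tf_vector`, `judge_page`, `forecast_url`, `Frontier`, `crawl`, `threshold`, `alpha`) are hypothetical. An in-memory dictionary stands in for real fetching and parsing.

```python
import heapq
import math
import re
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Raw term-frequency vector over lowercase alphanumeric tokens (assumed tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term vectors in the vector space model."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def judge_page(topic: Counter, page_text: str, threshold: float = 0.2):
    """Page topic judgment (assumed rule): relevant if similarity >= threshold."""
    score = cosine(topic, tf_vector(page_text))
    return score >= threshold, score


def forecast_url(parent_score: float, anchor_text: str, topic: Counter,
                 alpha: float = 0.5) -> float:
    """URL topic forecast (assumed formula): blend parent relevance and anchor-text similarity."""
    return alpha * parent_score + (1 - alpha) * cosine(topic, tf_vector(anchor_text))


class Frontier:
    """Max-priority URL queue that ignores URLs it has already seen."""

    def __init__(self):
        self._heap, self._seen, self._counter = [], set(), 0

    def push(self, url: str, priority: float) -> None:
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so store the negated priority; the counter breaks ties.
        heapq.heappush(self._heap, (-priority, self._counter, url))
        self._counter += 1

    def pop(self):
        neg_priority, _, url = heapq.heappop(self._heap)
        return url, -neg_priority

    def __len__(self) -> int:
        return len(self._heap)


def harvest_rate(relevant: int, fetched: int) -> float:
    """Fraction of fetched pages judged relevant to the topic."""
    return relevant / fetched if fetched else 0.0


def crawl(web, seeds, topic_text, threshold=0.2, max_pages=100):
    """Best-first focused crawl over an in-memory web: {url: (text, [(link, anchor)])}."""
    topic = tf_vector(topic_text)
    frontier = Frontier()
    for seed in seeds:
        frontier.push(seed, 1.0)  # seed injection with top priority
    fetched, saved = 0, []
    while frontier and fetched < max_pages:
        url, _ = frontier.pop()
        if url not in web:
            continue  # stands in for a failed fetch
        text, links = web[url]
        fetched += 1
        relevant, score = judge_page(topic, text, threshold)
        if not relevant:
            continue  # irrelevant page: discard it and do not follow its hyperlinks
        saved.append(url)
        for link, anchor in links:
            frontier.push(link, forecast_url(score, anchor, topic))
    return saved, harvest_rate(len(saved), fetched)


SAMPLE_WEB = {
    "a": ("focused crawler topic web pages crawler", [("b", "crawler strategy"), ("c", "cooking recipes")]),
    "b": ("crawler strategy for focused topic crawling", []),
    "c": ("cooking recipes pasta sauce", [("d", "crawler")]),
    "d": ("focused crawler", []),
}

if __name__ == "__main__":
    saved, rate = crawl(SAMPLE_WEB, ["a"], "focused crawler")
    print(saved, round(rate, 2))  # → ['a', 'b'] 0.67
```

In this toy run, page `c` is fetched but judged off-topic, so its link to `d` is never followed. This is how saving or discarding hyperlinks based on page topic judgment keeps the crawl on topic. Two of the three fetched pages are relevant, giving a harvest rate of 2/3. A real system would replace raw term frequencies with a trained classifier or TF-IDF weights and tune `threshold` and `alpha` experimentally.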

Keywords/Search Tags:

focused crawler, crawling strategy, vector space model, parse, harvest rate

Related items

1	Research On Search Strategy And Key Techniques Of Focused Crawler
2	Design And Implementation Of Multi Information Web System Of Automotive Industry Based On Focused Crawler
3	Research On Focused Crawler Based On SVM Classification Algorithm
4	Research And Implementation Of Focused Crawler Based On Word2Vec
5	Research Of Focused Crawling Strategy
6	Research And Application Of Web Crawling Algorithm Based On Semantic Analysis
7	Research On The Topic Crawler Algorithm Based On Vector Space Model
8	Design And Implementation Of Focused Crawler To Application Store
9	The Focused Web Crawling Strategy Based On Incremental Learning
10	Research And Implementation Of A Combined Focused Crawler Based On Protocol-Driven And Event-Driven Crawling Techniques