| General-purpose search engines go a long way toward solving the difficulty of finding information on the web. However, as information has diversified, their shortcomings have become apparent: low precision, low recall, stale content, and unbalanced distribution. A focused search engine provides useful information and related services for a particular field, user group, or need. The focused crawler is the information collection component of a focused search engine; it fetches web pages on the topics in which users are interested. Focused crawlers have therefore attracted increasing attention from researchers. This article analyzes the working principle of the focused crawler and the main difficulties involved, and designs its architecture. Based on an in-depth study of several classic focused collection strategies, I propose a new strategy consisting of page topic judgment and URL topic forecast. Using classification techniques, page topic judgment computes the similarity between the topic and a page that has already been fetched, and decides whether to save the page and its hyperlinks. URL topic forecast predicts which URLs are promising candidates for the next crawl. The strategy is applied in the focused crawler, and the crawler's components are implemented: seed injection, fetching, parsing, text training, page topic judgment, URL updating, and URL topic forecast. Experimental results show that the system runs stably and achieves a better harvest rate than a common crawler. The focused crawler greatly reduces time and storage costs. The time savings allow web pages to be updated promptly. Furthermore, because the collected content covers a single topic, users encounter little redundant or useless information when retrieving. |
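
The two parts of the strategy can be illustrated with a minimal sketch. The abstract does not specify the classifier or the scoring formula, so everything below is an assumption made for illustration: cosine similarity over raw term frequencies stands in for the classification step, and the function names (`judge_page`, `forecast_urls`), the threshold of 0.2, the equal 0.5 weights, and the example URLs are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def judge_page(topic_vector, page_text, threshold=0.2):
    """Page topic judgment (assumed form): return (keep, score) for a fetched page."""
    score = cosine_similarity(topic_vector, Counter(tokenize(page_text)))
    return score >= threshold, score


def forecast_urls(topic_vector, links, parent_score):
    """URL topic forecast (assumed form): order unvisited links, best first.

    Each link is a (url, anchor_text) pair. The score mixes the anchor text's
    similarity to the topic with the score of the page the link was found on.
    """
    ranked = []
    for url, anchor_text in links:
        anchor_score = cosine_similarity(topic_vector, Counter(tokenize(anchor_text)))
        ranked.append((0.5 * anchor_score + 0.5 * parent_score, url))
    ranked.sort(reverse=True)
    return [url for _, url in ranked]


# Hypothetical topic description and fetched page.
topic = Counter(tokenize("focused crawler topic search engine web pages"))
keep, score = judge_page(topic, "A focused crawler fetches web pages on one topic.")
order = forecast_urls(
    topic,
    [("http://example.com/a", "cooking recipes"),
     ("http://example.com/b", "focused crawler design")],
    parent_score=score,
)
```

In this sketch, a page is saved only when `keep` is true, and its outgoing links are queued in the order returned by `forecast_urls`, so links whose anchor text resembles the topic are crawled first. A real system would likely use a trained classifier and weighted features rather than raw term counts.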