| The amount of information on the Internet is growing explosively, yet the cost and time users must spend to find the information they want keep rising. A general-purpose search engine returns too many results for a submitted query, so manual screening is needed to confirm and filter them, and useful information makes up too small a share of what is returned. General-purpose search engines also often lack sufficient search depth. To keep track of hot information in application stores in a timely manner, a suitable analysis system is needed. This paper designs and implements a focused crawler system for application stores. Because traditional web crawlers cannot meet the needs of application-oriented search, the paper analyzes the characteristics, architecture, and workflow of focused crawlers and proposes a general framework for a crawler system oriented to application stores. The system is developed in Python and PHP and adopts a standard browser/server (B/S) architecture to take advantage of that architecture's operational strengths. It implements a multithread management module, a crawling strategy module, an HTTP download module, a mobile terminal download module, a text extraction module, a hyperlink extraction module, a topic relevance judgment module, and a management function module. Finally, the system is evaluated with unit tests and stress tests, and the test results are analyzed and summarized. The results show that the crawler system runs well and can supply data to the analysis system. |