Design And Implementation Of A Focused Crawler For Application Stores

Posted on:2019-06-29

Degree:Master

Type:Thesis

Country:China

Candidate:J B Han

Full Text:PDF

GTID:2428330590459961

Subject:Engineering

Abstract/Summary:

The amount of information on the Internet is growing explosively, yet the cost and time users spend finding the information they want keep rising. A general-purpose search engine returns too many results for a query: the results must be checked and filtered manually, and only a small proportion of them is useful. General-purpose search engines also often lack sufficient search depth. Keeping up with trending information in application stores in a timely manner therefore requires a suitable analysis system. Because traditional web crawlers cannot meet the needs of application-oriented search, this thesis designs and implements a focused crawler system for application stores.

The thesis analyzes the characteristics, architecture, and workflow of focused crawlers and proposes a general framework for a crawler system oriented to application stores. The system is developed in Python and PHP and adopts a standard B/S (browser/server) architecture, making full use of that architecture's advantages in system operation. It implements a multi-thread management module, a crawling strategy module, an HTTP download module, a mobile terminal download module, a text extraction module, a hyperlink extraction module, a topic relevance judgment module, and a management function module. Finally, the system is evaluated with unit tests and stress tests, and the test results are analyzed and summarized. The tests show that the crawler system runs well, can provide data for the analysis system, and achieves satisfactory results.
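The interaction between the download, text extraction, hyperlink extraction, and topic relevance judgment modules can be illustrated with a minimal sketch. The thesis does not publish its code, so everything here is an assumption made for illustration: the names `LinkAndTextParser`, `relevance`, and `focused_crawl`, the term-overlap relevance score, the best-first frontier, and the injected `fetch` callable that stands in for the HTTP download module. Multi-thread management is omitted.

```python
import heapq
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Hypothetical stand-in for the text and hyperlink extraction modules."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Record the href of every anchor tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Keep every non-empty text node (no script/style filtering here).
        if data.strip():
            self.text_parts.append(data.strip())


def relevance(text, topic_terms):
    """Fraction of topic terms found in the text (an assumed, simplistic metric)."""
    if not topic_terms:
        return 0.0
    words = set(text.lower().split())
    return sum(1 for term in topic_terms if term in words) / len(topic_terms)


def focused_crawl(seed_url, fetch, topic_terms, threshold=0.5, max_pages=50):
    """Best-first crawl that follows links only from pages judged on-topic.

    `fetch` is any callable mapping a URL to HTML text, or None on failure.
    """
    frontier = [(-1.0, seed_url)]  # min-heap on negated score = highest first
    seen = {seed_url}
    relevant_pages = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        html = fetch(url)
        fetched += 1
        if html is None:
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        score = relevance(" ".join(parser.text_parts), topic_terms)
        if score < threshold:
            continue  # off-topic page: do not expand its links
        relevant_pages.append((url, score))
        for href in parser.links:
            link = urljoin(url, href)
            if link not in seen:
                seen.add(link)
                # A link inherits the score of the page it was found on.
                heapq.heappush(frontier, (-score, link))
    return relevant_pages
```

Passing `fetch` as a parameter keeps the sketch testable without network access; in a real deployment it could wrap an HTTP client, and the relevance function would likely be replaced by whatever topic model the system actually uses.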

Keywords/Search Tags:

topic crawler, multi-thread management, crawling strategy, text extraction module

Related items

1	Research On Topic Crawler Of Combining Content With Link Structure
2	Design And Implementation Of Service Crawling And Analyzing Module
3	Design And Implementation Of Crawler System For Food Contact Material Safety
4	Research And Implementation Of Web Information Detecting System Based On Topic Strategy
5	Crawling Search Strategy Subject-oriented Research And Realized
6	Design And Implementation Of Web Crawler For Given Page
7	The Design And Research Of Topic Web Crawler In Vertical Search Engine
8	Research And Implementation Of Focused Crawler Based On Word2Vec
9	Research On Web Crawler Algorithm Based On Topic Strategy
10	Research On Efficient Web Information Crawling Strategy