Design And Research Of Network Spider

Posted on: 2014-07-20  Degree: Master  Type: Thesis
Country: China  Candidate: M L Zhao  Full Text: PDF
GTID: 2268330401964607  Subject: Software engineering
Abstract/Summary:
Network resources are abundant, but searching them effectively for information is difficult. Building a search engine is the best way to solve this problem. The multi-threaded web crawler program starts from a specified Web page and, following a breadth-first algorithm, parses the page, extracts and saves each URL it finds, and uses those URLs as new entry points, so that it keeps crawling URLs on the Internet automatically as a daemon.

The Web crawler applies sockets, regular expressions, the HTTP protocol, Windows network programming and other related technologies. It is a network program that runs in the background, takes its initial URLs from a configuration file, crawls onward using the breadth-first algorithm, and saves the target URLs. It is implemented in C++ and was debugged in VC6.0, so that ordinary users are able to perform web search tasks.

This thesis first details the system architecture of Internet-based search engines, and then describes how to design and implement the Web crawler component of a search engine. The work completed on this subject is as follows:
1. A complete analysis of the SPIDER Web crawler architecture;
2. Design of the main function modules;
3. The MySQL database;
4. URL parsing and queue management;
5. Design and implementation of the individual function modules;
6. Testing of the Web crawler system.

In addition, the chapters on the design and implementation of the Web crawler elaborate on the core techniques and illustrate them with the code of the multi-threaded web crawler, which makes them easy to understand.
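The breadth-first crawl described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the thesis code: it is single-threaded, it replaces the socket/HTTP layer with an in-memory map from URL to HTML, and the names `PageStore`, `extract_links` and `crawl_bfs` are hypothetical. It only shows how a FIFO queue, a set of already seen URLs, and a regular expression for `href` attributes fit together.

```cpp
#include <cstddef>
#include <queue>
#include <regex>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the network layer: URL -> HTML body.
// A real crawler would fetch each page over a socket using HTTP.
using PageStore = std::unordered_map<std::string, std::string>;

// Extract absolute http/https links from double-quoted href attributes.
// Assumption: relative links and single-quoted attributes are ignored.
std::vector<std::string> extract_links(const std::string& html) {
    static const std::regex href_re(R"rx(href\s*=\s*"(https?://[^"]+)")rx",
                                    std::regex::icase);
    std::vector<std::string> links;
    for (std::sregex_iterator it(html.begin(), html.end(), href_re), end;
         it != end; ++it) {
        links.push_back((*it)[1].str());
    }
    return links;
}

// Breadth-first crawl starting at `seed`. Returns the URLs in the order
// they were dequeued, stopping once `max_pages` URLs have been dequeued.
std::vector<std::string> crawl_bfs(const PageStore& pages,
                                   const std::string& seed,
                                   std::size_t max_pages) {
    std::vector<std::string> order;
    std::queue<std::string> frontier;      // FIFO queue gives breadth-first order
    std::unordered_set<std::string> seen;  // prevents enqueuing a URL twice

    frontier.push(seed);
    seen.insert(seed);

    while (!frontier.empty() && order.size() < max_pages) {
        std::string url = frontier.front();
        frontier.pop();
        order.push_back(url);

        auto page = pages.find(url);
        if (page == pages.end()) continue;  // unknown URL: treated as having no links

        for (const std::string& link : extract_links(page->second)) {
            if (seen.insert(link).second) {  // true only for a newly seen URL
                frontier.push(link);
            }
        }
    }
    return order;
}
```

Calling `crawl_bfs(pages, seed, limit)` with a small hand-built `PageStore` yields the seed first, then every page linked from the seed, then the pages linked from those. A multi-threaded version, as the abstract describes, would additionally need to protect the queue and the seen set with a lock.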
Keywords/Search Tags: SPIDER, Breadth First Search, multi-threads, Internet search engine, Network spider, URL Captures