
Design And Implementation Of A Spider For Topic-Specific Search Engine

Posted on: 2009-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: J Shen
GTID: 2178360245974715
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, people can access a tremendous amount of information, and the ways of obtaining information have changed accordingly. This brings both opportunities and challenges. As Web resources grow exponentially, obtaining information quickly and accurately has become an important issue, and topic-specific search engines (also called specific-information search engines) emerged against this background.

A search engine consists of several modules. One of them, the Spider, supplies the data source for the search engine. The Spider of a topic-specific search engine must also meet the demands of processing topic-specific information. In this thesis, we study and develop a Spider built on the following core ideas:
(1) Drawing on the time-switching strategy used by the CPU, we propose the Site-depth-first Searching Model, which lets the Spider concentrate its crawling on one site at a time.
(2) We introduce the Page-site Weighted Algorithm, which uses a weighted value to represent the relevance of a page or site to the specific topic, so that highly relevant sites are downloaded first.
(3) We introduce a data structure called the Two-dimensional Vector Workload, which supports the Site-depth-first Searching Model and controls the processing time of each site according to its weighted value.

Finally, we run an integrated test with a chemistry-specific dictionary to verify the feasibility of the system and analyze the results further. We then discuss how to transform the topic-specific Spider into a Spider for a general-purpose search engine.
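The abstract names three mechanisms but does not give their details, so the Python sketch that follows is only a hypothetical illustration of how they could fit together, not the thesis's implementation. The following are all assumptions: the relevance score (the fraction of words found in a topic dictionary), the time slice measured as a number of page downloads rather than clock time, the site weight taken as the mean relevance of pages already downloaded from that site, and the weight inherited by cross-site links. The names `TOPIC_TERMS`, `page_weight`, `Site`, `Workload` and `crawl` are hypothetical, and a small in-memory dictionary stands in for real HTTP fetching.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit

# Hypothetical topic dictionary (assumption); the thesis's chemistry dictionary is not given.
TOPIC_TERMS = {"molecule", "reaction", "catalyst", "polymer"}


def page_weight(text: str) -> float:
    """Assumed relevance score: fraction of words that appear in TOPIC_TERMS."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in TOPIC_TERMS for w in words) / len(words)


@dataclass
class Site:
    host: str
    weight: float = 0.0                          # site-level weight used for scheduling
    queue: list = field(default_factory=list)    # inner dimension: pending URLs, used as a stack
    scores: list = field(default_factory=list)   # relevance of pages already downloaded


class Workload:
    """Hypothetical two-dimensional workload: outer dimension = sites, inner = URLs per site."""

    def __init__(self):
        self.sites = {}

    def add(self, host: str, url: str, weight: float) -> None:
        site = self.sites.setdefault(host, Site(host))
        site.queue.append(url)
        site.weight = max(site.weight, weight)

    def next_site(self):
        # Pick the pending site with the highest weight (ties go to the earliest-added site).
        pending = [s for s in self.sites.values() if s.queue]
        return max(pending, key=lambda s: s.weight, default=None)


def crawl(web: dict, seeds: list, slice_pages: int = 2) -> list:
    """Return URLs in download order; `web` maps URL -> (text, outgoing links)."""
    workload = Workload()
    for url in seeds:
        workload.add(urlsplit(url).netloc, url, weight=1.0)  # seeds start at full weight (assumption)
    visited, order = set(), []
    while (site := workload.next_site()) is not None:
        fetched = 0
        # Time slice: stay on one site for at most `slice_pages` downloads, then reschedule.
        while site.queue and fetched < slice_pages:
            url = site.queue.pop()  # LIFO pop gives depth-first order within the site
            if url in visited or url not in web:
                continue
            visited.add(url)
            text, links = web[url]
            w = page_weight(text)
            site.scores.append(w)
            order.append(url)
            fetched += 1
            for link in links:
                if link in visited:
                    continue
                host = urlsplit(link).netloc
                if host == site.host:
                    site.queue.append(link)             # same site: keep going deeper
                else:
                    workload.add(host, link, weight=w)  # other site: inherits this page's weight
        if site.scores:
            # Re-weight the site by the mean relevance of its downloaded pages.
            site.weight = sum(site.scores) / len(site.scores)
    return order


WEB = {
    "http://a.example/": ("molecule sports", ["http://a.example/1", "http://b.example/"]),
    "http://a.example/1": ("weather news", ["http://a.example/2"]),
    "http://a.example/2": ("catalyst", []),
    "http://b.example/": ("polymer reaction", []),
}

if __name__ == "__main__":
    print(crawl(WEB, ["http://a.example/"], slice_pages=2))
    # ['http://a.example/', 'http://a.example/1', 'http://b.example/', 'http://a.example/2']
```

With a two-page slice, the sketch leaves `a.example` after two downloads because the irrelevant page `/1` lowers that site's weight to 0.25, below the 0.5 inherited by `b.example`. With a large slice (for example `slice_pages=10`), it finishes `a.example` depth-first before moving on, which shows how the slice length trades concentration on one site against responsiveness to weights.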
Keywords/Search Tags: search engine, Spider, workload, weighted algorithm, Site-depth-first search