The Design And Implementation Of Website Analysis Module Based On Shopping Search Engine

Posted on: 2011-06-18
Degree: Master
Type: Thesis
Country: China
Candidate: J Ma
Full Text: PDF
GTID: 2198330335960352
Subject: Software engineering
Abstract/Summary:
Search engines retrieve records that match a user's query from the vast resources of the Web. Improving search quality and speed has become an active research topic. A vertical search engine, which targets one particular field, narrows the search scope and filters out irrelevant information. Its results are more specialized, and it can save users considerable search time.

The online shopping search engine studied in this thesis is a vertical search engine applied to electronic commerce. The work comes from an actual project at Alcatel-Lucent Sbell Co., Ltd, developed in a Linux environment.

This work designs and implements the Crawler, Parser, and Indexer of the online shopping search engine to meet the project requirements. The thesis describes the system analysis, design, and implementation in detail, then briefly describes system testing. Its focus is on web page analysis and index query.

Web page analysis consists of five modules: hyperlink analysis, static web page extraction, web extraction exception handling, control, and experimental testing. Its main task is to extract the hyperlinks from web pages and return them to the Crawler. It also returns the current page number of each page to the Crawler to inform the crawling strategy: the Crawler can set a maximum number of pages to crawl and compare it with the current page number to decide whether to end the crawl.

The web page analysis section transforms semi-structured HTML data into structured XML data, using regular expressions to extract and filter the HTML content. The intelligent search section uses regular expressions to extract field values from the XML files and build the index. In addition, a knowledge base is established to provide a basis for intelligent search.
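
The pipeline described in the abstract (hyperlink extraction, a page-count stop check, regex-based HTML-to-XML conversion, and index building from the XML fields) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the abstract does not give the language, the regular expressions, or the XML schema, so every function name, pattern, and field name (`title`, `price`) below is a hypothetical chosen for illustration.

```python
import re
from urllib.parse import urljoin
from xml.etree import ElementTree as ET

# Hypothetical patterns; real shop pages would need site-specific expressions.
HREF_RE = re.compile(r'<a\s[^>]*?href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)
TITLE_RE = re.compile(r'<h1[^>]*>(.*?)</h1>', re.IGNORECASE | re.DOTALL)
PRICE_RE = re.compile(r'<span[^>]*class="price"[^>]*>\s*([\d.]+)\s*</span>',
                      re.IGNORECASE)
TAG_RE = re.compile(r'<[^>]+>')  # used to filter leftover markup out of a field


def extract_links(html, base_url):
    """Return absolute hyperlinks from the page, in order, without duplicates."""
    seen, links = set(), []
    for href in HREF_RE.findall(html):
        url = urljoin(base_url, href)
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


def should_stop(current_page, max_pages):
    """Crawler-side check: end the crawl once the page limit is reached."""
    return current_page >= max_pages


def html_to_xml(html, url):
    """Turn semi-structured HTML into one structured XML record (assumed schema)."""
    title = TITLE_RE.search(html)
    price = PRICE_RE.search(html)
    item = ET.Element("item")
    ET.SubElement(item, "url").text = url
    ET.SubElement(item, "title").text = (
        TAG_RE.sub("", title.group(1)).strip() if title else "")
    ET.SubElement(item, "price").text = price.group(1) if price else ""
    return ET.tostring(item, encoding="unicode")


def build_index(xml_records):
    """Toy inverted index: lowercase title word -> set of record URLs."""
    index = {}
    for xml in xml_records:
        item = ET.fromstring(xml)
        url = item.findtext("url")
        title = item.findtext("title", default="")
        for word in re.findall(r"\w+", title.lower()):
            index.setdefault(word, set()).add(url)
    return index
```

In this sketch the parser would hand the result of `extract_links` and the current page number back to the crawler, which calls `should_stop` before fetching further pages. Regular expressions are fragile against arbitrary HTML, so a pattern-per-site approach like this only suits pages with a known, stable layout.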
Keywords/Search Tags: vertical search engine, web page analysis, regular expression, knowledge base