
Research And Design Of Vertical Search Engine Based On The Field Of Education

Posted on: 2015-11-16 | Degree: Master | Type: Thesis
Institution: University | Candidate: Zhang | Full Text: PDF
GTID: 2298330434460723 | Subject: Circuits and Systems
Abstract/Summary:
With the rapid development of Internet technology, the amount of information on the Internet is exploding. This growth is visible in every field, and especially in education. The Internet provides a wealth of online learning and teaching resources that users can conveniently access and download. However, accurately and promptly finding the educational resources a user actually needs has become an important problem. A traditional search engine returns a large number of results, but it is hard to find a sufficiently specialized result among them. The emergence of vertical search engines has greatly improved this situation. Compared with general-purpose search engines, vertical search engines can solve most of the problems that general-purpose engines cannot, because they focus on specific fields, specific groups of people, and specific requirements.

Building on a study of search engine technology, this thesis develops a prototype vertical search engine for the field of education, using Lucene as its underlying library.

Firstly, the thesis introduces the research background and the state of development of vertical search engines at home and abroad. It discusses several key technologies of vertical search engines, including web spiders, web page preprocessing, and Chinese word segmentation. It also introduces the core modules of Lucene, namely the indexing and retrieval modules.

Secondly, the thesis analyzes and designs the system structure. It studies the main concepts of topic crawlers and the well-known topic crawler algorithms, including Fish-search and Shark-search. After analyzing their advantages and disadvantages, the thesis proposes a modified crawling algorithm and implements it together with a topic-matching algorithm based on the vector space model (VSM). The modified algorithm improves both crawling efficiency and the relevance of the crawled pages. The crawled pages are then denoised to extract their text content.

Finally, the thesis implements the vertical search engine for the education field, completing the indexing and retrieval modules on top of Lucene. It uses a new indexing method in which the index contains only keywords and titles. Experiments verify that this method reduces the size of the index files and improves search efficiency.
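
The VSM topic-matching step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual term weighting, tokenization, or threshold, so everything below is an assumption: raw term frequencies, whitespace tokenization (real Chinese text would need a word segmenter first), and the hypothetical names `term_vector`, `cosine_similarity`, and `is_relevant` with an arbitrary threshold of 0.2.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a raw term-frequency vector from whitespace-separated tokens.

    Assumption: simple lowercase whitespace tokenization, not the
    thesis's Chinese word segmentation.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_a * norm_b)


def is_relevant(page_text, topic_terms, threshold=0.2):
    """Decide whether a page is on-topic (threshold is a hypothetical value)."""
    topic = term_vector(" ".join(topic_terms))
    return cosine_similarity(term_vector(page_text), topic) >= threshold
```

A topic crawler could call something like `is_relevant(page_text, ["course", "teaching", "education"])` on each fetched page and follow links only from pages that pass, which is the general idea behind relevance-guided crawlers such as Fish-search and Shark-search.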
Keywords/Search Tags: Education Resources, Topic Crawler, Fish-search, Index