Research and Implementation of a Vertical Search Engine Based on Lucene/HttpClient

Posted on: 2012-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: R R Li
Full Text: PDF
GTID: 2248330395455289
Subject: Computer application technology
Abstract/Summary:
In the early stages of the Internet, network resources were relatively scarce and finding information was relatively easy. With the Internet's rapid development, the volume of online information has grown exponentially. Faced with this mass of information resources, completing users' search tasks quickly and efficiently has become a key problem for general search engines. At the same time, because general search engines are designed to cover large volumes of data and broad subject matter, they are increasingly unable to satisfy users' need for precise results within a specific subject area. Vertical search engines, that is, search engines dedicated to a specialized domain, emerged in response.

This paper first describes the concept, development prospects, and characteristics of vertical search engines, and then introduces the theory behind vertical search engine technology. The data used in this paper come from CNKI, which stores almost all patent information. Building on an analysis of HttpClient, an open-source HTTP toolkit, and Lucene, an open-source indexing and retrieval framework, we build and persist, step by step, a data index of patent information in the field of computer application. This work consists of capturing patent information, extracting the patent information from the captured pages, and extracting keywords based on TFIDF. Finally, we construct a vertical search engine for searching patent information in the field of computer application.

Tests show that the vertical search engine built in this paper can meet users' needs when querying patent information in the field of computer application.
Keywords/Search Tags:Vertical search engine, Lucene, HttpClient, Page extraction, TFIDF
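
The abstract names TFIDF (TF-IDF) as the basis for keyword extraction but gives no formula or code. The following is a minimal sketch of one common TF-IDF variant, assuming documents are already tokenized and using tf = count / document length and idf = ln(N / df). The class name `TfIdfKeywords`, its method names, and the sample corpus are hypothetical illustrations, not the thesis implementation, and the sketch uses neither Lucene nor HttpClient.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative TF-IDF keyword scorer (hypothetical; not the thesis code). */
public class TfIdfKeywords {

    /** Term frequency: occurrences of each term divided by the document length. */
    public static Map<String, Double> termFrequency(List<String> doc) {
        Map<String, Double> tf = new HashMap<>();
        for (String term : doc) {
            tf.merge(term, 1.0, Double::sum);
        }
        for (Map.Entry<String, Double> e : tf.entrySet()) {
            e.setValue(e.getValue() / doc.size());
        }
        return tf;
    }

    /** Inverse document frequency: ln(N / df), df = documents containing the term. */
    public static double inverseDocumentFrequency(String term, List<List<String>> corpus) {
        int df = 0;
        for (List<String> doc : corpus) {
            if (doc.contains(term)) {
                df++;
            }
        }
        // Assumes the term occurs in at least one document, so df >= 1.
        return Math.log((double) corpus.size() / df);
    }

    /** Returns up to k terms of one document, ranked by descending TF-IDF score. */
    public static List<String> topKeywords(List<List<String>> corpus, int docIndex, int k) {
        Map<String, Double> tf = termFrequency(corpus.get(docIndex));
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Double> e : tf.entrySet()) {
            double idf = inverseDocumentFrequency(e.getKey(), corpus);
            scores.put(e.getKey(), e.getValue() * idf);
        }
        List<String> terms = new ArrayList<>(scores.keySet());
        // Sort by score, highest first; break ties alphabetically for stable output.
        terms.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(k, terms.size())));
    }

    public static void main(String[] args) {
        // Made-up tokenized documents standing in for extracted patent text.
        List<List<String>> corpus = List.of(
            List.of("patent", "search", "index", "index"),
            List.of("patent", "search", "crawler"),
            List.of("patent", "abstract", "claim"));
        System.out.println(topKeywords(corpus, 0, 2));
    }
}
```

In this sample corpus, "patent" appears in every document, so its idf is ln(3/3) = 0 and it is never ranked as a keyword, while "index" is both frequent in the first document and rare elsewhere, so it ranks first. This is the behaviour that makes TF-IDF useful for picking out terms that characterize a single patent record.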