With the rapid development of the Internet, the volume of online information has grown exponentially, and search engines have come to occupy a dominant position among Internet applications. General search engines bring convenience to users, but their results also cause problems. Most notably, they return a very large number of results, so users spend a lot of time finding the information they are actually interested in. General search engines also cannot take users' professional needs into account: results are returned without differentiation, which causes considerable inconvenience.

A vertical search engine focuses on a particular field. As industries and society become increasingly subdivided, there is strong demand for professional information, and vertical search engines can address the problem of searching for information within a specific profession. By using subject (topic-focused) spider technology, vertical search engines are more practical than general search engines for some professional problems.

After introducing search engines and vertical search engines, this paper focuses on an analysis of Heritrix. A customized Heritrix is used to collect web information on the target subject, and the HLFHash algorithm is introduced so that Heritrix can crawl the web with multiple threads. By removing the robots.txt restriction, the approach further increases the crawling rate.

Lucene is used for index construction and retrieval. Based on an analysis of Lucene's basic framework, its word segmentation and result sorting are modified. To meet the needs of an electronic information search engine, a Chinese word segmentation algorithm based on professional electronic information dictionaries and statistics is designed, and Lucene's sorting algorithm is modified so that retrieval results better satisfy users' needs. In addition, to build the Lucene index, the paper analyzes and processes the content of the downloaded web pages.

Finally, experimental tests show the differences and advantages of vertical search engines, verify that the spider is reliable and efficient, and verify the effectiveness of the Chinese text analysis. Overall tests demonstrate that the system is reliable and practical and has reference value for building a vertical search engine.
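
To make the dictionary-based part of the word segmentation more concrete, the following is a minimal sketch and not the algorithm designed in the paper. It assumes plain forward maximum matching, a common dictionary-driven technique, and leaves out the statistical component entirely. The function name `forward_max_match` and the entries in `DOMAIN_DICT` are hypothetical and chosen only for illustration.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy segmentation: at each position take the longest dictionary word.

    Assumption: this is generic forward maximum matching, used here only to
    illustrate dictionary-driven segmentation; it is not the paper's method.
    """
    if max_len is None:
        # Longest dictionary entry bounds the window; fall back to 1 if empty.
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical mini dictionary of electronics terms (illustrative only).
DOMAIN_DICT = {"集成电路", "电路", "二极管", "发光二极管", "设计"}

if __name__ == "__main__":
    print(forward_max_match("集成电路设计", DOMAIN_DICT))
```

A domain dictionary matters here because a longer professional term such as "发光二极管" is kept as one token instead of being split into shorter general words, which is the kind of behavior a field-specific index benefits from.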