Research And Implementation Of Web Search Engine System Based On LUCENE

Posted on: 2011-06-03
Degree: Master
Type: Thesis
Country: China
Candidate: X C Lu
Full Text: PDF
GTID: 2178360305981793
Subject: Computer application technology
Abstract/Summary:
Web content is growing exponentially with the rapid development of the Internet. While people enjoy the convenience of the Internet, they also face the problem of how to obtain useful information from this vast body of content quickly and accurately. Web search engines address this problem, and they are now one of the hottest topics in Internet technology. The future Internet is content-oriented: people browse it through the results that search engines return.

First, this thesis studies and analyzes in depth the theory, framework, and data structures of a Web search engine. It also discusses two future trends in search engine development: personalization and intelligence. Personalization means that different types of users who search for the same content receive different results, each better suited to that user. Intelligence means that the search engine has a self-learning capability: it adapts automatically to users' query needs and classifies users intelligently, which provides the foundation for that intelligence.

Next, the thesis describes the characteristics, system structure, and indexing mechanism of LUCENE. LUCENE is an open source project of the Apache Software Foundation. It is implemented entirely in Java, suits applications that require full-text search, and is highly portable across platforms.

On the basis of this theory, and using Java, the thesis implements a news search engine system. The network spider uses a non-recursive crawling mode together with Java multithreading. A memory-based queue manager handles the enqueuing, distribution, and processing of URL links during crawling, and a thread pool manages multiple crawling threads so that web pages are crawled concurrently. Indexing and search are implemented with the Java classes of the LUCENE full-text search library. Finally, JSP (JavaServer Pages) is used to build a simple client for the news search engine.
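The spider design summarized above (non-recursive crawling, a memory-based URL queue manager, and a thread pool of crawling threads) can be illustrated with a minimal sketch. This is not the thesis's code: the names CrawlSketch, UrlQueueManager, fetchLinks, and maxPages are hypothetical, and the page download and link extraction step is passed in as a function so that the sketch runs without network access.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Illustrative sketch only; every name here is hypothetical, not taken from the thesis. */
final class CrawlSketch {

    /** Memory-based queue manager: admits each URL once and hands it to workers. */
    static final class UrlQueueManager {
        private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
        private final Set<String> seen = new HashSet<>();
        // Number of URLs that are queued or currently being processed.
        private final AtomicInteger outstanding = new AtomicInteger();
        private final int maxPages;

        UrlQueueManager(int maxPages) { this.maxPages = maxPages; }

        /** Enqueues a URL unless it was already seen or the page budget is spent. */
        synchronized boolean offer(String url) {
            if (seen.size() >= maxPages || !seen.add(url)) return false;
            outstanding.incrementAndGet();
            pending.add(url);
            return true;
        }

        /** Returns the next URL, or null if none arrived within the timeout. */
        String take() throws InterruptedException {
            return pending.poll(50, TimeUnit.MILLISECONDS);
        }

        void done() { outstanding.decrementAndGet(); }

        boolean finished() { return outstanding.get() == 0; }

        synchronized Set<String> snapshot() { return new HashSet<>(seen); }
    }

    /** Worker loop: iterates over the shared queue instead of recursing into links. */
    private static void work(UrlQueueManager queue, Function<String, List<String>> fetchLinks) {
        try {
            while (true) {
                String url = queue.take();
                if (url == null) {
                    if (queue.finished()) return; // nothing queued, nothing in progress
                    continue;
                }
                try {
                    for (String link : fetchLinks.apply(url)) queue.offer(link);
                } catch (RuntimeException e) {
                    // In this sketch a page that fails to load is simply skipped.
                } finally {
                    queue.done();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Crawls from the seed with a fixed thread pool; returns every URL admitted to the queue. */
    static Set<String> crawl(String seed, Function<String, List<String>> fetchLinks,
                             int threads, int maxPages) {
        UrlQueueManager queue = new UrlQueueManager(maxPages);
        queue.offer(seed);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) pool.execute(() -> work(queue, fetchLinks));
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return queue.snapshot();
    }
}
```

In a real spider, fetchLinks would download the page and extract its links. The outstanding counter is one possible way to let the workers stop: it is incremented before a URL is queued and decremented only after that URL's links have been offered, so a worker that finds the queue empty and the counter at zero knows no further work can appear.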
Keywords/Search Tags: Search Engine, LUCENE, Web Spider, JAVA