Research And Implementation Of Enterprise Search Engine Based On Solr

Posted on:2014-07-19

Degree:Master

Type:Thesis

Country:China

Candidate:X L Li

Full Text:PDF

GTID:2268330401488337

Subject:Computer technology

Abstract/Summary:


With the rapid development of Internet technology and the growing internationalization of enterprises, competition between enterprises is becoming increasingly fierce. Finding information on a company website promptly and accurately is therefore very important for an enterprise. Large commercial search engines such as Google can provide internal search for an enterprise. However, their commercial, general-purpose nature gives this choice many drawbacks. An important research subject is therefore how to make effective use of the technologies available from large search engines to build an enterprise website search engine quickly and easily.

This thesis addresses that need. It analyzes why an enterprise search engine is necessary and how one can be implemented, and studies an enterprise search engine based on Solr. It first introduces the concept and system architecture of an enterprise search engine. It then analyzes in depth the theory and techniques of full-text indexing and information retrieval, focusing on the key technologies and classic algorithms related to the system in this thesis. It also analyzes and applies Solr-related technologies: MMSeg4j Chinese word segmentation, Solr's Java client SolrJ, and Solr's DataImportHandler. Solr is a standalone enterprise search application server that wraps Lucene's code, and it is easy to use and powerful. On this basis, a small enterprise search engine system can be built by combining Solr with other key technologies such as web crawling, information extraction and Chinese word segmentation.

Based on the above analysis, and combining the basic theory with the related technologies, I have implemented a small enterprise search engine instance using Solr.
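The full-text indexing and retrieval theory mentioned above rests on the inverted index, which maps each term to the documents that contain it. The thesis gives no code for this. The following is a minimal, hypothetical sketch in plain Java, not Lucene or Solr code. It assumes documents have already been split into terms, for example by a Chinese word segmenter. It answers a Boolean AND query by intersecting postings lists and ranks the hits with a simple textbook tf·idf sum. The class name `InvertedIndex` and its scoring formula are illustrative assumptions, and Lucene's actual scoring differs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical minimal inverted index (term -> docId -> term frequency).
 * It illustrates the general idea behind full-text indexing; it is not Lucene or Solr code.
 */
class InvertedIndex {
    // Postings lists: for each term, the documents containing it and how often it occurs there.
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private int docCount = 0;

    /** Indexes a document whose text has already been split into terms (e.g. by a word segmenter). */
    void addDocument(int docId, List<String> terms) {
        docCount++;
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new TreeMap<>()).merge(docId, 1, Integer::sum);
        }
    }

    /**
     * Returns ids of documents containing all query terms (Boolean AND),
     * ranked by a simple tf * idf sum (an assumed scoring rule); ties keep ascending docId order.
     */
    List<Integer> search(List<String> queryTerms) {
        Set<Integer> candidates = null;
        for (String term : queryTerms) {
            Set<Integer> docs = postings.getOrDefault(term, Map.of()).keySet();
            if (candidates == null) {
                candidates = new TreeSet<>(docs);
            } else {
                candidates.retainAll(docs); // intersect the postings lists
            }
        }
        if (candidates == null) {
            return new ArrayList<>(); // empty query
        }
        Map<Integer, Double> scores = new HashMap<>();
        for (int doc : candidates) {
            double score = 0.0;
            for (String term : queryTerms) {
                Map<Integer, Integer> docsWithTerm = postings.get(term);
                int tf = docsWithTerm.get(doc);
                double idf = Math.log((double) docCount / docsWithTerm.size()) + 1.0;
                score += tf * idf;
            }
            scores.put(doc, score);
        }
        List<Integer> ranked = new ArrayList<>(candidates);
        ranked.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return ranked;
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.addDocument(1, List.of("solr", "search", "engine"));
        index.addDocument(2, List.of("lucene", "index"));
        index.addDocument(3, List.of("solr", "solr", "index"));
        System.out.println(index.search(List.of("solr")));          // prints [3, 1]
        System.out.println(index.search(List.of("solr", "index"))); // prints [3]
    }
}
```

Document 3 ranks first for "solr" because the term occurs twice there. Production engines store postings on disk in compressed form, but the lookup-then-intersect structure is the same idea.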
To address the shortcomings encountered when crawling the web, this thesis improves and extends the Heritrix framework to suit the features of the instance, and successfully downloads the target website to local storage. It studies how HTMLParser works and writes code tailored to the website's structure to implement the extraction module. The extracted information is then stored in a MySQL database. After studying Solr's indexing and retrieval framework and its necessary configuration, the thesis sets up the Solr search server. It studies MMSeg4j's four segmentation modes and integrates MMSeg4j into Solr. Testing and statistics show a segmentation accuracy of 98%. Since most enterprise data is stored in databases, the data is imported into Solr through Solr's DataImportHandler, which meets this enterprise need. The SolrJ source code is studied and improved to implement the search function, and a user-friendly interface is designed, completing the enterprise search engine. Finally, functional and performance tests show that the system searches with high accuracy and in real time, and that it has good practical value.
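Dictionary-based Chinese segmenters such as MMSeg4j build on maximum matching: at each position, take the longest dictionary word that starts there. The thesis does not show its segmentation code. The following is a hypothetical sketch of plain forward maximum matching (FMM) in Java, not MMSeg4j's implementation. The class name `ForwardMaxMatch` and the fallback to a single character for unknown text are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of forward maximum matching (FMM), the simplest dictionary-based
 * Chinese segmentation strategy: at each position, take the longest dictionary word that
 * starts there, or a single character if none matches. Not MMSeg4j's code.
 * Works on UTF-16 chars, which is enough for common CJK characters in the BMP.
 */
class ForwardMaxMatch {
    private final Set<String> dictionary;
    private final int maxWordLength;

    ForwardMaxMatch(Set<String> dictionary) {
        this.dictionary = dictionary;
        int max = 1;
        for (String word : dictionary) {
            max = Math.max(max, word.length());
        }
        this.maxWordLength = max;
    }

    List<String> segment(String text) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxWordLength);
            // Shrink the candidate window until it is a dictionary word or a single character.
            while (end > i + 1 && !dictionary.contains(text.substring(i, end))) {
                end--;
            }
            words.add(text.substring(i, end));
            i = end;
        }
        return words;
    }

    public static void main(String[] args) {
        ForwardMaxMatch fmm = new ForwardMaxMatch(Set.of("solr", "search", "engine"));
        System.out.println(fmm.segment("solrsearchengine")); // prints [solr, search, engine]
    }
}
```

Greedy matching has a known weakness. Suppose the dictionary contains 研究, 研究生, 生命 and 起源. FMM then splits 研究生命起源 into 研究生 / 命 / 起源 instead of the intended 研究 / 生命 / 起源. This is the kind of ambiguity that MMSeg's complex mode, with its additional chunk-based rules, is designed to resolve.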

Keywords/Search Tags:

Enterprise Search Engine, Heritrix, Solr, Web crawler, Chinese word segmentation


Related items

1	Design And Implementation Of Solr-based Search Engine
2	Research And Implementation Of The Vertical Search Engine On Lucene
3	Search Engine Theory And Technology Research
4	The Design And Implementation Of Telecom Search Engine Based On Solr
5	Research On Some Key Technologies Of Enterprise Search Engine
6	The Design And Research Of Personalized Search Engine Based On Solr
7	Design And Implementation Of Vertical News Search Engine Based On Heritrix
8	Research On Key Technology Of Vertical Search Engine
9	The Study And Implementation Of Enterprise Search Engine Based On Solr
10	Design Of Search Engine Based On Lucene And Heritrix