| With the rapid development of Internet technology and the growing internationalization of enterprises, competition between enterprises is becoming increasingly fierce. Being able to search a company website for information in a timely and accurate manner is therefore very important for an enterprise. Although large commercial search engines such as Google can provide an internal search function for an enterprise, their commercial, general-purpose nature gives this choice many drawbacks. How to make effective use of the technologies available from large search engines to build an enterprise website search engine quickly and easily is therefore an important research subject. This paper addresses that demand and analyzes the necessity and implementation methods of an enterprise search engine based on Solr. It first introduces the concept and system architecture of the enterprise search engine, and then gives an in-depth analysis of the theory and techniques of full-text indexing and information retrieval, with particular attention to the key technologies and classic algorithmic ideas related to the system in this paper. It also analyzes and applies Solr-related technologies, such as MMSeg4j Chinese word segmentation, Solr's Java client SolrJ, and Solr's DataImportHandler. Solr is an independent enterprise search application server that wraps Lucene's code; it is easy to use and powerful. On this basis, combined with other key technologies such as web crawling, information extraction, and Chinese word segmentation, a small enterprise search engine system can be built. Based on the above analysis, and combining the basic theory with the related technologies, I have implemented a small enterprise search engine instance using Solr in this paper.
To address the shortcomings encountered when crawling web pages, this paper improves and extends the Heritrix framework according to the features of the instance, and successfully downloads the specific web pages to local storage. The working principle of HTMLParser is studied, and code is written according to the features of the web pages to implement the extraction module, which stores the extracted information in a MySQL database. After studying Solr's indexing and retrieval framework and the necessary configuration, the Solr search engine server is set up. MMSeg4j's four models are studied and added to Solr; testing and statistics show that its segmentation accuracy reaches 98%. Since enterprise data is mostly stored in databases, Solr's DataImportHandler is used to import the data into Solr, so that the system suits enterprise use. The SolrJ source code is studied and improved to implement the search function, and a user-friendly interface is designed, completing the enterprise search engine. Finally, the system is tested for functionality and performance; the tests show that it offers high search accuracy and real-time response, and that it has good practical value. |