The Research On Chinese Word Segmentation Technology Based On Nutch Searching Engine System Data Processing

Posted on: 2015-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Ji
Full Text: PDF
GTID: 2298330467461480
Subject: Applied Mathematics
Abstract/Summary:
With the rapid development of the Internet, more and more people rely on it to obtain information. As a result, information has become diverse and comes from many sources, which raises the question of how to find the right information quickly and accurately among so many sources to meet the needs of different users. Search engines are central to solving this problem, but they also face a growing difficulty, particularly search engines that support Chinese word segmentation: the diversification of Internet information has made the Chinese vocabulary increasingly rich and has produced many new Internet words. Improving Chinese word segmentation so that it meets the needs of Chinese-language search therefore requires further work in both theoretical research and practical application.

Against the background of a network era full of data, this thesis first gives a brief introduction to search engines from a conceptual point of view and then analyzes them comprehensively at the technical level, in order to give a complete understanding of search engines. The main research task is a detailed study of Chinese word segmentation technology, including its importance in search engines, the evaluation of its results, its main difficulties, and its algorithms. Among open-source search engine projects, this thesis chooses the Nutch search engine system for investigation, because it is practically meaningful for the main research points. The thesis gives an overall analysis of the theoretical principles of Nutch and builds a development environment in which the Nutch search engine is implemented.

Finally, this thesis analyzes Chinese word segmentation and the plug-in architecture in the Nutch search engine and compares the test results of three approaches: the word segmentation method, the binary segmentation method, and dictionary-based segmentation. It also carries out secondary development of the Nutch search engine by means of the Nutch plug-in architecture and compares the segmentation effects of the improved and the unimproved Nutch search engine. In summary, this thesis analyzes search engine technology theoretically, taking the Nutch search engine as an example, and then improves the Nutch search engine technically through its Chinese word segmentation algorithm, implementing and comparing the Chinese word segmentation results of the improved and unimproved versions. The work has some theoretical and practical significance.
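To make the difference between binary segmentation and dictionary-based segmentation concrete, the following is a minimal Python sketch. It is not the thesis's implementation and does not use any Nutch API. The abstract does not say which dictionary algorithm was used, so forward maximum matching is assumed here as one common choice. The function names, the `max_len` parameter, and the toy dictionary are all hypothetical.

```python
def bigram_segment(text):
    """Binary segmentation: each pair of adjacent characters becomes a token."""
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def forward_max_match(text, dictionary, max_len=4):
    """Dictionary-based segmentation by forward maximum matching (assumed method).

    At each position, take the longest dictionary word of at most max_len
    characters; fall back to a single character if nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, for illustration only.
toy_dictionary = {"搜索", "引擎", "搜索引擎", "中文", "分词"}
sentence = "中文搜索引擎分词"

print(bigram_segment(sentence))
print(forward_max_match(sentence, toy_dictionary))
```

Binary segmentation needs no dictionary but produces overlapping tokens that are not real words, such as "文搜". The dictionary-based method produces real words only as far as its dictionary covers them, which is why new Internet words are a difficulty for it.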
Keywords/Search Tags: search engine, Nutch search engine, Chinese word segmentation, information, segmentation