| Lucene is a full-text search framework. It offers a well-designed index structure and system architecture and serves as a high-performance, scalable information retrieval library, but its support for Chinese word segmentation and for a variety of text formats is inadequate. Several Chinese word segmentation components can be used with Lucene: StandardAnalyzer and CJKAnalyzer, which Lucene itself provides; ChineseAnalyzer and IK_CAnalyzer, which come from third parties; and various other Chinese word segmentation systems. StandardAnalyzer uses single-character segmentation, treating each Chinese character as one unit. Its disadvantage is that it needs complex word matching algorithms and a lot of CPU computation. CJKAnalyzer and ChineseAnalyzer both use bigram segmentation, treating every two adjacent characters as one token. IK_CAnalyzer is based on a segmentation dictionary and adopts its own forward iterative finest-granularity segmentation algorithm together with a multi-sub-processor analysis mode. 
At present, the Lucene search engine does not implement any understanding-based Chinese word segmentation method. Because a computer cannot recognize the meaning of each word in different contexts, understanding-based segmentation has not yet shown practical results. To address the shortcomings of Lucene's Chinese word segmentation, especially the lack of understanding-based segmentation, this paper discusses Chinese word segmentation based on the BP (Back Propagation) neural network algorithm. Because the BP neural network converges slowly and easily falls into local minima, we put forward an improved Particle Swarm Optimization (PSO) algorithm to optimize the BP neural network, name the result the PSO-BP neural network, and apply it to Chinese word segmentation. Comparison with the traditional BP neural network shows that the PSO-BP neural network not only remedies the slow convergence of the BP neural network but also improves segmentation precision. Next, this paper systematically studies and analyzes the Lucene API for third-party Chinese word segmenters and successfully applies the PSO-optimized BP neural network segmentation technology in Lucene. Comparison with Lucene's built-in Chinese word segmentation shows that the proposed technology is superior. Finally, this paper designs and implements a Lucene-based search engine with a Chinese word segmentation component based on the PSO-BP neural network, so as to make Chinese word segmentation in the search engine more intelligent, and provides a good platform for follow-up work and research. |
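The idea of using PSO to search the weight space of a BP-style network can be illustrated with a minimal sketch. This is not the improved PSO proposed in this paper, whose details are not given in the abstract. It is the standard inertia-weight PSO applied to a tiny feed-forward network, and the network size, the toy data set, the parameter values, and all function names are assumptions chosen only for illustration. In a segmentation setting the inputs would be features of a character context and the output a boundary decision; here XOR stands in as placeholder data.

```python
import math
import random

# Hypothetical toy network: 2 inputs, 3 hidden sigmoid units, 1 sigmoid output.
N_IN, N_HID = 2, 3
DIM = N_HID * (N_IN + 1) + (N_HID + 1)  # all weights and biases, flattened

# Placeholder training set (XOR); real inputs would be segmentation features.
TOY_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def sigmoid(z):
    # tanh form of the logistic function; avoids overflow for large |z|.
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def forward(w, x):
    """Evaluate the network for input x using the flat weight vector w."""
    hidden = []
    for h in range(N_HID):
        base = h * (N_IN + 1)
        z = w[base + N_IN] + sum(w[base + i] * x[i] for i in range(N_IN))
        hidden.append(sigmoid(z))
    base = N_HID * (N_IN + 1)
    z = w[base + N_HID] + sum(w[base + h] * hidden[h] for h in range(N_HID))
    return sigmoid(z)


def mse(w, data):
    """Mean squared error of the network over the data set."""
    return sum((forward(w, x) - y) ** 2 for x, y in data) / len(data)


def pso_train(data, n_particles=20, iters=200,
              inertia=0.72, c1=1.49, c2=1.49, seed=0):
    """Standard PSO over the weight vector; returns (best weights, loss history)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(n_particles)]
    vel = [[0.0] * DIM for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_loss = [mse(p, data) for p in pos]
    g = min(range(n_particles), key=pbest_loss.__getitem__)
    gbest, gbest_loss = pbest[g][:], pbest_loss[g]
    history = [gbest_loss]
    for _ in range(iters):
        for k in range(n_particles):
            for d in range(DIM):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the swarm's best.
                vel[k][d] = (inertia * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                pos[k][d] += vel[k][d]
            loss = mse(pos[k], data)
            if loss < pbest_loss[k]:
                pbest[k], pbest_loss[k] = pos[k][:], loss
                if loss < gbest_loss:
                    gbest, gbest_loss = pos[k][:], loss
        history.append(gbest_loss)
    return gbest, history


if __name__ == "__main__":
    weights, history = pso_train(TOY_DATA)
    print("initial best loss:", history[0])
    print("final best loss:  ", history[-1])
```

Because PSO needs no gradients, it is less prone to stalling in the flat regions and local minima that slow plain gradient descent. A common design, assumed here rather than taken from the paper, is to use the weights returned by `pso_train` as the starting point for ordinary back propagation.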