Research On A Chinese Word Segmentation Method Based On Dictionary And Bayesian Theorem

Posted on:2013-01-04

Degree:Master

Type:Thesis

Country:China

Candidate:W P Liu

Full Text:PDF

GTID:2248330392457251

Subject:Software engineering

Abstract/Summary:

Over recent decades, researchers in China and abroad have developed a number of commonly used Chinese word segmentation algorithms. The main families are mechanical word segmentation algorithms based on a lexicon, Chinese word segmentation algorithms based on understanding, and statistics-based Chinese word segmentation algorithms. Each has its own advantages and limitations.

After analyzing and studying these algorithms, I designed a Chinese word segmentation algorithm based on a dictionary and Bayesian Theorem. I built a dictionary that includes commonly used words and other feature words; the dictionary can be updated from the text of the corpus. To meet the algorithm's need for fast lookup, the dictionary is stored using hash table and linked list data structures. The algorithm applies Bayesian Theorem to calculate the probability of each candidate segmentation scheme from the word probability data in the lexicon, and uses a bigram model to resolve ambiguity. It thus combines the advantages of lexicon-based and statistics-based Chinese word segmentation.

Testing under a range of conditions shows that the algorithm handles ambiguity and unknown words better than other algorithms. The algorithm could meet the basic needs of a Chinese-related information system.
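The abstract does not give the thesis's actual formula, so the following is only a minimal sketch of the general idea it describes: look up candidate words in a hash-based dictionary and pick the segmentation whose words have the highest combined probability. The lexicon, its counts, and the names `LEXICON`, `log_prob`, and `segment` are all hypothetical, and the scoring is a plain unigram product found by dynamic programming rather than the Bayesian and bigram treatment of the thesis.

```python
import math

# Hypothetical toy lexicon: word -> corpus count (invented for illustration).
# Python's dict is a hash table, standing in for the hash-table-plus-linked-list
# dictionary store mentioned in the abstract.
LEXICON = {
    "研究": 50,
    "研究生": 20,
    "生命": 30,
    "生": 5,
    "命": 5,
    "的": 200,
    "起源": 15,
}
TOTAL = sum(LEXICON.values())
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def log_prob(word):
    """Log of the word's relative frequency in the toy lexicon."""
    return math.log(LEXICON[word] / TOTAL)


def segment(text):
    """Return the dictionary segmentation with the highest product of word
    probabilities, or None if the text cannot be covered by lexicon words."""
    n = len(text)
    # best[i] holds (log-probability, words) for the best split of text[:i].
    best = [None] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = text[start:end]
            if best[start] is None or word not in LEXICON:
                continue
            score = best[start][0] + log_prob(word)
            if best[end] is None or score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    return None if best[n] is None else best[n][1]


print(segment("研究生命的起源"))
```

With these invented counts, the ambiguous prefix 研究生 is split as 研究 / 生命 rather than 研究生 / 命, because the first reading has the higher combined probability. A string containing a character sequence absent from the lexicon returns `None`; handling such unknown words, which the abstract says the thesis addresses, is outside this sketch.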

Keywords/Search Tags:

Chinese word segmentation, Bayesian Theorem, Dictionary

Related items

1	Reverse Backtracking Research Of Chinese Segmentation Based On Last Word Dictionary
2	Research And Implementation Of Chinese Word Segmentation Algorithm
3	The Research And Implementation Of The Chinese Word Segmentation System Combining Omni-Segmentation With Statistic
4	The Research And Implementation Of The Chinese Word Segmentation System Combining Omni-segmentation With Statistic
5	The Research Of Chinese Word Segmentation Algorithm Based On Dictionary And Probability Statistics
6	Improvement And Implementation Of Chinese Word Segmentation Algorithm Based On Dictionary
7	The Research And Implementation Of Automatic Chinese Word Segmentation System
8	A Chinese Word Level Segmentation Algorithm Based On Document Category
9	Research On Recognition Of Unknown Words From A BBS Corpus Based On Dictionary And Word Frequency Analysis
10	Chinese Word Segmentation Method Based On Dictionary And Statistics Of The Words