| One goal of natural language processing is to find a method for assigning rich structural annotation to sentences that are presented as simple linear strings of words, because meaning can be extracted more readily from a structurally annotated sentence than from a sentence with no structural information. Because a Chinese sentence is written as a sequence of characters with no explicit word boundaries, Chinese word segmentation is the first step of Chinese information processing. It is also the foundation of part-of-speech tagging, syntactic analysis, and semantic analysis. Segmentation ambiguity and unknown word recognition are the two main obstacles in Chinese word segmentation; this dissertation focuses on the characteristics of segmentation ambiguity and on its disambiguation. First, the dissertation presents a formal description of Chinese word segmentation and its main types of ambiguity; second, each of the two ambiguity types, combination ambiguity and overlapping ambiguity, is studied thoroughly together with its disambiguation methods; finally, experimental results are given. For combination ambiguity, we acquire and optimize a list of disambiguation rules from a corpus, then apply the rules to correct ambiguous segmentations. Compared with rules written manually by language experts, automatically acquired rules are more objective, more comprehensive, and less costly, and automatic rule acquisition is a future direction of computational linguistics research. For overlapping ambiguity, disambiguation rules for each ambiguity class are likewise acquired from a corpus and applied to correct the ambiguous segmentations. In addition, the dissertation uses a maximum-probability algorithm and a method based on a search list to correct overlapping ambiguities, and both achieve good results. |
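
The maximum-probability approach named in the abstract can be illustrated with a small sketch. This is not the dissertation's implementation: the toy lexicon, its probability values, and the function name `max_prob_segment` are all assumptions made for illustration. The sketch assumes a unigram model in which the score of a segmentation is the product of its word probabilities, and it uses dynamic programming to find the highest-scoring split.

```python
import math

# Hypothetical toy lexicon: word -> unigram probability.
# The values are invented for illustration, not taken from any corpus.
LEXICON = {
    "研究": 0.02,
    "研究生": 0.005,
    "生命": 0.01,
    "研": 0.0005,
    "究": 0.0005,
    "生": 0.002,
    "命": 0.001,
}


def max_prob_segment(sentence, lexicon=LEXICON):
    """Return the segmentation whose product of word probabilities is largest."""
    n = len(sentence)
    max_len = max(len(word) for word in lexicon)
    # best[i] = (log-probability, start index of the last word) for sentence[:i]
    best = [(-math.inf, -1)] * (n + 1)
    best[0] = (0.0, -1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = sentence[start:end]
            if word in lexicon and best[start][0] > -math.inf:
                score = best[start][0] + math.log(lexicon[word])
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        raise ValueError("sentence cannot be segmented with this lexicon")
    # Follow the back-pointers from the end of the sentence to recover the words.
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append(sentence[start:end])
        end = start
    return words[::-1]


print(max_prob_segment("研究生命"))  # ['研究', '生命']
```

In the string 研究生命, the candidate words 研究生 and 生命 overlap on the character 生, which makes it an overlapping ambiguity. With the toy probabilities above, the split 研究/生命 scores higher than 研究生/命, so the sketch returns the former. A real system would estimate the probabilities from a segmented corpus.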