
A Study On The Disambiguation Of Combinatorial Ambiguities In Chinese Word Segmentation

Posted on: 2003-06-17
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Lian
Full Text: PDF
GTID: 2155360062996219
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
The disambiguation of combinatorial ambiguities in Chinese word segmentation remains an unsolved problem. It is hard because ambiguities of this kind are closely tied to the contexts in which they occur. This thesis analyses 615 combinatorial ambiguities collected from a 1,840,000-word corpus. On the basis of this analysis, two approaches are proposed, one for each of two kinds of cases: 1) a decision list algorithm is used for cases whose two segmented forms are evenly distributed in the text; 2) hand-written rules are applied to cases whose two segmented forms are unevenly distributed. Twenty-two typical examples were chosen for the experiment, 17 evenly distributed and 5 unevenly distributed. The average accuracies for the two kinds of examples are 87.82% and 97.70%, respectively. The error analysis shows that performance can be improved further by applying complementary methods and by using a larger and more appropriate training corpus. The rules, both those learned by the decision list algorithm and those written by hand, can be used in applications such as Chinese word segmentation and POS tagging.
Keywords/Search Tags: Chinese word segmentation, combinatorial ambiguity, decision list, collocational information, log likelihood, disambiguation rule
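
The decision list approach named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes the common formulation in which each contextual feature is scored by a smoothed log-likelihood ratio between the two segmented forms, rules are ranked by the absolute score, and the first matching rule decides. The labels "joined" and "split", the feature strings, the smoothing constant alpha, and the toy training data are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labels for the two segmented forms of an ambiguous string:
# "joined" = kept as one word, "split" = cut into two words.
LABELS = ("joined", "split")


def train_decision_list(examples, alpha=0.1):
    """Build a ranked rule list from (features, label) pairs.

    alpha is an assumed additive smoothing constant that keeps the
    log-likelihood ratio finite when a feature occurs with one label only.
    """
    counts = defaultdict(Counter)
    label_totals = Counter()
    for features, label in examples:
        label_totals[label] += 1
        for feature in set(features):
            counts[feature][label] += 1

    rules = []
    for feature, c in counts.items():
        # Smoothed log-likelihood ratio of "joined" against "split".
        score = math.log((c["joined"] + alpha) / (c["split"] + alpha))
        if score != 0:
            label = "joined" if score > 0 else "split"
            rules.append((abs(score), feature, label))

    # Strongest evidence first; the feature string breaks ties deterministically.
    rules.sort(key=lambda rule: (-rule[0], rule[1]))
    default = label_totals.most_common(1)[0][0]
    return rules, default


def classify(rules, default, features):
    """Return the label of the highest-ranked rule whose feature is present."""
    present = set(features)
    for _, feature, label in rules:
        if feature in present:
            return label
    return default


# Toy, invented training data: context features around one ambiguous string.
TOY_EXAMPLES = [
    (["prev=有", "next=的"], "joined"),
    (["prev=有", "next=。"], "joined"),
    (["prev=这样", "next=做"], "split"),
    (["prev=这样", "next=完成"], "split"),
    (["prev=他", "next=做"], "split"),
]

rules, default = train_decision_list(TOY_EXAMPLES)
print(classify(rules, default, ["prev=有", "next=看"]))
print(classify(rules, default, ["prev=它", "next=做"]))
```

Because only the single strongest matching feature decides each case, the learned rules stay readable and could in principle be inspected alongside hand-written rules of the kind the abstract describes for the unevenly distributed cases.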