
A Study On Mongolian Pos-Tagging Corpus And The Related Technologies

Posted on: 2012-11-06  Degree: Master  Type: Thesis
Country: China  Candidate: J X Wu  Full Text: PDF
GTID: 2155330335972365  Subject: Linguistics and Applied Linguistics
Abstract/Summary:
In recent years, with the widespread use of statistical methods in natural language processing, corpus linguistics has become a very active research direction in language study. Corpus-based natural language processing requires processing the raw corpus at several levels in order to extract the desired linguistic knowledge. Morphological tagging is the fundamental level of corpus processing. At the word-processing stage of Mongolian information processing, POS tagging is the basis for several follow-up tasks: for example, statistics on vocabulary and suffixes, the construction of various dictionaries, sentence annotation, and discourse labeling all depend on correct segmentation and tagging. First, the thesis proofreads the 200,000-word Mongolian annotated corpus and proposes solutions to the problems found in it. Second, it improves the Mglex morphological analyzer by combining rules with statistical methods. The experimental results indicate that the Mglex analyzer based on both rules and statistics performs better and produces satisfactory annotation results. Using the 3rd-level annotated corpus of about 200,000 words as the training data, the model's disambiguation rate rises from 0.846 to 0.901, and its accuracy rises from 0.935 to 0.977.
Keywords/Search Tags: Mongolian, POS tagging, 200,000-word corpus, Mglex analyzer