A Study On Mongolian Pos-Tagging Corpus And The Related Technologies

Posted on:2012-11-06

Degree:Master

Type:Thesis

Country:China

Candidate:J X Wu

GTID:2155330335972365

Subject:Linguistics and Applied Linguistics

Abstract/Summary:

In recent years, with the widespread use of statistical methods in natural language processing, corpus linguistics has become a very active research direction in language study. Corpus-based natural language processing requires the raw corpus to be processed at several levels in order to extract the desired linguistic knowledge. Morphological tagging is the most fundamental of these levels. At the word-processing stage of Mongolian information processing, POS tagging is the basis for several follow-up tasks: vocabulary and suffix statistics, the construction of various dictionaries, sentence annotation, and discourse labeling all depend on correct segmentation and tagging. First, the thesis proofreads the 200,000-word annotated Mongolian corpus and proposes solutions to the problems found in it. Second, it improves the Mglex morphological analyzer by combining rules with statistical methods. The experimental results indicate that the Mglex analyzer based on rules and statistics performs better and gives satisfactory annotation results. Using the third-level annotated corpus of about 200,000 words as training data, the model's disambiguation rate rises from 0.846 to 0.901, and its accuracy rises from 0.935 to 0.977.
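To put the reported figures in perspective, the short sketch below computes the absolute gain and the relative error reduction for each metric. Only the four scores come from the abstract; the helper names `error_reduction` and `summarize` are illustrative, and treating `1 - score` as the error rate is an assumption, since the abstract does not define how the two rates are measured.

```python
def error_reduction(before: float, after: float) -> float:
    """Relative drop in error rate when a score rises from `before` to `after`.

    Assumes the error rate is simply 1 - score (not stated in the abstract).
    """
    error_before = 1 - before
    error_after = 1 - after
    return (error_before - error_after) / error_before


# Scores reported in the abstract: (before improvement, after improvement).
reported = {
    "disambiguation rate": (0.846, 0.901),
    "accuracy": (0.935, 0.977),
}


def summarize(scores):
    """Return {metric: (absolute gain, relative error reduction)}, rounded to 3 places."""
    return {
        name: (round(after - before, 3), round(error_reduction(before, after), 3))
        for name, (before, after) in scores.items()
    }


if __name__ == "__main__":
    for name, (gain, reduction) in summarize(reported).items():
        print(f"{name}: absolute gain {gain:.3f}, relative error reduction {reduction:.1%}")
```

Under that assumption, the accuracy gain is smaller in absolute terms than the disambiguation gain but removes a larger share of the remaining errors.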

Keywords/Search Tags:

Mongolian, Pos-tagging, 200,000 words corpus, Mglex analyzer

Related items

1	Study On The Tagging Of Mongolian Corpus And Relative Methodology
2	Based On The Names Of Mongolian Corpus Automatic Identification
3	An Investigation Into the Classification & Tagging Of The Error Corpus Insisted Foreign Students' Writings
4	Research Of Function Words In "the Corpus Of Spontaneous Mongolian"
5	A Study Of Korean Form For The Purpose Of Developing Suffix Analyzer
6	Modern Chinese Language Dictionary Pos Tagging Study
7	The Construction Of Integration Platform For Mongolian Corpus Processing
8	A Study On The Construction Of Mongolian News Corpus And Related Issues
9	The POS Tagging System Of Ancient Chinese Function Words Based On Bi-LSTM-CRF
10	Research On Mongolian News Language Based On Corpus