
Research On Text Proofreading Method Based On The Analysis Of The Mongolian Syllable

Posted on: 2020-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Cai
GTID: 2428330596492636
Subject: Computer Science and Technology
Abstract/Summary:
Text proofreading is one of the basic tasks of Mongolian natural language processing, and progress in it directly affects the orderly development of Mongolian information processing. To address text errors in the use of traditional Mongolian, this thesis proposes an automatic proofreading method based on syllable analysis and confusion sets, combining statistical features with Mongolian morphological rules. First, syllable confusion sets were built by segmenting Mongolian words into syllables. From these, a Mongolian real-word confusion set was generated automatically and then supplemented and refined manually. On this basis, a web crawler was used to collect a network corpus, a language model was trained on it, and proofreading of Mongolian real-word errors was implemented. Second, working at the syllable level, Mongolian morphological rules were combined with a syllable language model to implement spell checking. Then, using a syllable confusion dictionary with statistical features and the normalized probabilities of confusable syllables, proofreading of Mongolian non-word errors was implemented. For the insertion, deletion, and substitution of a single character in Mongolian words, this method performs better than the correction system based on the intermediate code. Finally, the proofreading algorithm for real-word errors was improved by taking the context of the Mongolian text into account, and the above methods were combined to correct both non-word and real-word errors. In summary, this study uses the characteristics of Mongolian syllables to build a syllable confusion set and a real-word confusion set for checking and correcting non-word and real-word errors in Mongolian. The experiments achieve good results and offer new ideas for Mongolian text proofreading.
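The real-word correction step described above (confusion sets plus a language model) can be illustrated with a minimal sketch. This is not the thesis's implementation: the add-one-smoothed bigram model, the function names, and the toy corpus are all assumptions made for illustration, and the English tokens are placeholders standing in for Mongolian words.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences (hypothetical helper)."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        padded = ["<s>"] + sentence + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def log_prob(tokens, unigrams, bigrams):
    """Sentence log-probability under a bigram model with add-one smoothing."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
        for prev, cur in zip(padded, padded[1:])
    )

def correct_real_words(tokens, confusion_sets, unigrams, bigrams):
    """Replace each token by the confusion-set member that maximizes the
    sentence score; the original token is kept on ties."""
    result = list(tokens)
    for i, token in enumerate(tokens):
        alternatives = sorted(c for c in confusion_sets.get(token, ()) if c != token)
        if not alternatives:
            continue
        candidates = [token] + alternatives  # original first, so ties keep it
        result[i] = max(
            candidates,
            key=lambda c: log_prob(result[:i] + [c] + result[i + 1:], unigrams, bigrams),
        )
    return result

# Toy data (assumed): placeholder tokens, not real Mongolian text.
corpus = [
    ["i", "read", "a", "book"],
    ["i", "read", "the", "book"],
    ["the", "road", "is", "long"],
    ["a", "road", "is", "long"],
]
confusion_sets = {"read": {"road"}, "road": {"read"}}
unigrams, bigrams = train_bigram(corpus)

print(correct_real_words(["i", "road", "a", "book"], confusion_sets, unigrams, bigrams))
# prints ['i', 'read', 'a', 'book']
```

A sentence-level score is used here so that both the left and right neighbors of a candidate influence the choice. A real system would likely use a larger N-gram order, better smoothing, and a confusion set derived from syllable analysis as the abstract describes.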
Keywords/Search Tags: text proofreading, Mongolian syllables, confusion sets, N-gram model, statistical characteristics, context