
A Research Of Encoding Conversion And Proof Reading Method On Mongolian Corpus

Posted on: 2019-09-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y T N Wu
Full Text: PDF
GTID: 2405330596456149
Subject: Computer technology
Abstract/Summary:
In the information age, the transmission and sharing of information have become electronic and networked, and most information is spread and shared as text. Mongolian information must adapt to these requirements as well. As Mongolian information processing has developed, a variety of Mongolian encodings have appeared, such as Saiyin, Mengkeli, Ming'antu, and Intelligent Coding. The codes that these encodings assign to Mongolian characters differ and are mutually incompatible. If the corresponding Mongolian character library is not installed, Mongolian data on a computer is displayed as garbled text and cannot be used, which prevents Mongolian information resources from being spread, shared, and studied. The most effective way to solve these problems is to convert the various codes into one unified encoding.

This thesis consists of two parts: encoding conversion and encoding proofreading. In the conversion part, the author analyzes and compares in detail the widely used Mengkeli and Intelligent Coding encodings and the Mongolian International Standard Coding, and then converts Mengkeli and Intelligent Coding text into the Mongolian International Standard Coding. The conversion is rule-based: it relies on the Mongolian variant-form character sets and on the rules for using control characters. During conversion, each code is first classified by its code range and by the form it takes in different positions. Once the category is decided, a Mengkeli code is converted by the Mengkeli-to-standard algorithm, and an Intelligent Coding code is converted by the Intelligent-Coding-to-standard algorithm.

Non-standard Mongolian encodings such as Mengkeli and Intelligent Coding are form (glyph-based) codes, whereas the standard encoding is a phonetic code. As a result, conversion into standard codes cannot always be carried out correctly, and some wrong codes appear because the source codes do not correspond directly to the International Standard Coding. Typing also sometimes produces wrong codes. It is therefore necessary to proofread Mongolian International Standard Coding text, whether it was produced by conversion or by typing.

In this thesis, proofreading is based on the harmony rule for Mongolian masculine and feminine vowels: masculine and feminine vowels cannot occur together in one word. If the first vowel in a word is masculine, the following vowels in that word must be masculine; if the first vowel is feminine, the following vowels must be feminine. Codes that violate this rule are replaced by the corresponding correct codes. During proofreading, one algorithm determines whether a code is a vowel or a consonant, a second determines whether the current vowel is feminine or masculine, and finally the word-proofreading algorithm replaces the wrong codes with the correct ones.
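The vowel-harmony check described above can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: it assumes the standard encoding is the Unicode Mongolian block, groups the vowels A, O, U as masculine and E, OE, UE as feminine, treats every other character (including the vowel I) as neutral, and uses a hypothetical replacement pairing (A/E, O/OE, U/UE). The names `vowel_gender` and `proofread_word` are invented for illustration.

```python
# Assumption: standard codes are Unicode Mongolian letters (U+1820..U+1826).
MASCULINE = {"\u1820", "\u1823", "\u1824"}  # A, O, U
FEMININE = {"\u1821", "\u1825", "\u1826"}   # E, OE, UE

# Hypothetical pairing used to repair a vowel of the wrong class.
TO_FEMININE = {"\u1820": "\u1821", "\u1823": "\u1825", "\u1824": "\u1826"}
TO_MASCULINE = {fem: masc for masc, fem in TO_FEMININE.items()}


def vowel_gender(ch):
    """Return 'masculine', 'feminine', or None for consonants and neutral letters."""
    if ch in MASCULINE:
        return "masculine"
    if ch in FEMININE:
        return "feminine"
    return None


def proofread_word(word):
    """Make every later vowel agree with the first masculine/feminine vowel."""
    word_gender = None
    result = []
    for ch in word:
        gender = vowel_gender(ch)
        if gender is None:
            result.append(ch)            # consonant or neutral: keep as-is
        elif word_gender is None:
            word_gender = gender         # first vowel fixes the word's class
            result.append(ch)
        elif gender != word_gender:
            table = TO_MASCULINE if word_gender == "masculine" else TO_FEMININE
            result.append(table[ch])     # replace the disharmonious vowel
        else:
            result.append(ch)
    return "".join(result)


if __name__ == "__main__":
    # BA + A + RA + E: the final E conflicts with the masculine A.
    fixed = proofread_word("\u182a\u1820\u1837\u1821")
    print([hex(ord(c)) for c in fixed])
```

A real proofreader would also have to handle loanwords, compounds, and other exceptions to vowel harmony, which this sketch ignores.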
Keywords/Search Tags:Mongolian character, Mongolian coding, code conversion, proofreading