Research Of The Uyghur-Chinese Personal Name Transliteration Based On Statistics

Posted on: 2013-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Tan
Full Text: PDF
GTID: 2215330374966437
Subject: Computer application technology
Abstract/Summary:
Name transliteration takes a name in the source language as input and outputs the corresponding name in the target language. The guiding principle is to preserve the pronunciation of the source name while adapting it to the conventions of the target language. Automatic name transliteration is an important component of many cross-language applications. In recent years it has attracted growing attention, especially for language pairs whose character sets differ greatly in form. Although many cross-language applications exist for Uyghur and Chinese, automatic transliteration between the two languages still lacks in-depth study.

This thesis first gives a comprehensive analysis of name transliteration and, on that basis, studies the transliteration of personal names between Uyghur and Chinese. The main contents are as follows:

⑴ Within a grapheme-based framework, a source-channel model is used. The name transliteration model is built from 2,200 pairs of Uyghur-Chinese names, and the language model is built from the Chinese side of the names. The decoder is tested with 200 names each for the open test and the closed test. The experimental results show that name transliteration based on the source-channel model is viable. However, because of the limits of the grapheme-based framework, the transliteration granularity cannot be refined further, so the rest of the thesis works within a phoneme-based framework.

⑵ Within the phoneme-based framework, phrase-based statistical models are used. Uyghur names are first converted to Latin script and then, given the characteristics of Latin-script Uyghur, segmented in two ways: at the Latin character level and at the syllable-block level. Chinese names are converted to Pinyin and then, according to its phonetic characteristics, segmented in three ways: at the character level, at the initial/final level, and at the syllable-block level. Experiments are conducted with the different segmentation units on both sides. The results show that combinations based on syllable blocks are more suitable for Uyghur-Chinese name transliteration.

⑶ Within the phoneme-based framework, maximum entropy models are used. The main idea is to define a feature template, use it to extract feature information from the Uyghur-Chinese name pairs, and then use that information to transliterate names. Experiments on a set of Uyghur-Chinese name pairs show that feature extraction needs further study.

This thesis mainly studies how different transliteration frameworks, different segmentations of the transliteration units, and feature information affect Uyghur-Chinese name transliteration. It also lays a foundation for studying other named entities, such as place names and organization names.
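For orientation, the source-channel setup in ⑴ can be written in its textbook noisy-channel form. The abstract does not state the formula, so the following is the standard formulation rather than the thesis's exact one, and the symbols $U$ (Uyghur name) and $C$ (candidate Chinese name) are assumptions introduced here:

```latex
\hat{C} \;=\; \arg\max_{C} P(C \mid U)
        \;=\; \arg\max_{C} \frac{P(U \mid C)\, P(C)}{P(U)}
        \;=\; \arg\max_{C} \underbrace{P(U \mid C)}_{\text{transliteration model}}
              \; \underbrace{P(C)}_{\text{language model}}
```

Under this reading, $P(U \mid C)$ corresponds to the transliteration model built from the name pairs, $P(C)$ to the language model built from the Chinese side, and the decoder searches for the maximizing $C$. The denominator $P(U)$ is dropped because it does not depend on $C$.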
Keywords/Search Tags: machine transliteration, source-channel model, maximum entropy model, EM algorithm, statistical machine translation
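The three Pinyin segmentation granularities described in ⑵ (character level, initial/final level, syllable block) can be illustrated with a minimal sketch. The function names, the initials table, and the sample syllables are assumptions made for illustration and are not taken from the thesis. In particular, treating "y" and "w" as initials is a simplification.

```python
# Pinyin initials, with two-letter initials first so "zh" wins over "z".
# Assumption of this sketch: "y" and "w" are treated as initials.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_initial_final(syllable):
    """Split one Pinyin syllable into [initial, final].

    Syllables with no initial (e.g. "ai") are returned as a single unit.
    """
    for initial in INITIALS:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return [initial, syllable[len(initial):]]
    return [syllable]


def segment(syllables, mode):
    """Segment a list of Pinyin syllables at the requested granularity."""
    if mode == "syllable":
        return list(syllables)
    if mode == "initial_final":
        return [unit for s in syllables for unit in split_initial_final(s)]
    if mode == "character":
        return [ch for s in syllables for ch in s]
    raise ValueError(f"unknown segmentation mode: {mode!r}")


if __name__ == "__main__":
    name = ["mai", "mai", "ti"]  # illustrative input, tone marks omitted
    for mode in ("character", "initial_final", "syllable"):
        print(mode, segment(name, mode))
```

Each granularity yields a different sequence of transliteration units for the same name, which is what a phrase-based model would then align against the units on the Uyghur side.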