Research And Implementation On English-Chinese Personal Name Transliteration Methods

Posted on: 2010-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: M L Zhou
GTID: 2155360275959227
Subject: Computer application technology
Abstract/Summary:
Machine transliteration is the process of mapping content in a source language to content in a target language according to its pronunciation. Back-transliteration is the inverse process. Machine transliteration and back-transliteration are important problems in natural language processing and play an important role in machine translation and cross-language information retrieval. After discussing the deficiencies of phoneme-based transliteration methods, we implement two machine transliteration methods for English-Chinese personal names: grapheme-based transliteration and a method based on Statistical Machine Translation (SMT). Our research focuses on the following:

1. We first introduce the grapheme-based transliteration framework and then compare two transliteration models under this framework: the Noisy Channel Model (NCM) and the n-gram Transliteration Model (TM). We find that the n-gram TM contains more information and therefore outperforms the NCM.

2. Building on EM-algorithm alignment, we propose a novel alignment method, first syllable letter mapping, and derive seven mapping rules. We apply both algorithms to English-Chinese transliteration unit alignment and compare their effects on Chinese-to-English (C2E) and English-to-Chinese (E2C) transliteration. We conclude that the first syllable letter mapping alignment algorithm yields higher transliteration precision, and we therefore adopt it as the transliteration unit alignment method.

3. We also study how to apply the Viterbi algorithm to C2E and E2C decoding. Under the Direct Orthography Mapping (DOM) framework, and based on automatic transliteration unit alignment by first syllable letter mapping, we implement a bidirectional English-Chinese personal name transliteration system that uses Viterbi decoding.

4. In addition, we treat a name as a pseudo-sentence and treat machine transliteration as statistical machine translation. SMT uses a log-linear model to combine various features. The experiments show that the machine translation method is more suitable for transliteration, since transliteration involves no reordering of phrases. Furthermore, the log-linear model makes it very convenient to add new features, which gives it greater potential for improvement.
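The Viterbi decoding step mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the probability tables `EMIT` and `BIGRAM`, the back-off constant `FLOOR`, the limit `MAX_UNIT_LEN`, and the function name `viterbi_transliterate` are all hypothetical, and the score (a unit mapping probability multiplied by a target-side character bigram) is a simplification chosen for illustration, not the NCM or n-gram TM as defined in the thesis. The sketch segments an English name into grapheme units and picks the highest-scoring Chinese character sequence by dynamic programming.

```python
import math

# Hypothetical toy parameters (invented for illustration, not from the thesis).
# EMIT[e_unit][c] = P(c | e_unit): English grapheme unit -> Chinese character.
EMIT = {
    "ma": {"马": 0.6, "玛": 0.4},
    "ry": {"丽": 0.7, "里": 0.3},
    "r": {"尔": 1.0},
    "y": {"伊": 1.0},
}
# BIGRAM[(prev, cur)] = P(cur | prev) over target characters; "<s>" marks the start.
BIGRAM = {
    ("<s>", "马"): 0.5, ("<s>", "玛"): 0.5,
    ("玛", "丽"): 0.8, ("马", "丽"): 0.2,
    ("马", "里"): 0.5, ("玛", "里"): 0.1,
}
FLOOR = 1e-6       # assumed back-off probability for unseen bigrams
MAX_UNIT_LEN = 3   # assumed maximum length of an English transliteration unit


def viterbi_transliterate(name):
    """Return the best-scoring Chinese string for `name`, or None if no
    segmentation into known units exists."""
    name = name.lower()
    n = len(name)
    # best[i][c] = (log score, backpointer) for the best path that covers
    # name[:i] and ends in target character c.
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, None)
    for i in range(n):
        for prev, (score, _) in best[i].items():
            for j in range(i + 1, min(n, i + MAX_UNIT_LEN) + 1):
                unit = name[i:j]
                for c, p_emit in EMIT.get(unit, {}).items():
                    p_lm = BIGRAM.get((prev, c), FLOOR)
                    cand = score + math.log(p_emit) + math.log(p_lm)
                    if c not in best[j] or cand > best[j][c][0]:
                        best[j][c] = (cand, (i, prev))
    if not best[n]:
        return None
    # Follow backpointers from the best final state to the start symbol.
    last = max(best[n], key=lambda c: best[n][c][0])
    out, pos = [], n
    while last != "<s>":
        out.append(last)
        pos, last = best[pos][last][1]
    return "".join(reversed(out))


print(viterbi_transliterate("Mary"))
```

With these toy tables, the character bigram is what separates the candidates: each hypothesis is scored by its mapping probabilities together with how plausible the resulting character sequence is. Because the state is the pair (position, last target character), keeping only the best score per state is sufficient for a bigram model.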
Keywords: Machine Transliteration, Noisy Channel Model, N-gram Transliteration Model, EM Algorithm, Statistical Machine Translation