Research And Implementation On English-Chinese Personal Name Transliteration Methods

Posted on:2010-05-04

Degree:Master

Type:Thesis

Country:China

Candidate:M L Zhou

Full Text:PDF

GTID:2155360275959227

Subject:Computer application technology

Abstract/Summary:

Machine transliteration is the process of mapping a term in the source language to a term in the target language according to its pronunciation; back-transliteration is the inverse process. Machine transliteration and back-transliteration are important problems in natural language processing and play an important role in machine translation and cross-language information retrieval. After discussing the deficiencies of phoneme-based transliteration methods, we implement two machine transliteration methods for English-Chinese personal names: a grapheme-based method and a method based on Statistical Machine Translation (SMT). Our research focuses on the following:

1. We first introduce the grapheme-based transliteration framework and then compare two transliteration models under it: the Noisy Channel Model (NCM) and the n-gram Transliteration Model (TM). We find that the n-gram TM captures more information and therefore outperforms the NCM.

2. Building on EM-based alignment, we propose a novel alignment method, first syllable letter mapping, and derive 7 mapping rules. We apply both algorithms to English-Chinese transliteration unit alignment and compare their effects on Chinese-to-English (C2E) and English-to-Chinese (E2C) transliteration. We conclude that the first syllable letter mapping alignment algorithm yields higher transliteration precision, so we adopt it as the transliteration unit alignment method.

3. We also study how to apply the Viterbi algorithm to C2E and E2C decoding. Under the Direct Orthography Mapping (DOM) framework, and using the automatic transliteration unit alignment method of first syllable letter mapping, we implement a bidirectional English-Chinese personal name transliteration system with Viterbi decoding.

4. In addition, we treat a name as a pseudo-sentence and treat machine transliteration as a statistical machine translation task. SMT uses a log-linear model to combine various features. The experiments show that the machine translation method is more suitable for transliteration, since transliteration involves no reordering of phrases. Furthermore, a log-linear model makes it convenient to add new features and so offers greater room for improvement.
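
To illustrate the Viterbi decoding step described above, the following is a minimal sketch of E2C decoding over grapheme transliteration units. It is not the thesis's implementation: the function name `viterbi_transliterate`, the tables `UNIT_PROB` and `BIGRAM`, and all probability values are hypothetical toy data, and the scoring (a unit-to-character mapping probability multiplied by a bigram over Chinese characters) is a simplification chosen for brevity rather than the NCM or n-gram TM of the thesis.

```python
import math

MAX_UNIT_LEN = 3   # assumed longest English transliteration unit
FLOOR = 1e-4       # assumed back-off probability for unseen bigrams

# Hypothetical toy tables; real values would be estimated from aligned name pairs.
UNIT_PROB = {      # P(chinese_char | english_unit)
    "s": {"史": 0.6, "斯": 0.4},
    "mi": {"密": 0.5, "米": 0.5},
    "th": {"斯": 0.8, "思": 0.2},
}
BIGRAM = {         # P(char | previous_char)
    ("<s>", "史"): 0.5, ("<s>", "斯"): 0.1,
    ("史", "密"): 0.6, ("史", "米"): 0.1,
    ("斯", "密"): 0.1, ("斯", "米"): 0.3,
    ("密", "斯"): 0.7, ("米", "斯"): 0.3,
}

def viterbi_transliterate(name, unit_prob, bigram):
    """Return (best Chinese string, log score), or None if no segmentation exists."""
    name = name.lower()
    n = len(name)
    # chart[pos] maps the last emitted character to (log score, backpointer).
    chart = [dict() for _ in range(n + 1)]
    chart[0]["<s>"] = (0.0, None)
    for pos in range(n):
        for prev, (score, _) in chart[pos].items():
            for length in range(1, MAX_UNIT_LEN + 1):
                unit = name[pos:pos + length]
                if len(unit) < length:      # ran past the end of the name
                    break
                for char, p_map in unit_prob.get(unit, {}).items():
                    p_lm = bigram.get((prev, char), FLOOR)
                    cand = score + math.log(p_map) + math.log(p_lm)
                    end = pos + length
                    # Keep only the best path into each (position, character) state.
                    if char not in chart[end] or cand > chart[end][char][0]:
                        chart[end][char] = (cand, (pos, prev))
    if not chart[n]:
        return None
    # Backtrack from the best final state to the start symbol.
    char = max(chart[n], key=lambda c: chart[n][c][0])
    best_score = chart[n][char][0]
    pos, out = n, []
    while char != "<s>":
        out.append(char)
        _, (pos, char) = chart[pos][char]
    return "".join(reversed(out)), best_score

result = viterbi_transliterate("Smith", UNIT_PROB, BIGRAM)
```

With these toy tables the decoder segments "Smith" as s + mi + th. Because each state records only the position and the last emitted character, the search stays polynomial in the name length, which is what makes exhaustive decoding practical for short inputs such as personal names.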

Keywords/Search Tags:

Machine Transliteration, Noisy Channel Model, N-gram Transliteration Model, EM Algorithm, Statistical Machine Translation

Related items

1	Research Of The Uyghur-Chinese Personal Name Transliteration Based On Statistics
2	Research On Optimization Of Language Model Based On Statistical Machine Translation
3	Research On Optimization Technologies For Decoding In Phrase-Based Statistical Machine Translation
4	Research On The Transliteration Model Of Tibetan And Chinese Names Based On Hybrid Strategies
5	A Report On The Translation Of An Excerpt From Machine Translation
6	Research On Domain Adaptation For Statistical Machine Translation Based On Topic And Semantic Analysis
7	Research Of Document-Level Neural Machine Translation
8	Research On The Key Technologies For Phrase-based Tibetan-english Statistical Machine Translation
9	Research And Implementation Of Neural Machine Translation Model Based On Fusion Of Dependency Syntactic Information
10	Assessing Quality Of Machine-generated Bilingual Subtitles