On Key Technologies For Pivot-Based Statistical Machine Translation

Posted on: 2016-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wang
Full Text: PDF
GTID: 2428330542489579
Subject: Computer application technology
Abstract/Summary:
Statistical machine translation (SMT), also known as data-driven machine translation, has become the dominant approach because it learns translation knowledge from large-scale bilingual data without manual intervention and delivers excellent performance. In practice, however, SMT often faces a shortage of parallel corpora for some language pairs. To address this, researchers proposed pivot-based statistical machine translation. The key idea of pivot-based SMT is to connect the source and target languages indirectly, using a third language (or several) as a bridge. The pivot language must therefore have plentiful bilingual parallel corpora for both the source-pivot and pivot-target pairs. English is the most widely used pivot language.

Conventional pivot-based SMT has three mainstream methods: the corpus-level synthetic method, the sentence-level transfer method, and the phrase-level triangulation method. Of these, triangulation is the most widely used because of its flexibility and high translation quality. However, conventional triangulation has problems of its own, such as discarded source phrases and a large amount of noisy data in the induced phrase table.

To alleviate these problems, this thesis proposes a novel method based on the phrase-level transfer method and the triangulation method. In our approach, a quality control factor identifies low-confidence deductions while triangulation is applied. Phrases in these deductions that are of high quality but invalid are then decoded again to produce relatively high-quality translation rules. In this way, we further improve overall translation performance by raising the quality of the translation rules and increasing the recall of the translation table. Our approach is evaluated on a German-Chinese translation task with English as the pivot language, using millions of sentence pairs. Experimental results show that our method achieves a significant improvement over the baseline pivot-based method, which demonstrates the validity of our approach.
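As a rough illustration of phrase-level triangulation, the sketch below induces a source-target phrase table by summing over shared pivot phrases, p(t|s) = sum over p of p(p|s) * p(t|p). This is a common textbook formulation and is assumed here; the abstract does not give the thesis's exact scoring, and the quality control factor is not modelled. The function name `triangulate` and the toy tables (with romanized Chinese targets) are hypothetical. The sketch also shows how a source phrase with no matching pivot phrase ends up discarded.

```python
from collections import defaultdict


def triangulate(src_pivot, pivot_tgt):
    """Induce p(t|s) by marginalizing over shared pivot phrases.

    src_pivot: {source phrase: {pivot phrase: p(pivot|source)}}
    pivot_tgt: {pivot phrase: {target phrase: p(target|pivot)}}
    Returns (induced table, list of source phrases with no deduction).
    """
    induced = {}
    dropped = []
    for s, pivots in src_pivot.items():
        scores = defaultdict(float)
        for p, p_pivot_given_src in pivots.items():
            # Only pivot phrases present in both tables contribute.
            for t, p_tgt_given_pivot in pivot_tgt.get(p, {}).items():
                scores[t] += p_pivot_given_src * p_tgt_given_pivot
        if scores:
            induced[s] = dict(scores)
        else:
            # No shared pivot phrase: the source phrase is lost.
            dropped.append(s)
    return induced, dropped


# Toy German -> English -> Chinese tables (illustrative values only).
src_pivot = {
    "haus": {"house": 0.8, "home": 0.2},
    "gemuetlich": {"cozy": 1.0},
}
pivot_tgt = {
    "house": {"fangzi": 0.9, "jia": 0.1},
    "home": {"jia": 1.0},
}

table, dropped = triangulate(src_pivot, pivot_tgt)
print(table)
print(dropped)
```

Here "gemuetlich" is dropped because its only pivot phrase, "cozy", never appears on the pivot-target side, which is the kind of loss the proposed method aims to recover.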
Keywords/Search Tags: statistical machine translation, pivot-based statistical machine translation, pivot language, triangulation method, phrase table induction, quality control factor