Research On Domain Adaptation For Statistical Machine Translation Based On Topic And Semantic Analysis

Posted on:2019-05-24

Degree:Master

Type:Thesis

Country:China

Candidate:M Y Liu

Full Text:PDF

GTID:2405330545951190

Subject:Management Science and Engineering

Abstract/Summary:

Statistical Machine Translation (SMT) relies heavily on large-scale parallel corpora and builds statistical models using substantial computing power and machine learning algorithms. However, the performance of an SMT system decreases when translating domain-specific texts. The reason is that the training data mixes information from several domains, from which the translation model learns a variety of translation knowledge and linguistic phenomena. As a result, the translation system cannot adapt to domain-specific semantics and language style. Research on domain adaptation for SMT aims to establish a method for dynamically adjusting the translation model so that it can learn and process the language features of the target domain, ensuring balanced and reliable translation quality across different domains. This thesis focuses on domain adaptation for SMT and covers the following:

(1) Domain-relevant data selection based on topic information

We propose a data selection method based on topic information, which extracts domain-relevant sentence pairs from a large-scale general-domain corpus to expand the domain-specific training data and improve the performance of the translation system. We use a bilingual topic model to represent sentence pairs as topic distributions and construct mappings between topics and the target domain. This method introduces underlying semantic information from a topic perspective to better estimate the domain relevance of sentence pairs. Experimental results show that our method improves translation performance by nearly 1.64 BLEU points.

(2) Reordering model adaptation based on a topic model

We show that phrase reordering distributions differ significantly across domains and propose a domain-adaptive reordering model that incorporates topic information. It addresses the dynamic adaptation problem that arises when the domain of the test set is unknown. Specifically, we analyze the topic information of the corpus and obtain the reordering distribution of phrases under different topics. When decoding, we infer the topic distribution of the test set and use it to weight the per-topic reordering distributions, so as to optimize the reordering distribution of phrase pairs and enhance the performance of the cross-domain SMT system. Experimental results show that the reordering model adaptation method improves performance by 0.76 BLEU points.

(3) Terminology translation error identification and correction

We propose a post-processing method for the translation system to address the poor quality of domain terminology translation. We use a back-translation strategy and convert the identification of terminology translation errors into quality evaluation of the back-translated text. We use three metrics: language model perplexity of the back-translated text, tree-edit distance, and sentence semantic similarity. Experimental results show that our method can effectively identify and correct terminology translation errors, and that it improves performance on both weak and strong SMT systems, improving precision by 0.48% and 1.51%, respectively.
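
The topic-weighted reordering step in item (2) can be illustrated with a minimal sketch. The abstract does not give the combination formula, so the code below assumes a simple linear mixture: the adapted probability of each orientation is the sum over topics of P(topic | test set) times P(orientation | phrase pair, topic). The function name `mix_reordering`, the three orientation classes (monotone, swap, discontinuous) and all numbers are hypothetical and chosen only for illustration; they are not taken from the thesis.

```python
from typing import Dict, List

# Assumed orientation classes (the usual lexicalized-reordering set);
# the thesis abstract does not say which classes it uses.
ORIENTATIONS = ("monotone", "swap", "discontinuous")


def mix_reordering(topic_weights: List[float],
                   per_topic: List[Dict[str, float]]) -> Dict[str, float]:
    """Linear mixture of per-topic reordering distributions.

    topic_weights[k] plays the role of P(topic k | test set);
    per_topic[k][o] plays the role of P(orientation o | phrase pair, topic k).
    """
    if len(topic_weights) != len(per_topic):
        raise ValueError("need one reordering distribution per topic")
    total = sum(topic_weights)
    if total <= 0:
        raise ValueError("topic weights must have a positive sum")

    mixed = {o: 0.0 for o in ORIENTATIONS}
    for weight, dist in zip(topic_weights, per_topic):
        for o in ORIENTATIONS:
            # Normalize the weights so the result is a proper distribution.
            mixed[o] += (weight / total) * dist.get(o, 0.0)
    return mixed


# Hypothetical example: two topics, one phrase pair.
topic_weights = [0.75, 0.25]  # inferred topic distribution of the test set
per_topic = [
    {"monotone": 0.8, "swap": 0.1, "discontinuous": 0.1},  # topic 0
    {"monotone": 0.4, "swap": 0.4, "discontinuous": 0.2},  # topic 1
]
adapted = mix_reordering(topic_weights, per_topic)
```

With these made-up inputs the adapted distribution leans toward topic 0, because the test set is assumed to be mostly about that topic. A real system might instead add the per-topic scores as extra log-linear features in the decoder; the abstract does not say which design was used.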

Keywords/Search Tags:

Statistical Machine Translation, Domain Adaptation, Topic Information, Translation Model Optimization, Terminology Translation

Related items

1	Domain Adaptation Of Statistic Machine Translation Based On Context Information
2	Research On Optimization Technologies For Decoding In Phrase-Based Statistical Machine Translation
3	Research On Optimization Of Language Model Based On Statistical Machine Translation
4	A Report On The Translation Of An Excerpt From Machine Translation
5	A Report On The Translation Of Handbook Of Natural Language Processing And Machine Translation(Excerpt From Chapter 5)
6	A Quantitative Study And Comparison Of Foreign Affairs Translation System,google And Baidu Machine Translation Systems In Translating Foreign Affairs Texts
7	Research On Key Problems In Turkish-Chinese Neural Machine Translation For The Military Domain
8	A Translation Practice Report On The Human-Machine Collaboration Translation Model—A Case Study Of The E-C Translation Of Psychology Of Learning For Instruction
9	Research On The Translation Method Of Hanyue Machine In Metallurgical Field
10	Russian-oriented Information Processing, Machine Translation Experiment