Machine Translation Quality Estimation Based On XLM-R

Posted on:2023-08-07

Degree:Master

Type:Thesis

Country:China

Candidate:S N Chen

Full Text:PDF

GTID:2555306902951219

Subject:Computer technology

Abstract/Summary:

Machine Translation (MT) refers to the process of using a computer to convert one natural language into another. BLEU is widely used to evaluate machine translation output, but it is computed against human-written references. Quality Estimation (QE), by contrast, automatically evaluates machine translation output without access to a human-generated reference. This thesis focuses on three crucial issues facing mainstream QE models under the deep learning framework.

(1) Enhancing the semantic correlation of the QE model. Introducing pre-trained models into translation quality estimation tasks is currently popular. However, because multilingual pre-trained models are usually trained on monolingual corpora in different languages, the corresponding vocabularies of a language pair are not aligned in semantic space. To address this problem, this thesis adds a semantic association processing layer to the QE model, which integrates a semantic similarity score between the source and target texts. Experimental results show that the proposed method, with enhanced semantic relevance under the concatenation mechanism, significantly improves the performance of MT quality estimation.

(2) Improving QE performance with data augmentation. Annotated corpora for training QE models are expensive to build and small in scale, and data augmentation is a direct and effective way to deal with this problem. Two data augmentation methods are proposed. The first is an indirect method based on the Dropout mechanism, which randomly drops nodes to construct different representations of the same sentence, exponentially enlarging the QE corpus without constructing new supervisory signals. The second is a direct method based on a denoising autoencoder, which reconstructs pseudo-translations, together with the corresponding supervisory signals, from the target-side texts of a parallel corpus. Experimental results show that both methods effectively improve the performance of MT quality estimation.

(3) Integrating phrase alignment information into the QE model. Existing work shows that word alignment can effectively improve QE performance, but isolated words are prone to ambiguity because they lack context, whereas the meaning of a phrase is relatively clear. Constraining the model with phrase alignment is therefore likely to reduce the negative effects of improper word alignment. Accordingly, this thesis integrates into the QE model phrase alignment probabilities derived from reliable bilingual phrases. Experimental results show that the proposed method significantly improves the performance of MT quality estimation.

In summary, this thesis studies machine translation quality estimation from three angles: the model, the data, and the alignment granularity. Experimental results on the public WMT dataset show that the proposed methods are effective and match or outperform current state-of-the-art QE models.
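As a rough illustration of aspects (1) and (2), the sketch below concatenates a source-target similarity score onto sentence features and uses random dropout to produce several different views of the same labelled sentence pair. It is a toy under stated assumptions, not the thesis's implementation: the sentence vectors stand in for XLM-R representations, cosine similarity is an assumed choice of similarity score, and the function names (`concat_with_similarity`, `dropout_view`, `augment`) are hypothetical.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def concat_with_similarity(src_vec, tgt_vec):
    """Concatenate source features, target features, and their similarity score.

    Assumption: the similarity score is a single cosine value appended as one
    extra feature; the thesis may compute and integrate it differently.
    """
    return list(src_vec) + list(tgt_vec) + [cosine_similarity(src_vec, tgt_vec)]


def dropout_view(vec, p, rng):
    """Inverted dropout: zero each entry with probability p, rescale the survivors."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [x / keep if rng.random() >= p else 0.0 for x in vec]


def augment(src_vec, tgt_vec, label, n_views, p=0.1, seed=0):
    """Build n_views training examples from one labelled sentence pair.

    Each view applies a fresh random dropout mask, so the features differ
    while the original quality label is reused unchanged.
    """
    rng = random.Random(seed)
    examples = []
    for _ in range(n_views):
        src_view = dropout_view(src_vec, p, rng)
        tgt_view = dropout_view(tgt_vec, p, rng)
        examples.append((concat_with_similarity(src_view, tgt_view), label))
    return examples
```

Calling `augment(src, tgt, label, n_views=4)` returns four feature-label pairs that share one label. In a real QE system the dropout would more plausibly be applied inside the encoder during the forward pass rather than to fixed vectors, which is a further simplification made here.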

Keywords/Search Tags:

Translation quality estimation, Enhanced semantic relevance, Data augmentation, Denoising Autoencoder, Phrase alignment

Related items

1	Research On Data Augmentation Methods For Chinese-Vietnamese Neural Machine Translation
2	Research On The Key Technologies For Phrase-based Tibetan-english Statistical Machine Translation
3	Report On The Translation Of Journey To Data Quality
4	Research On Uyghur Speech Recognition Based On Deep Learning And Data Augmentation
5	A Sentence-level Quality Estimation For Neural Machine Translation Based On Subword Regularization
6	Research On Mongolian Handwritten Recognition Based On Data Augmentation And Correction Network
7	On The Parameter Estimation Of Cognitive Diagnosis Models And Its Application
8	The Neural Automatic Post-Editing Based On Quality Estimation
9	Chinese Machine Translation In The Prepositional Phrase Disambiguation
10	A Study Of Translation Quality In The Perspective Of Relevance Theory