
Sentence-Level Quality Estimation for Neural Machine Translation Based on Subword Regularization

Posted on: 2020-06-21
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Xiang
Full Text: PDF
GTID: 2415330575465051
Subject: Computer Science and Technology
Abstract/Summary:
In an era of increasingly frequent international exchange, machine translation has lowered the barriers to information exchange that language differences create between people in different countries and regions. Quality estimation of machine translation aims to estimate the quality of a machine translation without a human reference translation. It plays an important role in machine translation and in automatic post-editing of machine translation output. Building on previous research, this thesis first proposes a unified neural network model (UNQE) based on sub-word segmentation for the sentence-level quality estimation task. The model consists of two parts: one is a bidirectional recurrent neural network (RNN) encoder-decoder model, which can be regarded as a feature extraction model; the other is an RNN model that calculates the quality estimation score, which can be regarded as a supervised regression model. During training, the bidirectional RNN encoder-decoder is initialized and pre-trained on a bilingual parallel corpus, and the two networks are then trained jointly to minimize the mean absolute error over the quality estimation training samples.

Secondly, we propose a neural translation quality estimation method that combines different sub-word segmentation methods. To overcome the adverse effects of an excessively large vocabulary on the construction and training of neural machine translation models, researchers have in recent years proposed the BPE and SentencePiece sub-word segmentation methods, which greatly improve the quality of machine translation. However, no prior work has investigated the impact of different sub-word segmentation methods on the quality estimation of machine translation. Based on an in-depth analysis of the advantages and disadvantages of the BPE and SentencePiece sub-word segmentation methods, we propose a neural translation quality estimation method that fuses word segmentation, BPE sub-word segmentation and SentencePiece sub-word segmentation.

We validate the two proposed methods on the sentence-level translation quality estimation tasks of WMT17 and WMT18. The experimental results show that the proposed methods significantly improve the performance of translation quality estimation. In addition, we used the proposed joint neural network model to participate in the WMT18 sentence-level translation quality estimation task. Among the six sub-directions of the official evaluation, the joint neural network model ranks first on the statistical and neural machine translation tasks for English-Czech, English-Latvian and German-English, shares first place with Ali's team on English-German neural machine translation, and ranks third on English-German statistical machine translation.
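The abstract does not describe the segmentation step in detail. As a rough illustration of the BPE idea it refers to, the sketch below learns merge operations from a toy word-frequency table and applies them to a new word. This is a generic, minimal version of BPE merge learning, not the thesis's implementation; the function names `learn_bpe`, `merge_pair` and `segment`, the `</w>` end-of-word marker, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` BPE merges from a {word: count} table (hypothetical helper)."""
    # Start from characters, with an end-of-word marker appended to each word.
    vocab = {tuple(word) + ("</w>",): count for word, count in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += count
        if not pair_counts:
            break  # every word is already a single symbol
        # Pick the most frequent pair; ties are broken by the pair itself for determinism.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): count for symbols, count in vocab.items()}
    return merges


def segment(word, merges):
    """Segment a word by applying the learned merges in the order they were learned."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy corpus (assumed data, for illustration only).
toy_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
toy_merges = learn_bpe(toy_freqs, 10)
print(segment("lowest", toy_merges))
```

A word that never appeared in the toy corpus, such as "lowest", is still split into known sub-word units, which is how sub-word segmentation keeps the vocabulary small. SentencePiece pursues the same goal with a different procedure that is not sketched here.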
Keywords/Search Tags: quality estimation of machine translation, recurrent neural network, sub-word segmentation algorithm, encoder-decoder architecture, joint training