
Research On End-to-end Neural Network Machine Translation

Posted on: 2021-04-05    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Z Li    Full Text: PDF
GTID: 1368330623482171    Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of economic globalization and the Internet, machine translation plays an increasingly important role in promoting political, economic, and cultural exchange, and it has significant scientific and practical value in the current era of artificial intelligence. In practice, however, the variability of language, the limited expressiveness of semantic representations, and the scarcity of parallel corpora restrict system performance. End-to-end neural machine translation (NMT) is therefore a central and challenging topic in current research. This dissertation addresses specific open problems of the end-to-end NMT model. First, it mines source-language text data and uses suitable representation models to express complex, high-level, abstract semantic information. Second, given large parallel corpora, it builds a more effective end-to-end NMT model by combining reinforcement learning with supervised training. Third, under low-resource conditions, it applies transfer learning to prevent the neural network from overfitting during training and to improve its generalization ability. Finally, when parallel corpora are extremely scarce but monolingual corpora are abundant, it studies unsupervised machine translation, a likely direction of future research. The main research results of this dissertation are as follows:

1. A word embedding model based on position information. In end-to-end NMT, a word embedding model represents the source language and supplies the initial values of the network; without pre-training, these embeddings are generated randomly, and their ability to express the source-language data directly affects the performance of the whole translation system. Moreover, the quality of an embedding model depends strongly on the choice of corpus, the parameter settings, and the corpus size used during training. To let the embeddings carry more semantic information, help the NMT model converge to a better solution, and improve translation quality, this dissertation proposes the Position Weight CBOW (PW-CBOW) word vector model. Building on CBOW, the method modifies the input layer to incorporate the positional relationship between context words and the target word in the source language. Experiments show that end-to-end NMT systems initialized with PW-CBOW embeddings achieve better translation performance on the IWSLT2014 German-English, WMT14 English-German, and WMT14 English-French tasks.
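
The abstract does not state the exact positional weighting formula, so the following is a minimal sketch of the general idea behind PW-CBOW, assuming a simple inverse-distance weight applied to the context embeddings before the centre-word prediction; the class name PWCBOW and the 1/|offset| scheme are illustrative assumptions, not the author's exact method.

    # Minimal PW-CBOW sketch: CBOW whose input layer weights context words by position (assumed 1/|offset|).
    import torch
    import torch.nn as nn

    class PWCBOW(nn.Module):
        def __init__(self, vocab_size, dim):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, dim)   # context-word embeddings
            self.out = nn.Linear(dim, vocab_size)      # predicts the centre word

        def forward(self, context_ids, offsets):
            # context_ids: (batch, window) token ids; offsets: signed distances to the centre word
            vecs = self.emb(context_ids)                          # (batch, window, dim)
            w = 1.0 / offsets.abs().clamp(min=1).float()          # nearer words receive larger weights
            w = w / w.sum(dim=1, keepdim=True)                    # normalise weights per example
            hidden = (vecs * w.unsqueeze(-1)).sum(dim=1)          # weighted sum replaces CBOW's plain average
            return self.out(hidden)                               # logits over the vocabulary

Training proceeds as for standard CBOW with a cross-entropy loss on the centre word; the learned embedding table can then initialise the NMT encoder instead of random vectors.
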
2. Deep reinforcement learning for sequence generation. During training of an end-to-end NMT model, the decoder is fed tokens from the reference distribution, but at test time it depends entirely on its own previous outputs, which leads to error accumulation (exposure bias). This dissertation studies reinforcement learning and the application of policy-gradient and actor-critic (AC) algorithms to training neural networks that generate sequences. It is also the first to apply deep reinforcement learning, namely the DQN (Deep Q-learning) algorithm that combines deep learning with reinforcement learning, together with the improved double DQN (DDQN) and the dueling-architecture Q-network (Dueling-DQN), to the end-to-end NMT system. Experiments on the IWSLT2014 German-English, WMT14 English-German, and WMT14 English-French tasks verify the feasibility and effectiveness of deep-reinforcement-learning-based NMT and analyze the performance of the translation system under the different reinforcement learning algorithms.

3. Knowledge distillation for low-resource transfer learning. End-to-end NMT requires large parallel corpora for training, yet few parallel data resources are available for many language pairs. With scarce training data, the neural network converges poorly, is unstable, and generalizes weakly. The domain-adaptation approach of transfer learning can extract useful information from high-resource parallel data and reuse it when learning from a low-resource corpus, so a transfer-learning-based end-to-end NMT system needs only a small number of labeled samples to improve its generalization ability significantly. However, this form of transfer learning is prone to overfitting and is difficult to converge during training. This dissertation therefore uses knowledge distillation as a regularization method that constrains the neural network and applies it to the transfer-learning-based end-to-end NMT system, preventing the model from overfitting during training and improving the generalization ability of the low-resource NMT model. Experimental results show that, under low-resource conditions, the knowledge-distillation-based end-to-end NMT system performs better on the IWSLT16 English-Czech and IWSLT11 English-Arabic translation tasks.
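
The abstract does not specify the distillation objective, so the following is a minimal sketch of the common word-level formulation, assuming a high-resource parent (teacher) model regularizes the low-resource child (student); the weight alpha and temperature T are assumed hyperparameters, not values reported by the dissertation.

    # Sketch of knowledge distillation as a regularizer for transfer-learning NMT (alpha, T assumed).
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, target_ids, alpha=0.5, T=1.0):
        # Ordinary cross-entropy against the reference translation.
        vocab = student_logits.size(-1)
        ce = F.cross_entropy(student_logits.reshape(-1, vocab), target_ids.reshape(-1))
        # KL term pulling the student's word distributions toward the frozen teacher's.
        kl = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                      F.softmax(teacher_logits / T, dim=-1),
                      reduction="batchmean") * (T * T)
        # The KL term acts as the regularizer that discourages overfitting to the small corpus.
        return (1.0 - alpha) * ce + alpha * kl
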
4. Named-entity-masked language model pre-training for unsupervised translation. The biggest limitation of the end-to-end NMT system is its dependence on parallel corpora. For language pairs where parallel data are extremely scarce but monolingual data are abundant, the translation task can be converted into an unsupervised one. An unsupervised machine translation system first learns language model parameters, i.e., performs language model pre-training, on the plentiful unlabeled monolingual corpora of the low-resource languages or dialects; the pre-trained features are then fed directly into the unsupervised translation system. This dissertation proposes a pre-training method based on a named-entity-masked language model (NER-MLM). Building on BERT's random masked language modeling (MLM), it adds the idea of masking named entities in a more targeted way. The experiments first verify that, on the WMT'14 English-French and English-German tasks, unsupervised machine translation systems pre-trained with the NER-MLM language model outperform those pre-trained with the standard MLM objective; they then show that, in the absence of large-scale parallel corpora, unsupervised machine translation is a good way to improve end-to-end NMT performance under low-resource conditions; finally, they analyze the linguistic reasons for the large difference in translation performance between English-French and English-German.
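
The precise masking procedure of NER-MLM is not given in the abstract, so the following is a minimal sketch of one plausible scheme, assuming named-entity spans produced by an external NER tagger are masked first and any remaining masking budget is filled with random tokens as in BERT; the 15% rate and the helper name ner_mlm_mask are assumptions.

    # Sketch of NER-biased masking for MLM pre-training (masking rate and priority are assumed).
    import random

    MASK = "[MASK]"

    def ner_mlm_mask(tokens, entity_spans, mask_rate=0.15):
        # tokens: list of subword strings; entity_spans: (start, end) index pairs from an NER tagger.
        budget = max(1, int(len(tokens) * mask_rate))
        masked, chosen = list(tokens), set()
        # 1) Preferentially mask positions inside named-entity spans.
        for start, end in entity_spans:
            for i in range(start, end):
                if len(chosen) < budget:
                    chosen.add(i)
        # 2) Fill any remaining budget with random positions, as in standard MLM.
        rest = [i for i in range(len(tokens)) if i not in chosen]
        random.shuffle(rest)
        chosen.update(rest[: budget - len(chosen)])
        for i in chosen:
            masked[i] = MASK
        return masked, sorted(chosen)

The masked sequence is then fed to the language model, which is trained to recover the original tokens at the chosen positions, exactly as in standard MLM pre-training; the pre-trained parameters initialise the unsupervised translation system.
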
Keywords/Search Tags:Word Embedding, End-to-End, Language Model, Transfer Learning, Reinforcement Learning, Unsupervised Machine Translation