
Research On Initialization Of Neural Machine Translation System Without Using Parallel Sentences

Posted on: 2023-04-20  Degree: Master  Type: Thesis
Country: China  Candidate: X Wang  Full Text: PDF
GTID: 2558306629475454  Subject: Computer technology
Abstract/Summary:
This thesis studies how to initialize a neural machine translation system without using parallel sentences. Unsupervised training is the main approach to training without parallel data: it uses denoising autoencoding and back-translation to train on monolingual corpora. However, related research shows that the model's initial parameters often have a large impact on the final result of this kind of training. Poor initialization may introduce too much noise during training and degrade final performance. To address this problem, this thesis proposes obtaining a better initial model from three perspectives (real dictionary supervision, pseudo dictionary supervision, and meta learning initialization) and then training the machine translation model with unsupervised methods to improve translation quality.

(1) From the perspective of real dictionary supervision, this thesis proposes a model initialization method based on real dictionary supervision. A real dictionary is used to replace the source monolingual data, which establishes a relationship between the two languages to supervise training and yields two pretrained models, one from source to target and one from target to source. Word embedding fusion initialization and dual-encoder fusion training are then used to extract the knowledge learned by the pretrained models and to initialize the unsupervised training model. The experimental results show that the initialization parameters obtained by real dictionary supervised training effectively improve the translation model.

(2) From the perspective of pseudo dictionary supervision, this thesis proposes a model initialization method based on pseudo dictionary supervision. Experiments show that using a dictionary to guide pretraining yields a better initial model and improves translation quality. However, in some cases parallel dictionaries are also difficult to obtain. This thesis therefore first extracts a pseudo dictionary from monolingual data and then uses the extracted pseudo dictionary to initialize and train the model. To this end, it proposes a two-step training method. In the first step, monolingual data is used to train an unsupervised model, and its trained word embeddings are used to extract a pseudo parallel dictionary. In the second step, the extracted dictionary is used to replace the training data, and the replaced data is used to train the unsupervised translation model. The experimental results show that the proposed dictionary extraction method effectively improves translation quality compared with the original translation model.

(3) From the perspective of meta learning, this thesis proposes a model initialization method based on meta learning. Meta learning can use learning across different tasks to obtain better initial parameters, which play an important role in the model's final training result. Using monolingual data from multiple languages, this thesis obtains a better initial model through meta learning and then continues to train the pretraining model from it. After that, the unsupervised training model is initialized with the pretrained model, and the dictionary extraction method is used for dual-encoder fusion training. Experiments show that the proposed meta training initialization method significantly improves the final translation model compared with the traditional method, which trains the pretraining model from randomly initialized parameters.
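The two-step pseudo dictionary idea in (2) can be illustrated with a minimal sketch. The abstract does not say how dictionary entries are extracted from the word embeddings or how the training data is replaced, so everything here is an assumption made for illustration: nearest-neighbour matching by cosine similarity in a shared embedding space, a similarity threshold `min_sim`, word-by-word replacement, and the hypothetical function names `extract_pseudo_dictionary` and `replace_with_dictionary`. The toy embeddings are invented and are not results from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def extract_pseudo_dictionary(src_emb, tgt_emb, min_sim=0.5):
    """Step 1 (assumed): map each source word to its nearest target word.

    src_emb and tgt_emb are dicts of word -> vector, assumed to live in a
    shared space (for example, taken from a trained unsupervised model).
    Pairs whose similarity does not exceed min_sim are discarded.
    """
    dictionary = {}
    for s_word, s_vec in src_emb.items():
        best_word, best_sim = None, min_sim
        for t_word, t_vec in tgt_emb.items():
            sim = cosine(s_vec, t_vec)
            if sim > best_sim:
                best_word, best_sim = t_word, sim
        if best_word is not None:
            dictionary[s_word] = best_word
    return dictionary


def replace_with_dictionary(sentence, dictionary):
    """Step 2 (assumed): word-by-word replacement of a monolingual sentence.

    Words without a dictionary entry are kept unchanged.
    """
    return " ".join(dictionary.get(tok, tok) for tok in sentence.split())


# Toy, hand-made embeddings purely for illustration.
src_emb = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "the": [0.7, 0.7]}
tgt_emb = {"chat": [0.9, 0.1], "chien": [0.1, 0.9]}

pseudo_dict = extract_pseudo_dictionary(src_emb, tgt_emb, min_sim=0.8)
pseudo_source = replace_with_dictionary("the cat", pseudo_dict)
```

In this sketch the replaced sentence and its original would form a noisy sentence pair that could supervise pretraining before unsupervised training starts; the threshold trades dictionary coverage against noise.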
Keywords/Search Tags: neural machine translation, initialization, dictionary, meta learning