
New Machine Translation Models Based On Improved Self-attention Mechanism

Posted on: 2021-05-02    Degree: Master    Type: Thesis
Country: China    Candidate: M X Ji    Full Text: PDF
GTID: 2428330614965910    Subject: Software engineering

Abstract/Summary:
Machine translation is a core task in natural language processing. Current neural machine translation models are mainly deep network models based on recurrent neural networks and convolutional neural networks, but using only these two architectures to process text has clear limitations. In recent years, the self-attention mechanism has shown superior performance in many areas of natural language processing, so this dissertation applies the self-attention mechanism to machine translation and, according to the characteristics of the translation task, improves the traditional self-attention mechanism and designs new models. The main contents and contributions of the dissertation are as follows:

1. In machine translation, the self-attention mechanism has attracted widespread attention because its highly parallelizable computation greatly reduces training time and because it effectively captures the semantic relevance between all words in the context. However, unlike recurrent neural networks, the efficiency of the self-attention mechanism comes from ignoring the positional structure of the words in the context. To let the model use positional information, the Transformer, the machine translation model based on the self-attention mechanism, represents the absolute position of each word with sine and cosine functions. Although this representation reflects relative distance, it lacks directionality. Therefore, a new machine translation model is proposed that combines a logarithmic position representation with the self-attention mechanism; it inherits the efficiency of the self-attention mechanism while retaining both the distance and the direction between words. The results show that, compared with the traditional self-attention model and other models, the new model significantly improves translation accuracy.
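The abstract does not give the exact form of the logarithmic position representation, so the following Python sketch is only illustrative: it contrasts the standard Transformer sinusoidal absolute encoding with a hypothetical signed logarithm of relative distance, sign(j - i) * log(1 + |j - i|), which keeps both the magnitude and the direction of the offset between two words. The function names and that specific formula are assumptions, not the thesis's definition.

```python
import numpy as np

def sinusoidal_encoding(seq_len, d_model):
    """Standard Transformer absolute position encoding:
    PE[pos, 2i] = sin(pos / 10000**(2i / d_model)),
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model)).
    It reflects relative distance through trigonometric identities,
    but the encoding is symmetric in direction, i.e. it carries no sign."""
    pos = np.arange(seq_len)[:, None]              # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]           # (1, d_model // 2)
    angles = pos / np.power(10000.0, 2.0 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def signed_log_relative_positions(seq_len):
    """Hypothetical logarithmic relative-position representation:
    r[i, j] = sign(j - i) * log(1 + |j - i|). The logarithm compresses
    long distances while the sign keeps the left/right direction that
    the sinusoidal scheme discards."""
    idx = np.arange(seq_len)
    rel = idx[None, :] - idx[:, None]              # j - i
    return np.sign(rel) * np.log1p(np.abs(rel))

if __name__ == "__main__":
    print(sinusoidal_encoding(4, 8).shape)         # (4, 8)
    print(signed_log_relative_positions(4))        # antisymmetric distance matrix
```

In such a scheme the relative matrix (or an embedding derived from it) would typically be added as a bias inside the attention scores rather than to the word embeddings, but how the thesis injects it is not stated in this abstract.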
2. Recently, many models combining self-attention mechanisms with recurrent neural networks have been proposed for machine translation, and studies show that these composite models outperform either a self-attention mechanism or a recurrent neural network alone. Although introducing more parameters strengthens the model's ability to capture structural information, the redundant information produced by excessive parameters does not necessarily improve translation quality, and the extra parameters may even reduce translation efficiency. Therefore, this dissertation introduces a gated recurrent unit (GRU) network and combines it with the self-attention mechanism to design a more efficient machine translation model. The model uses a "residual connection" to combine the outputs of the two encoders: the residual connection preserves the low-level structural information and passes it to the high-level encoder, which also alleviates exploding and vanishing gradients (a sketch of this encoder composition is given below). Experimental results show that the underlying GRU network effectively preserves the hierarchical structural information in the text and integrates closely with the semantic analysis performed by the self-attention mechanism. Compared with other models, this model has clear advantages in processing both natural and artificial languages.

3. A recent study shows that the self-attention mechanism usually focuses on individual words and ignores continuous phrases, even though phrases are an essential unit in machine translation. Surveys of machine translation research show that extending the basic unit from words to phrases can substantially improve translation quality, which indicates that neural machine translation systems can be improved by explicitly modeling phrases. No previous work has explicitly combined phrase modeling with hierarchical modeling. Therefore, the dissertation refines the modeling granularity and enhances the model's ability to capture local information by limiting the range of attention; as a result, the proposed method has fewer parameters, and the model is more lightweight and requires less computing power. A self-attention model usually consists of a stack of encoder and decoder layers in which the upper layers tend to learn semantic information while the lower layers tend to capture structural and lexical information, so locality modeling is applied only to the bottom layers: the bottom self-attention layers sense distance and extract local information, while the upper layers capture global semantic information independent of distance. Experimental results show that the new model proposed in this dissertation significantly improves the performance of neural machine translation models.
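The second contribution describes the combination of a gated recurrent unit encoder and a self-attention encoder only at a high level. The PyTorch sketch below is one plausible reading: a bidirectional GRU layer feeds a multi-head self-attention layer, and a residual connection adds the GRU states to the attention output. The class name, layer sizes, and normalization choice are illustrative assumptions, not taken from the thesis.

```python
import torch
import torch.nn as nn

class GRUSelfAttentionEncoder(nn.Module):
    """Hypothetical hybrid encoder: a bidirectional GRU captures local,
    order-sensitive structure; multi-head self-attention on top models
    global dependencies; a residual connection carries the GRU states
    past the attention layer to the higher level."""

    def __init__(self, d_model=512, num_heads=8, dropout=0.1):
        super().__init__()
        self.gru = nn.GRU(d_model, d_model // 2, batch_first=True,
                          bidirectional=True)          # output dim = d_model
        self.attn = nn.MultiheadAttention(d_model, num_heads,
                                          dropout=dropout, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                              # x: (batch, seq, d_model)
        gru_out, _ = self.gru(x)                       # low-level structural info
        attn_out, _ = self.attn(gru_out, gru_out, gru_out)
        return self.norm(gru_out + attn_out)           # residual connection

if __name__ == "__main__":
    enc = GRUSelfAttentionEncoder()
    out = enc(torch.randn(2, 10, 512))
    print(out.shape)                                   # torch.Size([2, 10, 512])
```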
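For the third contribution, the abstract states that only the bottom self-attention layers are restricted to local information by limiting the range of attention. A common way to realize such locality is a banded attention mask; the sketch below, with an illustrative window parameter, shows that idea and is not claimed to be the thesis's exact formulation.

```python
import torch

def local_attention_mask(seq_len, window):
    """Boolean mask for window-restricted self-attention: position i may
    attend only to positions j with |i - j| <= window. Applying this mask
    in the bottom encoder layers limits them to local (phrase-level)
    information, while unmasked upper layers keep the global view."""
    idx = torch.arange(seq_len)
    dist = (idx[None, :] - idx[:, None]).abs()
    return dist <= window                              # (seq_len, seq_len)

def masked_attention_scores(q, k, window):
    """Scaled dot-product scores with out-of-window positions set to -inf
    before the softmax, so they receive zero attention weight."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5      # (..., seq, seq)
    mask = local_attention_mask(q.size(-2), window).to(q.device)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1)

if __name__ == "__main__":
    q = k = torch.randn(1, 6, 16)
    print(masked_attention_scores(q, k, window=2).shape)   # (1, 6, 6)
```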
Keywords/Search Tags: machine translation, natural language processing, recurrent neural network, convolutional neural network, self-attention, position encoding, logarithmic position representation, residual connection