
Research On Network Architectures For Neural Machine Translation

Posted on: 2019-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: B Zhang
Full Text: PDF
GTID: 2405330545997405
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of deep learning, neural machine translation (NMT), which is based solely on neural networks, has achieved remarkable breakthroughs, outperforming traditional statistical machine translation on almost all translation tasks. Typically, NMT adopts an encoder-decoder framework to model the translation process directly, with an attention mechanism integrated into the decoder to capture translation correspondences between source and target words. Under this framework, how to enhance the network architecture of the individual components (encoder, attention, and decoder), and thereby improve the extraction and transformation of sentence semantics, has attracted considerable interest from the research community. In this thesis, we propose an architecture-enhanced network for each component to improve NMT's translation performance. The main contributions are summarized as follows:

1. We propose a context-aware recurrent encoder. Existing recurrent encoders typically encode the source sentence with a bidirectional recurrent neural network and obtain source representations by simply concatenating the forward and backward hidden states. This vanilla operation implicitly assumes that context information from the two directions is independent, which reduces the accuracy of the extracted source semantics. The proposed model uses a specially designed two-level hierarchy to integrate these contexts into unified source representations. Experiments on NIST Chinese-English and WMT English-German translation show that the proposed model significantly improves translation quality and speeds up both training and decoding.

2. We propose a recurrent neural network based attention mechanism. The conventional attention mechanism induces translation-relevant information via a linear weighted summation, with a weight assigned to each source word. The underlying network is therefore essentially a linear model in which intra-sentence dependencies across source words are largely ignored, making it inadequate for capturing complex translation relations. The proposed model instead relies on a non-linear recurrent unit to capture intra-sentence dependencies, and employs gates to dynamically detect translation-relevant source words. Experimental results show that the proposed model considerably improves translation performance, particularly on long source sentences.
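To make the contrast in contribution 2 concrete, the following is a minimal PyTorch-style sketch of a gated recurrent attention reader. The module names, dimensions, and gating form are illustrative assumptions, not the exact architecture of the thesis.

```python
# A minimal sketch of the "recurrent attention" idea: instead of a single
# softmax-weighted sum, a GRU reads the source annotations in order, and a
# sigmoid gate (conditioned on the decoder state) decides how much each
# source word contributes. All names and sizes here are assumptions.
import torch
import torch.nn as nn

class RecurrentAttention(nn.Module):
    def __init__(self, src_dim: int, dec_dim: int, ctx_dim: int):
        super().__init__()
        # Gate: scores the translation relevance of each source position.
        self.gate = nn.Linear(src_dim + dec_dim, src_dim)
        # Non-linear recurrent reader that accumulates source context.
        self.reader = nn.GRUCell(src_dim, ctx_dim)

    def forward(self, annotations: torch.Tensor, dec_state: torch.Tensor):
        # annotations: (src_len, batch, src_dim) encoder outputs
        # dec_state:   (batch, dec_dim) current decoder hidden state
        batch = annotations.size(1)
        ctx = annotations.new_zeros(batch, self.reader.hidden_size)
        for h in annotations:                      # walk the source sentence
            g = torch.sigmoid(self.gate(torch.cat([h, dec_state], dim=-1)))
            ctx = self.reader(g * h, ctx)          # gated, non-linear update
        return ctx                                 # context vector for decoding
```

Because the context vector is built by a recurrent, gated pass over the annotations, dependencies between source words can influence the result, which a single linear weighted sum cannot express.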
3. We propose a latent variable based variational neural decoder. Most decoders in NMT are discriminative neural networks that condition only on the source sentence to perform translation. However, one source sentence generally has several gold translations that differ in lexical usage and style, and these variations cannot be fully predicted from the source sentence alone. To address this problem, we propose a generative decoder that introduces a latent variable to model target sentences. We employ a variational neural approach to inject target information into the posterior distribution of the latent variable, and transfer this information into the corresponding prior distribution through a variational algorithm (see the sketch after this abstract). The target knowledge captured by the prior distribution is then used by the decoder to guide next-word prediction. On NIST Chinese-English and English-German translation tasks, our model yields substantial improvements.

The goal of this thesis is to design novel network architectures that enhance the modeling ability of NMT systems. For the encoder, the attention mechanism, and the decoder, we propose a context-aware recurrent encoder, a recurrent neural network based attention mechanism, and a latent variable based variational decoder, respectively. All of these models achieve encouraging translation performance.
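As an illustration of the latent-variable idea in contribution 3, here is a minimal sketch of a prior/posterior pair using the reparameterization trick and a closed-form Gaussian KL term. All names, dimensions, and the diagonal-Gaussian choice are assumptions for illustration, not the thesis implementation.

```python
# A sketch of a latent variable that injects target-side information during
# training: a posterior network sees both source and target summaries, a
# prior network sees only the source, and a KL term pulls the prior toward
# the posterior so that at test time the prior alone can guide prediction.
import torch
import torch.nn as nn

class VariationalLatent(nn.Module):
    def __init__(self, src_dim: int, tgt_dim: int, z_dim: int):
        super().__init__()
        self.prior = nn.Linear(src_dim, 2 * z_dim)                 # p(z | x)
        self.posterior = nn.Linear(src_dim + tgt_dim, 2 * z_dim)   # q(z | x, y)

    def forward(self, src_summary, tgt_summary=None):
        p_mu, p_logvar = self.prior(src_summary).chunk(2, dim=-1)
        if tgt_summary is not None:            # training: use the posterior
            q_in = torch.cat([src_summary, tgt_summary], dim=-1)
            q_mu, q_logvar = self.posterior(q_in).chunk(2, dim=-1)
            # Reparameterization trick: z = mu + sigma * eps
            z = q_mu + torch.exp(0.5 * q_logvar) * torch.randn_like(q_mu)
            # KL(q || p) in closed form for two diagonal Gaussians
            kl = 0.5 * (p_logvar - q_logvar
                        + (q_logvar.exp() + (q_mu - p_mu).pow(2)) / p_logvar.exp()
                        - 1).sum(-1)
            return z, kl
        # decoding: no target available, fall back to the prior mean
        return p_mu, None
```

During training, z (drawn from the posterior, which sees the target) is fed to the decoder while the KL term pulls the prior toward the posterior; at test time the target is unavailable, so the decoder conditions on the prior alone.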
Keywords/Search Tags:Context-Aware Encoder, Recurrent Attention Mechanism, Variational Decoder