Neural networks are the core technology of modern data-driven artificial intelligence. Since their inception, they have pushed the boundaries of many fields. However, do manually designed architectures realize the full potential of neural networks? In practice, hand-crafted architectures are constrained by human preference and cognition, and they are now approaching technical limits, so automatic neural architecture search is attracting increasing attention. Neural architecture search (NAS) is the process of searching for an optimal architecture on a target dataset. First, NAS uses a search strategy to assemble neurons and connections from a predefined search pool. Second, it trains the resulting child networks and evaluates their performance. Finally, the evaluation scores are fed back to adjust the search strategy toward better structures.

This paper focuses on neural architecture search for neural machine translation. Since neural networks were introduced to machine translation, end-to-end models have become the mainstream technology in this area because of their simplicity and efficiency. Traditional neural machine translation models usually translate sentences in isolation, but in practical applications a corpus rarely consists of single sentences; sentences appear in context. A sentence may influence the translation of other sentences and may in turn be influenced by its own context. To explore architectures for context-aware machine translation, this paper proposes Contextualized Neural Architecture Search (CNAS), which aims to automatically find the best-performing models for context-aware translation tasks. The main innovations of CNAS are as follows. (1) CNAS uses a policy gradient method, a reinforcement learning approach in which an agent network outputs network structures, to explore models with better performance and greater novelty; a minimal sketch of this search loop is given below. (2) This work designs a contextualized, modular search pool. On the one hand, the modular design expands the search space; on the other hand, including context-aware modules ensures that the models CNAS finds perform well on contextual relevance. (3) After identifying the shortcomings of evaluation metrics such as BLEU for contextual machine translation, this paper introduces a context-aware evaluation metric. With this metric, CNAS evaluates the child networks produced by the agent network and feeds the resulting scores back to the agent network so that it produces more context-aware models.
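To make the search procedure described above concrete, the following is a minimal, self-contained sketch of a policy-gradient (REINFORCE-style) architecture-search loop driven by a context-aware reward. The module names in SEARCH_POOL, the tabular "agent", and the evaluate_child placeholder are illustrative assumptions for exposition only; they are not CNAS's actual controller, search space, or evaluation metric.

```python
"""Minimal sketch of a policy-gradient architecture-search loop.
All names here (SEARCH_POOL, evaluate_child, ...) are hypothetical."""
import math
import random

# Hypothetical modular search pool; each slot of the child network picks one
# candidate module (context-aware modules included).
SEARCH_POOL = ["self_attention", "context_attention", "ffn", "conv", "identity"]
NUM_SLOTS = 4          # number of decisions the agent makes per architecture
LEARNING_RATE = 0.1

# The "agent" here is just a table of logits per slot; a real controller
# would be an RNN or Transformer policy network.
logits = [[0.0 for _ in SEARCH_POOL] for _ in range(NUM_SLOTS)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sample_architecture():
    """Sample one child architecture and remember the choices made."""
    arch, choices = [], []
    for slot in range(NUM_SLOTS):
        probs = softmax(logits[slot])
        idx = random.choices(range(len(SEARCH_POOL)), weights=probs)[0]
        arch.append(SEARCH_POOL[idx])
        choices.append(idx)
    return arch, choices

def evaluate_child(arch):
    """Placeholder reward: stands in for training a child network and scoring
    it with a context-aware metric (higher is better)."""
    return sum(1.0 for module in arch if "context" in module) + 0.1 * random.random()

baseline = 0.0
for step in range(200):
    arch, choices = sample_architecture()
    reward = evaluate_child(arch)
    baseline = 0.9 * baseline + 0.1 * reward          # moving-average baseline
    advantage = reward - baseline
    # REINFORCE update: push probability mass toward rewarded choices.
    for slot, idx in enumerate(choices):
        probs = softmax(logits[slot])
        for k in range(len(SEARCH_POOL)):
            grad = (1.0 if k == idx else 0.0) - probs[k]
            logits[slot][k] += LEARNING_RATE * advantage * grad

print("most likely architecture:",
      [SEARCH_POOL[max(range(len(SEARCH_POOL)), key=lambda k: logits[s][k])]
       for s in range(NUM_SLOTS)])
```

Under this toy reward, the policy quickly concentrates on context-aware modules, which illustrates the feedback loop described above: the score of each sampled child network adjusts the agent's sampling distribution for the next round.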
To verify the effectiveness of CNAS, this paper designs the following experiments. (1) Ablation study: to show that the context-aware evaluation metric chosen by CNAS is the better guide, an ablation study compares search runs driven by BLEU against runs driven by the context-aware evaluation score. The results on the OpenSubtitles dataset show that the context-aware metric accelerates the convergence of architecture search and steers the search more reliably toward context-aware models. (2) Comparison experiments: experiments on the WMT2014 EN-DE and IWSLT2014 EN-DE datasets yield optimal models named CNAS_NET1 and CNAS_NET2, respectively. Compared with models obtained by other architecture search methods, the CNAS models are highly competitive on both datasets, with improvements of 0.6 and 0.7. The architectures of CNAS_NET1 and CNAS_NET2 are also analyzed to uncover design novelties, laying a foundation for subsequent research on better network architectures. (3) Context-aware exploration: this paper also selects a sample from the IWSLT2014 EN-DE test set to verify the contextual awareness of CNAS. The result shows that CNAS_NET1, the model CNAS searched on IWSLT2014 EN-DE, captures the third-person referent relationship between English and German. A translation comparison between our searched model and the Transformer baseline shows that the searched model learns context information better.
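As an illustration of how a referent-level check like the one above could be automated, the following is a minimal contrastive-scoring sketch in the spirit of pronoun-focused evaluation. The ContrastivePair fields, the model_score interface, and the toy example are hypothetical; they are not CNAS's evaluation metric or its test data.

```python
"""Minimal sketch of a contrastive pronoun check for context awareness.
The model interface and example pair are illustrative assumptions."""
from dataclasses import dataclass

@dataclass
class ContrastivePair:
    source: str       # English sentence containing an ambiguous pronoun
    context: str      # preceding context that disambiguates the referent
    correct: str      # German reference with the right pronoun
    contrastive: str  # same reference with the pronoun swapped

def pronoun_accuracy(model_score, pairs):
    """Fraction of pairs where the model scores the correct translation
    higher than the contrastive variant, given the same context."""
    hits = 0
    for p in pairs:
        good = model_score(p.source, p.context, p.correct)
        bad = model_score(p.source, p.context, p.contrastive)
        hits += int(good > bad)
    return hits / max(len(pairs), 1)

if __name__ == "__main__":
    # Toy stand-in for a trained translation model's scoring function.
    def toy_score(src, ctx, hyp):
        # Reward hypotheses whose pronoun matches a gender cue in the context.
        return 1.0 if ("Frau" in ctx and hyp.startswith("Sie ")) else 0.0

    pairs = [ContrastivePair(
        source="She said it was fine.",
        context="Die Frau kam gestern an.",
        correct="Sie sagte, es sei in Ordnung.",
        contrastive="Er sagte, es sei in Ordnung.",
    )]
    print("pronoun accuracy:", pronoun_accuracy(toy_score, pairs))
```

A score of this kind, unlike sentence-level BLEU, directly tests whether a model resolves cross-sentence referents, which is the behavior the context-aware exploration above examines qualitatively.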