Duality exists widely in artificial intelligence and machine learning tasks, e.g., Chinese-to-English translation vs. English-to-Chinese translation, image classification vs. image generation, and speech recognition vs. speech generation/synthesis. Duality refers to the symmetric or probabilistic relationship between two tasks. Although many tasks come in such dual pairs, this important property is rarely exploited in current machine learning schemes: the models of two dual tasks are still trained independently. To leverage this property, this thesis proposes a new learning scheme, dual learning, which uses the duality between the two tasks as a constraint to jointly train the two models and thereby improves the performance of both tasks. Since deep learning has made significant progress in natural language processing and image processing, this thesis adopts deep learning models as the experimental tools. Dual learning is studied from three aspects: learning schemes, theoretical guarantees, and experimental analysis.

In terms of learning schemes, at the training phase, when labeled data is limited, this thesis proposes dual unsupervised learning, which exploits unlabeled data in a controlled way and improves model quality. For the supervised setting, this thesis proposes dual supervised learning, which bridges the two dual tasks through probabilistic duality and likewise improves the models. At the inference phase, this thesis proposes dual inference, which introduces duality at test time and further improves performance. These three schemes can be categorized as exploiting data-level duality: they operate on the data and introduce duality by changing the loss functions. Correspondingly, based on the duality between different parts of the models, this thesis proposes model-level dual learning, which shares parameters so that a pair of dual tasks can be solved with a single model, again with boosted performance.

In terms of theoretical guarantees, this thesis designs a theoretical framework for dual learning and leverages Rademacher complexity for the analysis, giving a preliminary theoretical result that dual learning enjoys better generalization ability.

In terms of experimental analysis, this thesis verifies the effectiveness of dual learning on neural machine translation, image processing, and sentiment analysis. Given the importance of neural machine translation to dual learning, this thesis also proposes a new neural machine translation model, the deliberation network, which is further combined with dual learning in the experiments. The deliberation network differs from the standard encoder-decoder model by explicitly polishing a generated sequence into a better one. In neural machine translation, the proposed methods obtain promising results; in particular, on the WMT17 Chinese-to-English translation task, they achieve the state-of-the-art result among all single models. With dual learning, the error rates of image classifiers are greatly reduced, and a state-of-the-art pixel-by-pixel image generator is obtained. In sentiment analysis, applying dual learning yields a classifier with higher accuracy and a generator that produces sentences with richer sentiment.
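
To make the probabilistic duality used by dual supervised learning concrete, one way to write it (a sketch in my own notation, not quoted from the thesis; $\theta_{xy}$ and $\theta_{yx}$ denote the parameters of the primal and dual models, and $\hat{P}$ denotes empirical marginal estimates) is that the two directions must agree on the joint distribution:
\[
P(x)\,P(y \mid x;\theta_{xy}) \;=\; P(x,y) \;=\; P(y)\,P(x \mid y;\theta_{yx}),
\]
which can be enforced during training as a soft penalty on its violation, e.g.,
\[
\ell_{\mathrm{dual}} = \Big(\log \hat{P}(x) + \log P(y \mid x;\theta_{xy}) - \log \hat{P}(y) - \log P(x \mid y;\theta_{yx})\Big)^{2}.
\]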
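
The closed loop behind dual unsupervised learning can also be sketched in code. The snippet below is a minimal toy illustration under my own assumptions (the names dual_loop_reward, sample_f, logprob_g, lm_score_y and the stub models are hypothetical, not taken from the thesis): an unlabeled input is mapped by the primal model, the intermediate output is scored by a language model, and the dual model's reconstruction log-probability provides the second feedback signal; a real system would feed this reward into a policy-gradient update of both models.

import random

def dual_loop_reward(x, sample_f, logprob_g, lm_score_y, alpha=0.5):
    """Round trip x --f--> y_mid --g--> x and return (y_mid, reward).

    sample_f(x)     -> y_mid         : sample from the primal model f
    logprob_g(x, y) -> log P(x | y)  : reconstruction log-prob under the dual model g
    lm_score_y(y)   -> log P(y)      : language-model score of the intermediate output
    alpha           : weight mixing the two feedback signals
    """
    y_mid = sample_f(x)
    reward = alpha * lm_score_y(y_mid) + (1.0 - alpha) * logprob_g(x, y_mid)
    return y_mid, reward

# Toy stand-ins so the sketch runs end to end; real models would be neural
# translators, and the reward would drive a policy-gradient update.
if __name__ == "__main__":
    vocab = ["a", "b", "c"]
    sample_f = lambda x: [random.choice(vocab) for _ in x]                   # fake primal "translation"
    logprob_g = lambda x, y: -float(sum(xi != yi for xi, yi in zip(x, y)))   # fake reconstruction score
    lm_score_y = lambda y: -0.1 * len(y)                                     # fake language-model score
    x = ["a", "b", "c", "a"]
    y_mid, reward = dual_loop_reward(x, sample_f, logprob_g, lm_score_y)
    print("intermediate:", y_mid, "reward:", round(reward, 3))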