
Research On The Text-to-Image Generation With Self-Attention Generative Adversarial Networks

Posted on: 2020-12-30  Degree: Master  Type: Thesis
Country: China  Candidate: Z F Gu  Full Text: PDF
GTID: 2428330599453638  Subject: Computer Science and Technology
Abstract/Summary:
Research on artificial intelligence is currently at a peak, driven by recent breakthroughs in deep learning, and in neural networks in particular. In computer vision, deep learning has achieved remarkable results on tasks such as image classification and recognition, image segmentation, and image captioning, often outperforming traditional machine learning by a wide margin. Image generation, however, remains a challenge: progress on generative models has been limited, and it is even harder to make a neural network generate images conditioned on a class label or, more difficult still, a text description. Generative Adversarial Networks (GANs) provide a good solution to text-to-image generation, and their performance has improved steadily in recent years.

GANs are known for their strong performance in image generation, and the GAN model is easy to implement and understand. However, while GANs benefit from their adversarial training scheme, they are also constrained by it: many researchers have found in experiments that the original GAN can be difficult to train and suffers from mode collapse. Even recent GAN-based text-to-image models still exhibit these problems, which degrade the quality of the generated images. To address them, we carry out theoretical analysis and experimental work and extend the GAN-based text-to-image generation model. The main contributions of this thesis are as follows.

First, we improve the loss function of the original GAN-CLS algorithm, which is defined in terms of the JS divergence. Following the WGAN and WGAN-GP architectures, we replace the JS divergence with an approximate EM (Wasserstein) distance. The JS divergence easily causes vanishing gradients during GAN training, whereas the approximate EM distance fundamentally avoids this problem. Through theoretical analysis and experiments, this thesis shows that introducing the approximate EM distance improves the training stability of the original GAN-CLS algorithm and avoids mode collapse. A hedged sketch of such a loss appears below.

Second, we introduce the self-attention mechanism into the GAN and propose GAN-SelfAtt to improve the quality of generated images in text-to-image generation. We implement GAN-SelfAtt on two different GAN frameworks, WGAN and WGAN-GP. The experimental results show that the self-attention mechanism improves the resolution of the generated images, because it remedies the limitation of convolution, which only captures correlations within a local pixel neighborhood. A sketch of a self-attention block of this kind is also given below.
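The following is a minimal sketch, not taken from the thesis, of how a WGAN-GP-style critic loss can replace the JS-divergence-based GAN-CLS loss with an approximate EM distance plus gradient penalty. It assumes a PyTorch setup; the conditional `critic(images, text_embedding)` interface and the `gp_weight` default are illustrative assumptions.

```python
import torch

def critic_loss(critic, real_images, fake_images, text_embedding, gp_weight=10.0):
    """Approximate EM (Wasserstein) distance plus gradient penalty (WGAN-GP style)."""
    # Wasserstein estimate: critic scores on real vs. generated samples.
    d_real = critic(real_images, text_embedding).mean()
    d_fake = critic(fake_images, text_embedding).mean()
    wasserstein = d_fake - d_real  # critic minimizes this, i.e. maximizes d_real - d_fake

    # Gradient penalty on random interpolates between real and fake samples,
    # enforcing the 1-Lipschitz constraint required by the EM-distance formulation.
    eps = torch.rand(real_images.size(0), 1, 1, 1, device=real_images.device)
    interpolates = (eps * real_images + (1.0 - eps) * fake_images).requires_grad_(True)
    d_interp = critic(interpolates, text_embedding)
    grads = torch.autograd.grad(
        outputs=d_interp, inputs=interpolates,
        grad_outputs=torch.ones_like(d_interp),
        create_graph=True, retain_graph=True,
    )[0]
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    gradient_penalty = gp_weight * ((grad_norm - 1.0) ** 2).mean()

    return wasserstein + gradient_penalty
```

Because the gradient penalty keeps the critic approximately 1-Lipschitz, the critic can be trained close to optimality without the vanishing gradients associated with the JS divergence.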
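The next block is a minimal sketch of a SAGAN-style self-attention layer of the kind GAN-SelfAtt inserts into the generator and discriminator, again assuming PyTorch. The class name `SelfAttention` and the channel-reduction factor of 8 are illustrative assumptions, not the thesis's actual code.

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Self-attention block: every spatial position attends to all others,
    compensating for the local receptive field of convolution."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.size()
        q = self.query(x).view(b, -1, h * w).permute(0, 2, 1)   # B x N x C'
        k = self.key(x).view(b, -1, h * w)                      # B x C' x N
        attn = torch.softmax(torch.bmm(q, k), dim=-1)           # B x N x N attention map
        v = self.value(x).view(b, -1, h * w)                    # B x C x N
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x                             # residual connection
```

Starting `gamma` at zero lets the network rely on local convolutional features first and gradually learn how much global, long-range context to mix in.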
Keywords/Search Tags: Deep learning, text-to-image generation, Generative Adversarial Networks, Self-Attention