
Research on Cross-Modal Text-to-Image Generation Based on Generative Adversarial Networks

Posted on: 2024-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: C Hu
Full Text: PDF
GTID: 2568307067963279
Subject: Engineering
Abstract/Summary:
In recent years, deep learning has made significant progress on many computer vision tasks, including text-to-image generation. This task also involves the text modality and therefore draws on natural language processing: the goal is to generate high-fidelity images that match a given text description. Owing to the impressive performance of generative adversarial networks (GANs) on image generation, they have gradually become the mainstream solution for text-to-image generation. However, GANs suffer from inherent limitations such as training instability and mode collapse. At the same time, the large semantic gap between the text and image modalities makes joint distribution learning difficult and cross-modal fusion insufficient, which ultimately leads to low-quality generated images and semantic deviation from the input text. To address these problems, this thesis proposes two text-to-image generation algorithms built on stacked generative adversarial networks. The summary is as follows:

First, this thesis proposes Shuffle Attention Generative Adversarial Networks (SA-GAN) to address low color brightness in synthesized images and the weak correlation between RGB channels in the concatenated text-image features. A lightweight shuffle attention module is inserted where text and image features are concatenated in the generator, so as to capture channel-wise correlation between the text and image vectors. In addition, a perceptual loss computed from the first 35 feature layers of VGG19 is introduced as an auxiliary constraint to improve the perceptual realism of the generated images. Experiments show that on the CUB dataset the IS reaches 4.02, an improvement of 8.6% and 0.49% over StackGAN-v1 and StackGAN-v2 respectively, and the FID reaches 47.31, an improvement of 8.8% over StackGAN-v1. On the Oxford dataset the IS reaches 3.07, an improvement of 4.3% and 5.8% over StackGAN-v1 and StackGAN-v2 respectively, and the FID reaches 49.46, an improvement of 10.5% over StackGAN-v1.

Second, to address the failure to account for the differing importance of semantically distinct words, and to further improve text-image fusion, this thesis proposes Memory Gate AttnGAN (MG-AttnGAN), which integrates a memory-gate attention module while retaining the shuffle attention and perceptual loss of the first algorithm. A three-stage stacked generator raises the image resolution to 256×256. The memory-gate attention module initializes the weights of different attribute words and performs similarity matching and fusion with image sub-regions. Spectral normalization is applied to the discriminator to stabilize its density-ratio estimation in high-dimensional space, thereby stabilizing GAN training. Experiments show that on the CUB dataset the IS reaches 4.61, an improvement of 5.7% over the baseline AttnGAN, and the FID reaches 19.58, an improvement of 18.3% over AttnGAN...
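The abstract does not give implementation details, but shuffle attention generally follows SA-Net (Zhang & Yang, 2021): channels are split into groups, each group gets a channel-attention and a spatial-attention branch, and a channel shuffle mixes information across groups. Below is a minimal PyTorch sketch of such a module; the group count and the exact placement at the generator's text-image concatenation are illustrative assumptions, not taken from the thesis.

```python
import torch
import torch.nn as nn

class ShuffleAttention(nn.Module):
    """SA-Net-style lightweight shuffle attention (sketch).
    Hyperparameters are illustrative, not from the thesis."""

    def __init__(self, channels, groups=8):
        super().__init__()
        assert channels % (2 * groups) == 0
        self.groups = groups
        c = channels // (2 * groups)  # channels per branch within a group
        # learnable scale/shift for the channel-attention branch
        self.cw = nn.Parameter(torch.zeros(1, c, 1, 1))
        self.cb = nn.Parameter(torch.ones(1, c, 1, 1))
        # learnable scale/shift for the spatial-attention branch
        self.sw = nn.Parameter(torch.zeros(1, c, 1, 1))
        self.sb = nn.Parameter(torch.ones(1, c, 1, 1))
        self.gn = nn.GroupNorm(c, c)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        b, c, h, w = x.shape
        x = x.reshape(b * self.groups, c // self.groups, h, w)
        x_chan, x_spat = x.chunk(2, dim=1)
        # channel branch: global average pooling, then scale/shift gate
        attn_c = x_chan.mean(dim=(2, 3), keepdim=True)
        x_chan = x_chan * self.sigmoid(attn_c * self.cw + self.cb)
        # spatial branch: group-normalized features, then scale/shift gate
        attn_s = self.gn(x_spat)
        x_spat = x_spat * self.sigmoid(attn_s * self.sw + self.sb)
        out = torch.cat([x_chan, x_spat], dim=1).reshape(b, c, h, w)
        # channel shuffle: interleave channels across the groups
        out = out.reshape(b, self.groups, c // self.groups, h, w)
        return out.transpose(1, 2).reshape(b, c, h, w)
```

In the thesis, a module of this kind would be applied to the feature map formed by concatenating the text embedding (spatially broadcast) with the generator's image features, so the attention can relate text and image channels.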
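A hedged sketch of the VGG19 perceptual loss described above, assuming the standard frozen-VGG feature comparison; the cutoff at feature index 35 follows the abstract's wording, and the L1 criterion is an assumption on my part.

```python
import torch.nn as nn
from torchvision.models import vgg19, VGG19_Weights

class VGGPerceptualLoss(nn.Module):
    """Perceptual loss from a frozen VGG19: compares deep feature maps
    of the synthesized and real images. Layer cutoff 35 is taken from
    the abstract; the thesis may slice the network differently."""

    def __init__(self, layer_cutoff=35):
        super().__init__()
        feats = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features
        self.extractor = nn.Sequential(*list(feats.children())[:layer_cutoff])
        for p in self.extractor.parameters():
            p.requires_grad = False  # VGG is a fixed feature extractor
        self.extractor.eval()
        self.criterion = nn.L1Loss()

    def forward(self, fake_img, real_img):
        # both inputs are expected as ImageNet-normalized RGB tensors
        return self.criterion(self.extractor(fake_img),
                              self.extractor(real_img))
```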
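The memory-gate attention module is specific to this thesis and its internals are not given in the abstract. The sketch below is only one plausible interpretation, combining AttnGAN-style word/sub-region similarity matching with a learned per-word gate that re-weights attribute words before fusion; all layer names and shapes here are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedWordRegionAttention(nn.Module):
    """Illustrative interpretation of memory-gate attention:
    gate word embeddings, then match and fuse them with image
    sub-regions. Not the thesis's exact MG-AttnGAN module."""

    def __init__(self, word_dim, region_dim):
        super().__init__()
        self.proj = nn.Linear(word_dim, region_dim)  # words -> region space
        self.gate = nn.Linear(word_dim, 1)           # per-word importance

    def forward(self, words, regions):
        # words:   (B, T, word_dim)   word embeddings
        # regions: (B, N, region_dim) image sub-region features
        w = self.proj(words)                               # (B, T, D)
        g = torch.sigmoid(self.gate(words))                # (B, T, 1)
        w = w * g                                          # gated word weights
        # similarity matching between every sub-region and every word
        attn = F.softmax(regions @ w.transpose(1, 2), -1)  # (B, N, T)
        # fusion: a word-context vector for each image sub-region
        return attn @ w                                    # (B, N, D)
```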
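Spectral normalization of discriminator layers is a standard stabilization technique (Miyato et al., 2018) with built-in PyTorch support; it constrains each layer's spectral norm and hence the discriminator's Lipschitz constant. A minimal sketch of how a discriminator convolution would be wrapped; the block shape is illustrative, not the thesis architecture.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

def sn_conv_block(in_ch, out_ch, kernel_size=4, stride=2, padding=1):
    """Downsampling conv block with spectral normalization, as commonly
    used in GAN discriminators (illustrative, not the thesis design)."""
    return nn.Sequential(
        spectral_norm(nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)),
        nn.LeakyReLU(0.2, inplace=True),
    )
```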
Keywords/Search Tags: Generative Adversarial Network, Semantic Gap, Shuffle Attention, Memory Gate Attention, Deep Fusion