
Generative Adversarial Networks Based Style Controllable Text-to-Image Synthesis

Posted on: 2023-05-03 | Degree: Master | Type: Thesis
Country: China | Candidate: H R Li | Full Text: PDF
GTID: 2558306914473324 | Subject: Software engineering
Abstract/Summary:
Text-to-image synthesis is an emerging technology that takes a textual description as input and uses a generative model to produce an image matching its semantics. With the development of generative adversarial networks (GANs), text-to-image synthesis has made great progress, and many strong research results have emerged. Existing work has focused on improving image resolution and diversity and on generating images that visually match the semantics of the text, while the generation of images in a specific style from text still lags behind practical applications. At present, mainstream text-to-image models generally stack multiple generative adversarial networks: a low-resolution stage generates a sketch that roughly reflects the semantics of the text, and a high-resolution stage then refines the sketch into a high-quality image.

To make the generated images more varied and to advance the practical applications of text-to-image synthesis, this thesis proposes a style-controllable text-to-image synthesis algorithm based on stacked generative adversarial networks, which performs style transfer while completing the text-to-image task. The thesis splits the task into two subtasks: first, it explores a high-quality text-to-image synthesis method; second, it implements style transfer on the high-quality images generated in the first stage. The two contributions are as follows.

(1) A text-to-image model based on a stacked generative adversarial network with a Text-Image Similarity Computation Module. This thesis improves the existing stacked generation model in three respects: raising the matching degree between text and image, improving image quality, and stabilizing the training process. BERT is used as the text encoder to strengthen the feature representation of the text. To improve the matching degree between text and image, a Text-Image Similarity Computation Module is proposed: the text and the image are first encoded by two separate neural networks, and the two encodings are then fused into a common embedding space by a further network. This module computes the matching degree between the generated image and the text in both the low-resolution and high-resolution stages and guides the training of the generator. To enhance the visual effect and overall quality of the generated images, a self-attention mechanism is introduced in the low-resolution stage to capture global image information, and a perceptual loss is added to the generators; to sharpen the images, blurred images are added as negative samples in the adversarial loss of the discriminator. To stabilize training, spectral normalization is applied to the discriminators of both the low-resolution and high-resolution stages to constrain their convergence.
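To illustrate the Text-Image Similarity Computation Module described in (1), the following is a minimal PyTorch sketch. The class name, feature dimensions, and the use of cosine similarity in the shared embedding space are assumptions made here for exposition, not the thesis's exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TextImageSimilarity(nn.Module):
    # Scores how well an image matches a text description by
    # projecting both encodings into a common embedding space.
    # Dimensions below are hypothetical defaults.
    def __init__(self, text_dim=768, image_dim=512, embed_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, embed_dim)    # e.g. a BERT sentence feature
        self.image_proj = nn.Linear(image_dim, embed_dim)  # pooled CNN image feature

    def forward(self, text_feat, image_feat):
        t = F.normalize(self.text_proj(text_feat), dim=-1)
        v = F.normalize(self.image_proj(image_feat), dim=-1)
        return (t * v).sum(dim=-1)  # cosine similarity, one score per pair

# During generator training, a mismatch penalty such as
#   loss_match = (1.0 - sim(text_feat, fake_image_feat)).mean()
# could be added at both the low- and high-resolution stages
# to guide the generator, as the abstract describes.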
(2) A style-controllable text-to-image synthesis model based on adaptive instance normalization, together with a prototype system that visually demonstrates the generation of style images. To extend text-to-image generation with style controllability, this thesis implements style transfer on top of the model proposed in (1). First, illustration datasets in four different styles are collected. Second, the generation method of (1) is used to produce high-quality images in the low-resolution stage; a style encoder is designed for the high-resolution stage, the style code is injected through adaptive instance normalization, and a style loss function guides model training. Finally, a pretrained style classifier is proposed for the quantitative evaluation of style transfer.

Experiments demonstrate that the proposed text-to-image model based on a stacked generative adversarial network with fused semantic consistency significantly improves the clarity and quality of generated images on the CUB bird dataset and the Oxford-102 flower dataset, and outperforms existing models in semantic consistency and visual quality. Moreover, the proposed style-controllable text-to-image model based on adaptive instance normalization is capable of generating images in a specific style from text.
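As a sketch of the adaptive instance normalization mechanism used in (2), the layer below follows the generic formulation, in which an affine mapping from the style code predicts a per-channel scale and shift; the thesis's exact layer and style-encoder architecture may differ.

import torch
import torch.nn as nn

class AdaIN(nn.Module):
    # Adaptive instance normalization: replaces the per-channel
    # mean/std of the content features with statistics predicted
    # from a style code (generic formulation, assumed shapes).
    def __init__(self, style_dim, num_channels):
        super().__init__()
        self.affine = nn.Linear(style_dim, num_channels * 2)  # predicts (gamma, beta)

    def forward(self, content, style_code):
        # content: (N, C, H, W) feature map; style_code: (N, style_dim)
        gamma, beta = self.affine(style_code).chunk(2, dim=1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)   # (N, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        mean = content.mean(dim=(2, 3), keepdim=True)
        std = content.std(dim=(2, 3), keepdim=True) + 1e-5
        return gamma * (content - mean) / std + beta

Inserting such layers into the high-resolution generator lets a single network render the same text-conditioned sketch in any of the collected illustration styles, simply by switching the style code.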
Keywords/Search Tags:Generative Adversarial Networks, Text-to-image, Semantic Consistency, Style Transfer