In recent years, deep learning has developed rapidly. It has been used to solve many complex problems in image processing, such as image recognition and object detection, and it has shown performance far beyond that of traditional machine learning methods. Text-to-image generation lies at the intersection of computer vision and natural language processing: a natural-language description serves as a conditional constraint for generating high-quality images that closely match the description, which realizes interaction between two modalities of information. The task is important in applications such as small-sample image data expansion, computer-aided mapping, and criminal investigation, and it underpins other important tasks such as object detection and image recognition. At present, the generative adversarial network is one of the most prominent methods in image generation because of its strong performance. However, because both the text-to-image generation task and the generative adversarial network appeared relatively recently, many problems and challenges remain unsolved, such as low quality of generated images, high model complexity, and difficult model training. To address the main problems in this field, this paper proposes the following two models.

First, to address poor image quality, authenticity, richness, and text consistency, this paper proposes a Multi-Attention Depth Residual Generation Adversarial Network (MADR-GAN) based on the idea of the stacked generative adversarial network. It splits the text-to-image generation task into two subtasks: initial image generation and image refinement. A deep residual self-attention module is proposed in the initial
image generation process of the first stage of the stacked network, which extracts deep text features and retains as much of them as possible while improving the quality, completeness, and layout plausibility of the initially generated images. In the image refinement stage, a dynamic memory module and a Convolutional Block Attention Module (CBAM) are introduced so that the network renders image details more finely and realistically. These modules also strengthen the supervisory role of the text information, which improves the consistency between the generated image and the text description.

Second, text-to-image generation models based on the generative adversarial network tend to have highly complex structures, heavy computation, long training times, and difficult convergence. To address this, this paper builds a Deep Stacking and Fusion Generative Adversarial Network (DSF-GAN) based on the idea of the single-stage generative adversarial network. The network contains only one generator-discriminator pair, so model complexity, computation, and training time are greatly reduced. To address the low resolution and poor visual quality of images generated by the single-stage generative adversarial network, this paper adds a deep stacking and fusion block to the model that deeply fuses text information with image features, so that the text better guides image generation and the quality of the generated images improves. Multiple deep stacking and fusion blocks are cascaded so that the conditional text information is introduced into the generator network repeatedly, which helps the single-stage generative adversarial network generate high-quality, high-resolution images that are strongly consistent with the text. To reduce model complexity, this
paper uses only a one-way discriminator network. In addition, a matching-aware zero-centered gradient penalty is introduced to help the discriminator network judge input images correctly and to help the generator network converge. Spectral normalization is introduced to stabilize the training process and to mitigate the model collapse caused by the instability of single-stage generative adversarial network training.
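
As an illustration of the spectral normalization step mentioned above, the following minimal sketch rescales a weight matrix by an estimate of its largest singular value, obtained by power iteration. It is not the paper's implementation: the function names (`spectral_norm`, `spectrally_normalize`), the pure-Python list-of-lists matrix representation, and the iteration count are assumptions chosen only to keep the example self-contained. Deep learning frameworks typically run a single power-iteration step per training update and keep the vector `u` between updates instead.

```python
import math
import random


def matvec(W, v):
    """Return W @ v for a list-of-lists matrix W and a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matvec_t(W, u):
    """Return W^T @ u without building the transpose explicitly."""
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(len(W[0]))]


def l2_normalize(v, eps=1e-12):
    """Scale v to (approximately) unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def spectral_norm(W, n_iters=50, seed=0):
    """Estimate the largest singular value of W by power iteration."""
    rng = random.Random(seed)
    # Random start vector with one entry per row of W (hypothetical choice).
    u = l2_normalize([rng.gauss(0.0, 1.0) for _ in W])
    for _ in range(n_iters):
        v = l2_normalize(matvec_t(W, u))  # right singular vector estimate
        u = l2_normalize(matvec(W, v))    # left singular vector estimate
    # sigma is approximately u^T W v once u and v have converged.
    return sum(ui * wi for ui, wi in zip(u, matvec(W, v)))


def spectrally_normalize(W, n_iters=50):
    """Return W divided by its estimated largest singular value."""
    sigma = spectral_norm(W, n_iters)
    return [[w / sigma for w in row] for row in W]


if __name__ == "__main__":
    W = [[3.0, 0.0], [0.0, 1.0]]
    print(spectral_norm(W))
    print(spectrally_normalize(W))
```

Dividing each layer's weights by this estimate bounds the layer's Lipschitz constant at roughly 1, which is the property generally credited with stabilizing discriminator training. The sketch assumes a nonzero matrix; an all-zero `W` would cause a division by zero.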