
Research on Text-to-Image Generation Based on Generative Adversarial Networks

Posted on: 2021-09-11  Degree: Master  Type: Thesis
Country: China  Candidate: R Pan  Full Text: PDF
GTID: 2517306302954179  Subject: Applied Statistics
Abstract/Summary:
Text-to-image generation, that is, generating an image that matches the semantics of a given textual description, is a cross-modal task that combines natural language processing and computer vision. The task has two goals: the generated image must retain a certain degree of naturalness, and it must carry the information in the text, i.e., the semantics of the image and the text must be consistent. In recent years researchers have made breakthroughs and innovations in text-to-image generation, and the quality of generated images keeps improving. However, because the task is complex, a gap remains between generated images and real ones. Compared with conditional inputs such as class labels, the complexity of natural-language text makes it much harder for a model to exploit the detailed information in the conditioning input. Moreover, natural language is subjective and diverse: the same semantic content can be phrased in many different ways, which makes it difficult for a model to keep the generated image semantically consistent with the text.

In addition, most current research on text-to-image generation is based on English, for two reasons. First, English is the world's most widely used language: authoritative papers are almost always published in English, and English receives far more research attention than other languages. Second, the task requires a large-scale dataset of paired texts and images, and annotating such a dataset is time-consuming and laborious; English-image datasets are relatively complete, while research on generating images from Chinese is still scarce and needs more ideas and innovation. Accordingly, this thesis focuses on the task of generating images from Chinese text and, on that basis, enhances the details of generated images and maintains
the semantic consistency of images and text.

The main research work of this thesis can be summarized as follows:

1. A Chinese-image dataset of a certain scale is constructed. Because Chinese and English follow different linguistic rules (for example, Chinese requires word segmentation), a dedicated preprocessing method for Chinese text is adopted.

2. For Chinese text-to-image generation, this thesis proposes a generative adversarial network that combines an attention mechanism with a text semantic alignment structure. Built on a stacked generative adversarial network, the model generates high-resolution images progressively through the stack. The attention mechanism has two parts. First, besides encoding the text as a global sentence vector, each word in the text is also encoded as a vector; when drawing an image sub-region, attention is focused on the words most relevant to that sub-region, so that, as far as possible, each word is correctly represented in the image. Second, a deep attentional multimodal similarity model maps text features and image features into a common semantic space and computes word-level and sentence-level image-text similarity, providing a fine-grained image-text matching loss for training the generator; this loss addresses the problem that generated images are not accurate enough and their details are not clear enough. A text semantic realignment structure is appended after image generation: the generated image is re-described as text, the similarity between the re-described text and the original conditional text is computed, and an additional text semantic realignment loss is obtained; this addresses the semantic drift of generated images caused by the diversity of Chinese expressions.

3. The proposed model is tested on a Chinese dataset and compared with related variant models. The experiments use IS (Inception Score) and FID (Fréchet Inception Distance) as objective evaluation metrics.
The experimental results show that the proposed model outperforms the other models on these metrics: the generated images are more realistic, delicate, clear, and diverse, and the gap to real images is smaller. A subjective test is also set up as a supplementary evaluation; its results show that images generated by this model are clearly better than those of the baseline model in visual quality as judged by human observers. In addition, the model's semantic-capture ability is tested: the model captures subtle semantic differences in the text well, and small word changes in the text lead to large pixel changes in the image. Finally, the model is compared with other models; the results show that it is superior in the object structure, sharpness, and detail of the generated images.

In summary, this thesis proposes a generative adversarial network model for Chinese text-to-image generation. Built on a stacked generative adversarial network, the model combines an attention mechanism with a text semantic realignment structure, keeping the generated images clear while keeping image and text semantics consistent, and it provides an approach for research on Chinese text-to-image generation.
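The core of the word-level attention described in contribution 2 above is a softmax over region-word similarities, followed by a similarity score between image regions and their attended word contexts. The sketch below is a minimal, hypothetical NumPy illustration of that idea; the function names, feature shapes, and the use of plain dot-product and cosine similarity are simplifying assumptions for illustration, not the thesis's actual implementation.

```python
# Hypothetical sketch of word-level attention for text-to-image generation.
# All names and shapes are illustrative assumptions, not the thesis's code.
import numpy as np

def word_attention(region_feats, word_feats):
    """region_feats: (R, D) image sub-region features.
    word_feats: (T, D) word vectors.
    Returns (R, D) word-context vectors: for each region, a weighted sum
    of word vectors, weighted by region-word dot-product similarity."""
    scores = region_feats @ word_feats.T                # (R, T) similarities
    scores -= scores.max(axis=1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)       # softmax over words
    return weights @ word_feats                         # (R, D) contexts

def matching_score(region_feats, word_feats):
    """A crude stand-in for the fine-grained image-text matching loss:
    mean cosine similarity between each region and its attended word
    context; higher means the image better reflects the words."""
    ctx = word_attention(region_feats, word_feats)
    num = (region_feats * ctx).sum(axis=1)
    den = np.linalg.norm(region_feats, axis=1) * np.linalg.norm(ctx, axis=1)
    return float((num / (den + 1e-8)).mean())
```

In a real model the negative of such a score would be added to the generator loss, so that gradients push each image sub-region toward the words most relevant to it.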
Keywords/Search Tags:generative adversarial network, text to image, cross-modal, attention mechanism, semantic alignment