Text-description-guided image generation uses a paragraph of text to generate images whose content matches that description. It requires the computer to accurately understand the semantic information conveyed by the text and to convert that semantic information into matching image content, which is a very challenging task. With the emergence of generative adversarial networks, more and more methods for text-description-guided image generation have been proposed. Although this line of work has made good progress, when dealing with complex scenes containing multiple interacting objects, the images generated by existing methods often suffer from artifacts, overlapping objects, and missing objects. At the same time, there is a gap between the objective evaluation metrics and the subjective evaluation of current models, which leads to an inaccurate assessment of generative model performance. Therefore, to further improve the performance of text-description-guided image generation models, this paper carries out the following work:

1) A generative adversarial network model combined with scene description is proposed to solve the object overlap and object omission problems in generated images. First, a mask generation network is introduced to preprocess the dataset, providing a segmentation mask vector for each object; these vectors serve as constraints for training a layout prediction network on the text descriptions, so that the specific location and size of each object in the scene layout can be obtained. The resulting layout is then fed into a cascaded refinement network to complete image generation. Second, the scene layout and the generated image are fed jointly into an introduced layout discriminator to bridge the gap between them and obtain a more realistic scene layout. Experimental results show that the proposed model generates more natural images that better match the text descriptions, effectively improving the authenticity and diversity of the generated images.

2) A no-reference image quality assessment model combined with attention is proposed to address the inaccuracy of existing evaluation metrics. First, a designed attention semantic extraction network extracts the semantic features of the image. Second, an attention mechanism network receives these semantic features, forms attention weight information, and passes the result to a quality prediction network that predicts the image quality score. Finally, the predicted and ground-truth quality scores are standardized, and an introduced loss function is used to bridge the gap between them, yielding a more accurate quality score. Experimental results show that the proposed model accurately predicts image quality scores and outperforms existing methods on real image datasets.
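
To make the first pipeline concrete, below is a minimal PyTorch sketch of how a layout prediction network, a cascaded refinement network, and a layout discriminator could be wired together. All module names, layer shapes, and hyperparameters are illustrative assumptions, not the implementation used in this paper.

```python
# Minimal sketch (assumed shapes and names) of the layout-to-image pipeline described above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayoutPredictor(nn.Module):
    """Predicts a normalized box (x, y, w, h) and a coarse mask for each object embedding."""
    def __init__(self, embed_dim=128, mask_size=16):
        super().__init__()
        self.box_head = nn.Sequential(nn.Linear(embed_dim, 256), nn.ReLU(),
                                      nn.Linear(256, 4), nn.Sigmoid())
        self.mask_head = nn.Sequential(nn.Linear(embed_dim, mask_size * mask_size), nn.Sigmoid())
        self.mask_size = mask_size

    def forward(self, obj_emb):                      # (N, embed_dim) object embeddings from the text
        boxes = self.box_head(obj_emb)               # (N, 4), coordinates in [0, 1]
        masks = self.mask_head(obj_emb).view(-1, self.mask_size, self.mask_size)
        return boxes, masks

def compose_layout(boxes, masks, obj_emb, size=8):
    """Scatter each object's mask into its predicted box to build a (C, size, size) layout tensor."""
    layout = torch.zeros(obj_emb.size(1), size, size)
    for box, mask, emb in zip(boxes, masks, obj_emb):
        x, y, w, h = (box * size).long().tolist()
        x, y = min(x, size - 1), min(y, size - 1)
        w, h = max(min(w, size - x), 1), max(min(h, size - y), 1)
        m = F.interpolate(mask[None, None], size=(h, w), mode="bilinear", align_corners=False)[0, 0]
        layout[:, y:y + h, x:x + w] += emb[:, None, None] * m
    return layout

class CascadedRefinement(nn.Module):
    """Upsamples the scene-layout tensor into an image through successive refinement stages."""
    def __init__(self, layout_channels=128, stages=4):
        super().__init__()
        blocks, ch = [], layout_channels
        for _ in range(stages):
            blocks += [nn.Upsample(scale_factor=2), nn.Conv2d(ch, ch // 2, 3, padding=1),
                       nn.BatchNorm2d(ch // 2), nn.ReLU()]
            ch //= 2
        blocks += [nn.Conv2d(ch, 3, 3, padding=1), nn.Tanh()]
        self.net = nn.Sequential(*blocks)

    def forward(self, layout):                       # (B, C, size, size) -> (B, 3, size * 2**stages, ...)
        return self.net(layout)

class LayoutDiscriminator(nn.Module):
    """Scores (image, layout) pairs so generated images stay consistent with the predicted layout."""
    def __init__(self, layout_channels=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + layout_channels, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1))

    def forward(self, image, layout):
        layout = F.interpolate(layout, size=image.shape[-2:])  # align layout with image resolution
        return self.net(torch.cat([image, layout], dim=1))

# Toy forward pass with 3 objects and 128-dim embeddings (hypothetical sizes).
obj_emb = torch.randn(3, 128)
boxes, masks = LayoutPredictor()(obj_emb)
layout = compose_layout(boxes, masks, obj_emb)[None]   # (1, 128, 8, 8)
image = CascadedRefinement()(layout)                   # (1, 3, 128, 128)
realism = LayoutDiscriminator()(image, layout)         # (1, 1) layout-consistency score
```

The layout discriminator receives the image together with the layout it was generated from, so its score penalizes images whose object placement drifts away from the predicted scene layout.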
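
Similarly, the following is a minimal sketch of an attention-based no-reference quality model in the spirit of the second contribution: a convolutional semantic feature extractor, a spatial attention layer that pools the features into attention-weighted descriptors, a small quality prediction head, and a loss computed on standardized scores. The backbone, the attention form, and the loss are assumptions for illustration rather than the paper's actual design.

```python
# Minimal sketch (assumed architecture) of an attention-based no-reference quality model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionIQA(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        # Stand-in for the semantic feature extraction network.
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU())
        # Spatial attention: one weight per location, softmax-normalized.
        self.attention = nn.Conv2d(feat_dim, 1, 1)
        # Quality prediction head mapping the attended feature to a scalar score.
        self.predictor = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, x):                                        # (B, 3, H, W)
        f = self.features(x)                                     # (B, C, H', W')
        w = torch.softmax(self.attention(f).flatten(2), dim=-1)  # (B, 1, H'*W') attention weights
        pooled = (f.flatten(2) * w).sum(dim=-1)                  # attention-weighted pooling -> (B, C)
        return self.predictor(pooled).squeeze(-1)                # (B,) predicted quality scores

def standardized_loss(pred, target, eps=1e-8):
    """Standardize predicted and ground-truth scores before comparing them, as described above."""
    pred_z = (pred - pred.mean()) / (pred.std() + eps)
    target_z = (target - target.mean()) / (target.std() + eps)
    return F.mse_loss(pred_z, target_z)

# Toy usage: a batch of 8 random 64x64 images with synthetic ground-truth scores.
model = AttentionIQA()
images, mos = torch.randn(8, 3, 64, 64), torch.rand(8) * 5
loss = standardized_loss(model(images), mos)
loss.backward()
```

Standardizing both score sets before the loss removes differences in scale and offset between predicted and ground-truth scores, so training focuses on matching their relative ordering and spread.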