
Research On Text Guided Image Generation Method Based On Adversarial Learning

Posted on: 2024-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: Z K Wang
Full Text: PDF
GTID: 2568307058477784
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of computer technology in recent years, deep learning has become one of the most popular approaches in the field of artificial intelligence. Building on several deep learning frameworks, great progress has been made in natural language processing, computer vision, multimedia recognition, cross-modal research, and other fields. Among these, cross-modal research has become an important topic in artificial intelligence because it involves the joint study of natural language, images, video, audio, and other modalities. Generating images from natural language descriptions has become one of the most active research directions because of its high practicality in applications such as art generation and computer-aided design. In recent years, text-to-image generation has achieved great success in terms of the semantic completeness and visual authenticity of the generated images. In addition, to meet further user needs, extra constraints can be imposed during image generation to produce images with specified properties. In view of these two aspects, this thesis proposes an image generation network designed to improve the quality of generated images and an image generation network designed to satisfy given constraints:

First, a text-to-image generation method based on dynamic word-level updating is proposed. This method constructs a multi-stage image generation framework using a dynamic weight assignment strategy and multiple pairs of generators and discriminators. It dynamically propagates text and image information to correct inaccurate importance ratings in the word-level information, so that image generation is biased toward the direction consistent with the input semantics. In addition, a mixed zero-centered gradient penalty function and a visual loss function are proposed to optimize the network. The mixed zero-centered gradient penalty function allows the generator to produce highly semantically consistent images and ensures the stability of the training process. The visual loss function further improves the visual quality of the generated images by narrowing the difference between real and generated images. Extensive experiments on the CUB and MS-COCO public datasets show that the proposed method is competitive with state-of-the-art image generation methods.

Second, a text-to-image generative adversarial network with style-image constraints is proposed. This method introduces an attention mechanism and a style transfer method to construct a deep image generation framework constrained by a style image. A multi-group attention method is adopted to capture multi-scale dependency information in the semantic features by mining the dependencies between short- and long-distance information, providing more comprehensive visual details for the network. A multi-scale style transfer method is introduced for style feature fusion: by applying weighted style features to the normalization of the content features, the method transfers the color and texture information of the style features to the content features and completes image generation under the style-image constraint. Experiments on the MS-COCO and WikiArt public datasets demonstrate the effectiveness of the method.
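The abstract does not give the exact form of the proposed "mixed" zero-centered gradient penalty; as a minimal illustration of the underlying idea, the following NumPy sketch computes a standard zero-centered gradient penalty, which regularizes the discriminator by penalizing the squared norm of its input gradient. The toy linear discriminator and all names here are hypothetical, used only to make the penalty computable in closed form:

```python
import numpy as np

def zero_centered_gp(disc_grad, gamma=10.0):
    """Zero-centered gradient penalty: 0.5 * gamma * E[ ||grad_x D(x)||^2 ].

    disc_grad: (batch, dim) array of per-sample gradients of the
    discriminator output with respect to its input.
    """
    return 0.5 * gamma * np.mean(np.sum(disc_grad ** 2, axis=1))

# Toy linear discriminator D(x) = x @ w, whose input gradient is w
# for every sample, so the penalty reduces to 0.5 * gamma * ||w||^2.
rng = np.random.default_rng(0)
w = rng.normal(size=3)
grads = np.tile(w, (8, 1))              # identical per-sample gradients
penalty = zero_centered_gp(grads, gamma=10.0)
print(np.isclose(penalty, 5.0 * np.sum(w ** 2)))  # → True
```

In a real GAN the per-sample gradients would come from automatic differentiation of the discriminator; penalizing them toward zero on real (and, in mixed variants, also generated) samples is what stabilizes training.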
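Applying weighted style features to the normalization of content features is in the spirit of adaptive instance normalization (AdaIN): the content feature map is normalized per channel and then rescaled with the style feature's per-channel statistics. A minimal NumPy sketch, assuming (C, H, W) feature maps and a hypothetical blending weight `alpha` (the thesis's multi-scale weighting scheme is not specified in the abstract):

```python
import numpy as np

def adain(content, style, alpha=1.0, eps=1e-5):
    """Transfer per-channel mean/std from style features to content features."""
    c_mu = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps
    s_mu = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True) + eps
    stylized = s_std * (content - c_mu) / c_std + s_mu
    # alpha blends between fully stylized and original content features.
    return alpha * stylized + (1 - alpha) * content

rng = np.random.default_rng(1)
content = rng.normal(0.0, 1.0, size=(4, 8, 8))
style = rng.normal(3.0, 2.0, size=(4, 8, 8))
out = adain(content, style)
# The output's per-channel mean now matches the style features.
print(np.allclose(out.mean(axis=(1, 2)), style.mean(axis=(1, 2)), atol=1e-4))
```

Because only channel-wise statistics are exchanged, the spatial structure of the content features is preserved while color and texture statistics follow the style image.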
Keywords/Search Tags: cross-modal generation, feature fusion, generative adversarial network, text-to-image generation, attention mechanism