Design And Realization Of Graphic Generation System Based On Contrastive Learning

Posted on:2024-05-07

Degree:Master

Type:Thesis

Country:China

Candidate:D X Wang

Full Text:PDF

GTID:2568306941990729

Subject:Computer technology

Abstract/Summary:

Image generation refers to using artificial intelligence to generate single-modal or cross-modal images from given data. The technology has a wide range of potential applications in fields such as virtual reality, digital art, and medical imaging, and therefore has significant research value. Text-to-image generation must ensure semantic consistency between the generated images and the input text while also producing diverse images. Current solutions to these two problems fall into two categories. The first uses attention mechanisms to fuse image and text features, but direct feature fusion across modalities may lose information. The second introduces a contrastive loss to improve the model's understanding of semantics, computing it either between different text descriptions of the same image or between different images generated from different texts. However, these approaches still lack an understanding of the semantic relationship between images and text. To address these issues, this thesis proposes CLIG, a contrastive learning-based image generation algorithm. By applying a contrastive loss as a constraint between images and text, CLIG strengthens the interaction between image and text semantics and generates images with stronger semantic consistency and richer diversity. Building on CLIG, the thesis also designs and develops a contrastive learning-based image generation system. The main work consists of three parts:

(1) A contrastive learning-based image generation algorithm. First, a contrastive learning method aligns the features of semantically related images and text to capture their intrinsic associations. Borrowing the soft-target idea from knowledge distillation, a momentum model generates pseudo-targets that improve the performance of contrastive learning. Next, the conventional image encoder is replaced with a masked image encoder, which encourages the model to learn richer image features and improves performance. Finally, the unidirectional multimodal decoder used in previous models is replaced with a bidirectional multimodal decoder, which lets the model attend to image information from multiple directions and generate images in parallel.

(2) A contrastive learning-based image generation system. A requirements analysis of the system was carried out first, after which the system's overall architecture and database structure were designed. The key modules of the system were then implemented on a browser/server (B/S) architecture. The system is capable of bridging the "semantic gap", provides users with personalized image customization, and offers diverse visualization options and a good interactive experience.

(3) Experimental validation. The proposed algorithm and system are validated on the CUB and MS COCO datasets. Compared with the baseline models, the algorithm improves performance on both datasets, most notably on R-Precision, the metric for semantic consistency, with gains of 2.26%-15.9% on CUB and 2.1%-10.2% on MS COCO. Diversity and semantic-understanding experiments further show that the generated images exhibit good diversity and semantic consistency. The contrastive learning-based image generation system also meets its anticipated requirements and satisfies user needs.
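
The abstract names the ingredients of CLIG's contrastive stage (image-text feature alignment, and pseudo-targets from a momentum model following the soft-target idea of knowledge distillation) but does not give the formulas. The sketch below is one plausible reading, assuming an ALBEF-style momentum distillation: a symmetric image-to-text and text-to-image InfoNCE loss whose one-hot targets are blended with soft labels computed from a momentum (EMA) copy of the encoders. The temperature `tau`, mixing weight `alpha`, EMA coefficient `m`, and all function names are illustrative assumptions, not values from the thesis; plain Python lists stand in for tensors so the example runs without dependencies.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm so that dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def similarity(img: List[Vector], txt: List[Vector], tau: float) -> Matrix:
    """Temperature-scaled cosine similarities; entry [i][j] scores image i against text j."""
    img_n = [normalize(v) for v in img]
    txt_n = [normalize(v) for v in txt]
    return [[sum(a * b for a, b in zip(u, w)) / tau for w in txt_n] for u in img_n]


def softmax(row: Vector) -> Vector:
    top = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(mat: Matrix) -> Matrix:
    return [list(col) for col in zip(*mat)]


def soft_cross_entropy(logits: Matrix, targets: Matrix) -> float:
    """Mean over rows of -sum_j targets[i][j] * log(softmax(logits[i])[j])."""
    total = 0.0
    for row, tgt in zip(logits, targets):
        probs = softmax(row)
        total -= sum(t * math.log(p) for t, p in zip(tgt, probs) if t > 0)
    return total / len(logits)


def contrastive_loss(img: List[Vector], txt: List[Vector],
                     img_m: List[Vector], txt_m: List[Vector],
                     tau: float = 0.07, alpha: float = 0.4) -> float:
    """Symmetric image-text InfoNCE loss with momentum-distilled soft targets.

    img[i] and txt[i] are a matching pair from the online encoders; img_m and
    txt_m are the same batch encoded by the momentum model (hypothetical setup).
    """
    n = len(img)
    hard = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    def mix(soft: Matrix) -> Matrix:
        # Pseudo-targets: alpha * momentum soft labels + (1 - alpha) * one-hot labels.
        return [[alpha * s + (1 - alpha) * h for s, h in zip(sr, hr)]
                for sr, hr in zip(soft, hard)]

    logits = similarity(img, txt, tau)
    logits_m = similarity(img_m, txt_m, tau)
    targets_i2t = mix([softmax(r) for r in logits_m])
    targets_t2i = mix([softmax(r) for r in transpose(logits_m)])
    loss_i2t = soft_cross_entropy(logits, targets_i2t)
    loss_t2i = soft_cross_entropy(transpose(logits), targets_t2i)
    return (loss_i2t + loss_t2i) / 2


def ema_update(online: Vector, momentum: Vector, m: float = 0.995) -> Vector:
    """Momentum-model update: theta_m <- m * theta_m + (1 - m) * theta."""
    return [m * pm + (1 - m) * p for p, pm in zip(online, momentum)]
```

Matched image-caption pairs give a loss close to zero, while swapping the captions gives a large loss; setting `alpha = 0` recovers plain one-hot InfoNCE. The soft targets matter when a batch contains captions that also partly describe other images: the momentum model's softer scores stop the loss from pushing such near-matches fully apart.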

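The abstract says only that the bidirectional multimodal decoder attends to image information from multiple directions and generates images in parallel; it does not describe the decoding procedure. One well-known scheme with these properties is MaskGIT-style iterative decoding, used here as an assumed stand-in: all image tokens start masked, a bidirectional model predicts every position in one pass, the most confident predictions are committed, and the rest stay masked on a cosine schedule. This is a simplified, deterministic sketch; `toy_predict` is a hypothetical placeholder for the decoder, and the token and step counts are illustrative.

```python
import math
from typing import Callable, List, Optional, Tuple

Token = Optional[int]           # None marks a still-masked position
Prediction = Tuple[int, float]  # (predicted token id, confidence)


def masked_count(step: int, total_steps: int, n_tokens: int) -> int:
    """Cosine schedule: number of positions left masked after 0-based `step`."""
    ratio = math.cos(math.pi / 2 * (step + 1) / total_steps)
    return math.floor(n_tokens * ratio)


def parallel_decode(predict: Callable[[List[Token]], List[Prediction]],
                    n_tokens: int, total_steps: int) -> List[int]:
    """Iterative parallel decoding with confidence-based selection (simplified)."""
    tokens: List[Token] = [None] * n_tokens
    for step in range(total_steps):
        # A bidirectional decoder sees every position, masked or not, and
        # predicts all of them in a single forward pass.
        preds = predict(tokens)
        masked = [i for i, t in enumerate(tokens) if t is None]
        # Commit the most confident predictions; the rest stay masked for the next step.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        n_commit = max(0, len(masked) - masked_count(step, total_steps, n_tokens))
        for i in masked[:n_commit]:
            tokens[i] = preds[i][0]
    # The schedule reaches zero at the last step, so no position is left masked.
    return [t for t in tokens if t is not None]


def toy_predict(tokens: List[Token]) -> List[Prediction]:
    """Hypothetical decoder stand-in: token id = position, confidence grows with position."""
    n = len(tokens)
    return [(i, (i + 1) / n) for i in range(n)]
```

With 16 tokens and 4 steps, the schedule leaves 14, 11, 6, and 0 positions masked, so 2, 3, 5, and 6 tokens are committed per step. Committing few tokens early, when little context is fixed, and more tokens later is the usual motivation for a cosine rather than linear schedule; the whole image still needs only `total_steps` decoder passes instead of one pass per token.
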
Keywords/Search Tags:

Contrastive Learning, Semantic Interaction, Masked Image Encoder, Bidirectional Multimodal Decoder
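
The abstract reports R-Precision as its semantic-consistency metric. In the setup commonly used for text-to-image models (introduced with AttnGAN), each generated image is scored against its ground-truth caption and a pool of mismatched captions (typically 99) using a pretrained image-text embedding; with R = 1, the metric is the fraction of images whose ground-truth caption ranks first. The abstract does not give the thesis's exact protocol, so the sketch below assumes precomputed embedding vectors and places the ground-truth caption at index 0 of each candidate list, a hypothetical layout.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def r_precision(image_feats: List[List[float]],
                candidate_texts: List[List[List[float]]]) -> float:
    """R-Precision with R = 1.

    candidate_texts[k] holds the caption embeddings scored against image k;
    index 0 is assumed (hypothetically) to be the ground-truth caption.
    """
    hits = 0
    for img, cands in zip(image_feats, candidate_texts):
        scores = [cosine(img, t) for t in cands]
        # max() returns the first index on ties, so a tie with the ground truth counts as a hit.
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == 0:
            hits += 1
    return hits / len(image_feats)
```

Because the score depends on the embedding model used to rank captions, R-Precision values are only comparable across papers that share the same pretrained encoders and candidate-pool size.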

Related items

1	Research On Key Techniques Of Multimodal Image Caption
2	Image Semantic Segmentation Based On Deep Convolutional Encoder-decoder Networks And Adversarial Learning
3	Research On Image Manipulation Detection Based On Semantic Segmentation Network
4	Image Semantic Segmentation Based On Encoder-decoder Network And Its Applications
5	Research And Application Of Image Semantic Segmentation Based On Encoder-decoder
6	Research On Image Semantic Segmentation Based On Encoder-decoder Structure
7	Research Of Image Caption Based On Encoder-Decoder
8	Research On Image Semantic Caption Generation Based On Encoder-Decoder Framework
9	Research On Image Dehazing Method Based On Deep Learning
10	Research On Image Semantic Segmentation Based On Neural Network