| Image generation refers to the use of artificial intelligence to generate single-modal or cross-modal images from given data. Image generation technology has a wide range of potential applications in fields such as virtual reality, digital art, and medical imaging, and therefore has significant research value. Text-to-image generation requires both semantic consistency between the generated images and the text and diversity among the generated images. Currently, there are two types of solutions to the semantic consistency and image diversity problems in text-to-image generation. One approach uses attention mechanisms to fuse image and text features, but direct feature fusion between modalities may result in information loss. The other approach introduces a contrastive loss to improve the model’s understanding of semantics, computed between different text descriptions of the same image or between images generated from different texts. However, the semantic relationship between images and text is still not well captured. To address these issues, this paper proposes a contrastive learning-based image generation algorithm, CLIG. By introducing a contrastive loss as a constraint between images and text, CLIG strengthens the interaction between image and text semantics and generates images with stronger semantic consistency and richer diversity. In addition, based on CLIG, this paper designs and develops a contrastive learning-based image generation system. The main work of this paper includes the following three parts: (1) This paper proposes a contrastive learning-based algorithm for image generation. First, we employ a contrastive learning method to align the features of semantically related images and text, capturing their intrinsic associations. Utilizing the soft-target concept from knowledge distillation, we generate pseudo-targets using a momentum model to enhance the performance of
contrastive learning. Moreover, we replace the traditional image encoder with a masked image encoder, encouraging the model to learn richer image features and improving performance. Finally, we replace the unidirectional multimodal decoder used in previous models with a bidirectional multimodal decoder, enabling the model to attend to image information from multiple directions and generate images in parallel. (2) A contrastive learning-based image generation system is implemented. First, a requirements analysis is carried out for the system. Then, the system’s overall architecture and database structure are designed. Finally, the key modules of the system are implemented using a B/S (browser/server) architecture. The system is capable of bridging the "semantic gap" and providing personalized image customization services for users, as well as offering diverse visualization options and a good interactive experience. (3) The proposed algorithm and system are experimentally validated on the CUB and MS COCO datasets. The experimental results show that, compared to the baseline models, our algorithm achieves performance improvements on both datasets, particularly in the R-Precision metric that represents semantic consistency, with increases of 2.26%-15.9% and 2.1%-10.2% on the two datasets, respectively. Diversity experiments and semantic understanding experiments also demonstrate that the images generated by our algorithm exhibit good diversity and semantic consistency. The contrastive learning-based image generation system also meets the anticipated requirements, satisfying user needs. |
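
The image-text contrastive constraint described in the abstract can be illustrated with a symmetric InfoNCE-style loss over a batch of paired features. The following is a minimal sketch and not the CLIG implementation: the function names, the temperature value of 0.07, and the use of plain NumPy arrays as stand-ins for encoder outputs are all assumptions made for illustration, and the momentum-model pseudo-targets are omitted.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def image_text_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric contrastive loss; row i of each input is assumed to be a matched pair.

    The temperature default is an illustrative assumption, not a value from the paper.
    """
    img = l2_normalize(np.asarray(image_feats, dtype=np.float64))
    txt = l2_normalize(np.asarray(text_feats, dtype=np.float64))
    logits = img @ txt.T / temperature  # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])
    # Matched pairs sit on the diagonal; every other entry acts as a negative.
    loss_i2t = -log_softmax(logits)[idx, idx].mean()
    loss_t2i = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)
```

Under these assumptions, a batch whose image and text features are correctly paired yields a lower loss than the same batch with the text rows shuffled, which is the behavior the constraint relies on to pull semantically related images and text together.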