Cross-modal tasks were proposed to give computers the ability to perceive more information about the world, thereby increasing their understanding and knowledge of it; text-to-image generation, the topic of this work, belongs to this family. The task maps natural language to vision, giving a machine the ability to generate images that correspond to textual descriptions. Although considerable progress has been made, the instability of generative models and the semantic complexity of text descriptions keep the task challenging, and several issues deserve further research: (1) the object frame tends to deviate or collapse during training, making subsequent refinement impossible; (2) non-target regions of the generated images are influenced by the text; (3) the background of the generated image is often monotonous and blurred. To address these problems, the paper proposes the following approaches.

To address the tendency of the target object to deviate during generation, the paper proposes a Class-Aware skeleton Consistency Generative Adversarial Network (CAC-GAN). Using image classification and metric learning methods, CAC-GAN first obtains class-aware features from prior knowledge; these serve as additional supervision that keeps the image stable throughout the generation process. To evaluate the integrity of generated images, the paper introduces a new metric, CACloss, which measures integrity by computing the class-aware feature distance between the generated distribution and the true distribution. CAC-GAN achieves good results on both the CUB and Oxford-102 datasets, verifying that the method can improve image integrity. However, results on the COCO dataset were poor, and we also explore the limitations of the method.

To address the problem of non-target objects being influenced by text in the generated images, the
paper proposes a Multilevel-Aware Consistency Generative Adversarial Network (MAC-GAN). At the entity level, we build a text-to-image-to-label structure to strengthen the alignment of text-image pairs; at the feature level, we use the CLIP pre-trained model to align the features of text-image pairs. To better evaluate text-image consistency, we introduce a more interpretable consistency metric, the F1-score, based on image multi-label classification. Results on the CUB, Oxford-102, and COCO datasets show that this multilevel alignment improves the correspondence between text and images and reduces the influence of text on non-target regions.

Some current generative models produce images whose background is insufficiently realistic, or too monotonous and blurred. Naive batch normalization operates at the level of batch samples, but the background varies greatly between samples, so backgrounds become blurred and averaged out. To alleviate this problem, the paper proposes a Dual Conditional Instance Normalization Generative Adversarial Network (DCIN-GAN). We use the sentence-level and phrase-level representations of the text as two conditions for image generation, and design a deep fusion convolution module based on instance normalization to build a single-stage generative adversarial network. Comprehensive experiments on two widely used datasets, CUB and Oxford-102, show that DCIN-GAN improves background quality and increases the diversity of the generated images' backgrounds.
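The dual-conditioning idea above can be sketched as an instance normalization layer whose per-channel scale and shift are predicted from the two text representations, so each sample's background statistics are modulated individually rather than averaged over the batch. This is an illustrative reconstruction under assumed dimensions and module names, not the paper's actual implementation:

```python
import torch
import torch.nn as nn

class DualConditionalInstanceNorm(nn.Module):
    """Instance normalization whose affine parameters are predicted from
    two text conditions (sentence-level and phrase-level embeddings).
    Hypothetical sketch; layer shapes are illustrative assumptions."""

    def __init__(self, num_channels: int, sent_dim: int, phrase_dim: int):
        super().__init__()
        # Normalize each sample's feature map independently (no learned affine).
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)
        # Each condition predicts a per-channel scale (gamma) and shift (beta).
        self.sent_affine = nn.Linear(sent_dim, num_channels * 2)
        self.phrase_affine = nn.Linear(phrase_dim, num_channels * 2)

    def forward(self, x, sent_emb, phrase_emb):
        h = self.norm(x)
        gamma_s, beta_s = self.sent_affine(sent_emb).chunk(2, dim=1)
        gamma_p, beta_p = self.phrase_affine(phrase_emb).chunk(2, dim=1)
        # Combine both conditions; broadcast (N, C) -> (N, C, 1, 1).
        gamma = (1 + gamma_s + gamma_p).unsqueeze(-1).unsqueeze(-1)
        beta = (beta_s + beta_p).unsqueeze(-1).unsqueeze(-1)
        return gamma * h + beta

# Usage: a batch of 4 feature maps modulated by sentence and phrase embeddings.
layer = DualConditionalInstanceNorm(num_channels=64, sent_dim=256, phrase_dim=128)
x = torch.randn(4, 64, 8, 8)
out = layer(x, torch.randn(4, 256), torch.randn(4, 128))
print(out.shape)  # torch.Size([4, 64, 8, 8])
```

Because the statistics are computed per sample rather than per batch, background styles from different samples are not mixed, which is the intuition behind preferring instance normalization over batch normalization here.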