
Research On Image-Text Retrieval Based On Multi-Branch Self-Attention Coding

Posted on: 2024-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: M J Zhang
Full Text: PDF
GTID: 2568307151467144
Subject: Communication Engineering (including broadband network, mobile communication, etc.) (Professional Degree)
Abstract/Summary:
With the development of Internet technology and the large-scale growth of data of all kinds, it has become difficult for people to retrieve the information they need efficiently and accurately. To retrieve useful information from such diverse and complex data, cross-modal retrieval has become a research hotspot in recent years. However, a heterogeneity gap exists between the underlying representations of multimodal data, which makes it impossible to measure cross-modal similarity directly. In addition, the volume of multimodal data is huge, and semantic differences exist between modalities. Mining invariant information across multimodal data and learning the underlying features have therefore become core difficulties in cross-modal retrieval. To address these issues, this thesis studies cross-modal retrieval models, with the following specific contributions:

First, to better learn the content similarity between multimodal data, an image-text retrieval network based on multi-branch self-attention coding is proposed. Structurally similar Bidirectional Encoder Representations from Transformers (BERT) and Vision Transformer (ViT) models extract text and image features respectively, so that the extracted features are easier to compare: images and texts of the same class are drawn close together, while those of different classes are pushed as far apart as possible. A sketch of this dual-branch design appears after this abstract.

Second, to bridge the heterogeneity gap between modalities while fully preserving cross-modal semantic information, a dual-adversarial image-text retrieval network incorporating a self-attention mechanism is proposed. A generator learns shared features, reconstructs shared representations, and generates pseudo-features for the corresponding modality; a discriminator then tries to distinguish the reconstructed features from the original ones. This generative-adversarial mechanism drives the reconstructed features ever closer to the originals, thereby better preserving the semantic information carried by the shared features.

Third, to learn discriminative features from multiple labels rich in semantic information, a multi-label image-text retrieval method incorporating an attention mechanism is proposed. The model uses Graph Attention Networks (GAT) to capture dependencies among the labels: through the GAT mapping function, interdependent classifiers are learned from input word embeddings, and these label classifiers then classify the generated common representations. Multi-label semantic similarity is used to describe inter-modal and intra-modal semantic correlations more accurately.
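As a concrete illustration of the first contribution, the following is a minimal sketch of a dual-branch encoder that pairs a BERT text branch with a ViT image branch, projects both into a shared embedding space, and trains with a margin-based triplet objective so that matching image-text pairs move together and mismatched pairs move apart. It assumes PyTorch and HuggingFace Transformers; the checkpoint names, projection dimension, and margin value are illustrative assumptions, not settings taken from the thesis.

```python
# Illustrative sketch only: the checkpoints, embed_dim, and margin below are
# assumed for demonstration, not the thesis's actual configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BertModel, ViTModel

class DualBranchEncoder(nn.Module):
    """Two Transformer branches (BERT for text, ViT for images) mapped into
    one shared space so image-text similarity can be measured directly."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        self.image_encoder = ViTModel.from_pretrained(
            "google/vit-base-patch16-224-in21k")
        # Linear projections align both modalities in a common embedding space
        self.text_proj = nn.Linear(self.text_encoder.config.hidden_size, embed_dim)
        self.image_proj = nn.Linear(self.image_encoder.config.hidden_size, embed_dim)

    def forward(self, input_ids, attention_mask, pixel_values):
        # Use each branch's [CLS] token as its global feature
        t = self.text_encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state[:, 0]
        v = self.image_encoder(pixel_values=pixel_values).last_hidden_state[:, 0]
        # L2-normalize so cosine similarity reduces to a plain dot product
        return (F.normalize(self.text_proj(t), dim=-1),
                F.normalize(self.image_proj(v), dim=-1))

def triplet_loss(img, pos_txt, neg_txt, margin=0.2):
    """Pull matching image-text pairs together, push mismatched pairs apart."""
    pos_sim = (img * pos_txt).sum(-1)   # similarity to the matching caption
    neg_sim = (img * neg_txt).sum(-1)   # similarity to a mismatched caption
    return torch.clamp(margin - pos_sim + neg_sim, min=0).mean()
```

In this formulation the loss reaches zero once every matching pair is at least `margin` more similar than its sampled negative, which realizes the "same class close, different class far" behavior described in the abstract.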
Keywords/Search Tags: image-text retrieval, BERT, ViT, self-attention mechanisms, generative adversarial, multi-label, GAT