Research On Image-Text Retrieval Algorithm Based On Semantic Reasoning

Posted on:2023-02-10

Degree:Master

Type:Thesis

Country:China

Candidate:Z Li

GTID:2558307070482244

Subject:Control theory and control engineering

Abstract/Summary:

The real world is full of multi-modal information, in which vision and language are vital tools for human perception, and image-text retrieval is a bridge connecting the two. Image-text retrieval aims to measure the degree of matching between image and text features and to enable mutual retrieval between the two modalities. Its core lies in narrowing the semantic gap between heterogeneous modalities. Although great progress has been made in this field, the task still faces many challenges. This thesis discusses the existing issues in depth; the main research contents are as follows.

To address the insufficient extraction of latent semantics within the image branch by existing methods, an image-text matching algorithm based on self-attention reasoning is proposed. A self-attention module is designed to model the internal relationships of the image. It considers the contribution of local semantics to the overall semantics and the semantic repetition among local semantics, and assigns weights to reintegrate features, weakening the negative impact of irrelevant semantics and yielding the implicit semantics of the image branch. In addition, an interactive attention module models the external relationship, obtains representations of text semantics in the visual semantic space, and measures the similarity of the two modalities to achieve the final semantic alignment. Extensive experiments on the Flickr30K and MSCOCO datasets evaluate the performance of the model and verify the effectiveness of the proposed method.

To address the inability of the self-attention reasoning network to effectively capture some more specific semantic concepts, a cross-modal retrieval algorithm based on relation graph reasoning is proposed. The action and spatial relationships between image entities are further modeled by relation graph reasoning. The model consists of node-level local relation reasoning and global relation reasoning. In local reasoning, content relationship reasoning performs the attention update of the graph and learns the concepts of entities and attributes; topological structure relationship reasoning infers the implicit action and position relationships between entities from the number of common neighbors between nodes. Global reasoning further enhances visual features and captures higher-level semantics. Experimental verification is also carried out on two popular datasets, and a re-ranking strategy for the image-to-text branch is used to further improve the performance of the model.
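
Two of the ideas named in the abstract, reweighting image-region features by self-attention and counting common neighbors between graph nodes, can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function names are hypothetical, the self-attention uses plain scaled dot products with no learned projections, the image vector is obtained by mean pooling, similarity is cosine similarity, and the relation graph is a binary symmetric adjacency matrix. It is not the thesis's implementation.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(regions):
    """Scaled dot-product self-attention over region features.

    Assumption: queries, keys, and values are the raw region vectors
    (identity projections); a trained model would learn these projections.
    """
    d = len(regions[0])
    attended = []
    for query in regions:
        weights = softmax([dot(query, key) / math.sqrt(d) for key in regions])
        # Each output row is a weighted average of all region vectors.
        attended.append(
            [sum(w * r[j] for w, r in zip(weights, regions)) for j in range(d)]
        )
    return attended


def mean_pool(vectors):
    """Average a list of vectors into a single vector (assumed pooling)."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def common_neighbor_counts(adj):
    """Count common neighbors for every node pair.

    adj is a symmetric 0/1 adjacency matrix. Entry [i][j] of the result is
    the number of nodes adjacent to both i and j (the square of adj); the
    diagonal therefore holds each node's degree.
    """
    n = len(adj)
    return [
        [sum(adj[i][k] * adj[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Toy data: three 2-D region features, one text feature, a 4-node graph.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_feature = [1.0, 1.0]
image_feature = mean_pool(self_attention(regions))
score = cosine_similarity(image_feature, text_feature)

adjacency = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
shared = common_neighbor_counts(adjacency)
```

In this toy setup, `score` would serve as the image-text matching score, and `shared[i][j]` is the kind of topological signal the abstract describes for inferring implicit relations between entities `i` and `j`.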

Keywords/Search Tags:

Image-Text matching, Cross-modal retrieval, Self-attention mechanism, Relation graph reasoning

Related items

1	Research On Image-Text Cross-Modal Matching Based On Attention Mechanism
2	Research And Application On Cross-Modal Retrieval Methods For Image-Text
3	Research On Cross-Modal Image-Text Retrieval Techniques Based On Semantics And Common Sense
4	Image-text Translation Based On Cross-modal Related Semantics And Attention Mechanism
5	Research On Text-Image Cross Modal Retrieval Method
6	Attention Mechanism Based Cross-Modal Semantic Alignment
7	Research On Image-text Cross-modal Hash Retrieval Based On Semantic Preservation And Attention Mechanism
8	Research On Content Sifting And Storage Mechanism Of Cross-modal Image And Text Data Based On Semantic Similarity
9	Research On Image And Text Retrieval Based On Attention Mechanism
10	Research And Application Of Interactive And Graph Matching Image-text Retrieval