
Research On Image-Text Retrieval Algorithm Based On Semantic Reasoning

Posted on: 2023-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: Z Li
Full Text: PDF
GTID: 2558307070482244
Subject: Control theory and control engineering
Abstract/Summary:
The real world is full of multi-modal information, in which vision and language are vital tools for human perception, and image-text retrieval is a bridge connecting the two. Image-text retrieval aims to measure how well image and text features match and to enable mutual retrieval between the two modalities. Its core lies in narrowing the semantic gap between heterogeneous modalities. Although great progress has been made in this field, the task still faces many challenges. This thesis discusses the existing issues in depth; the main research contents are as follows.

First, to address the insufficient extraction of latent semantics inside the image branch by existing methods, an image-text matching algorithm based on self-attention reasoning is proposed. A self-attention module models the internal relationships of the image: it considers the contribution of each local semantic to the overall semantics as well as the semantic repetition among local semantics, and assigns weights to reintegrate the features. This weakens the negative impact of irrelevant semantics and yields the implicit semantics of the image branch. In addition, an interactive attention module models the external relationship between the modalities, obtains representations of text semantics in the visual semantic space, and measures the similarity of the two modalities to achieve the final semantic alignment. Extensive experiments on the Flickr30K and MSCOCO datasets evaluate the performance of the model and verify the effectiveness of the proposed method.

Second, to address the inability of the self-attention reasoning network to effectively capture some more specific semantic concepts, a cross-modal retrieval algorithm based on relation graph reasoning is proposed. Relation graph reasoning further models the action and spatial relationships between image entities. The model consists of node-level local relation reasoning and global relation reasoning. In local reasoning, content relationship reasoning performs the attention update of the graph and learns the concepts of entities and attributes, while topological structure relationship reasoning infers the implicit action and position relationships between entities from the number of common neighbors between nodes. Global reasoning further enhances the visual features and captures higher-level semantics. Experimental verification is also carried out on two popular datasets, and a re-ranking strategy for the image-to-text branch is used to further improve the performance of the model.
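
The abstract states that topological structure relationship reasoning uses the number of common neighbors between nodes. The following is a minimal Python sketch of that counting step only, not the thesis's implementation. The function names `common_neighbor_counts` and `row_normalize`, the row normalization, and the four-node example graph are all illustrative assumptions; the abstract does not say how the graph is built or how the counts enter the reasoning.

```python
def common_neighbor_counts(adjacency):
    """Return C where C[i][j] is the number of nodes adjacent to both i and j.

    `adjacency` is a symmetric 0/1 matrix (list of lists) over image-region
    nodes. Self-loops are ignored and the diagonal of C is set to 0.
    """
    n = len(adjacency)
    neighbors = [
        {j for j in range(n) if j != i and adjacency[i][j]}
        for i in range(n)
    ]
    return [
        [len(neighbors[i] & neighbors[j]) if i != j else 0 for j in range(n)]
        for i in range(n)
    ]


def row_normalize(counts):
    """Turn each row of counts into weights that sum to 1 (all-zero rows stay zero)."""
    weights = []
    for row in counts:
        total = sum(row)
        weights.append([c / total if total else 0.0 for c in row])
    return weights


# Hypothetical region graph: 0 = person, 1 = horse, 2 = saddle, 3 = fence.
# Edges (assumed for illustration): 0-1, 0-2, 1-2, 1-3.
adjacency = [
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]

counts = common_neighbor_counts(adjacency)
weights = row_normalize(counts)
print(counts)
print(weights)
```

In this toy graph the person and fence nodes are not directly connected, yet they share the horse node as a neighbor, so their count is nonzero. That is the kind of implicit relationship between entities that a common-neighbor signal can expose.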
Keywords/Search Tags: Image-text matching, Cross-modal retrieval, Self-attention mechanism, Relation graph reasoning