With the widespread adoption of the Internet and the rapid development of multimedia technology, Internet users generate massive amounts of media data of different types (text, image, audio, and video) in daily life. These different types of data are usually called multi-modal data. Users' demand for retrieving such multi-modal data keeps growing, and cross-modal retrieval has become an active research topic in academia. Cross-modal retrieval refers to using a query from one modality to retrieve semantically relevant samples from another modality, for example, retrieving images with a text query. Cross-modal retrieval currently faces two challenges, the "heterogeneity gap" and the "semantic gap" between multi-modal data, which make it difficult to measure cross-modal similarity directly.

To address these two problems, many cross-modal retrieval methods have been proposed in recent years. Their core idea is to map multi-modal data into a common subspace, mine the correlations among different modalities, and thereby enable cross-modal similarity measurement. Existing cross-modal retrieval methods fall into two main categories: real-valued representation methods and binary representation methods.

Real-valued representation methods aim to map multi-modal data into a common real-valued subspace in which cross-modal semantic similarity can be measured. However, existing methods suffer from weak feature extraction, loose modal association, limited cross-modal interaction, and a limited ability to preserve modal consistency, leaving considerable room for improving retrieval performance. To this end, this paper proposes a new real-valued representation retrieval method that fully exploits the semantic associations within and between modalities through a dual-attention mechanism to improve cross-modal retrieval accuracy.

Unlike real-valued representation methods, binary representation methods offer lower representation dimensionality, lower storage cost, and faster similarity computation, which makes binary representations better suited to big-data scenarios. However, existing binary representation methods struggle to fully learn the structural associations in the hash space and to improve the semantic discriminability of cross-modal binary representations. To address these problems, this paper proposes a cross-modal hash retrieval method based on multi-label semantic fusion.

The main work and contributions of this paper are as follows:

(1) This paper presents a cross-modal retrieval method based on dual attention and generative adversarial learning. The method is an adversarial semantic representation model with a dual-attention mechanism, i.e., intra-modal attention and inter-modal attention. Intra-modal attention focuses on the critical semantic features within a modality, while inter-modal attention explores the semantic interactions between different modalities to represent high-level semantic relevance more accurately. Consistent cross-modal feature distributions are learned through intra-modal and inter-modal adversarial losses, effectively reducing cross-modal heterogeneity. The effectiveness of the method is verified by experimental comparisons on public multi-modal datasets.
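To make the dual-attention design concrete, the following is a minimal PyTorch sketch. All dimensions, module names, and the use of standard multi-head attention are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of intra-modal (self) and inter-modal (cross) attention.
# Feature dimensions and head counts are assumptions for illustration.
import torch
import torch.nn as nn

class IntraModalAttention(nn.Module):
    """Self-attention over the regions/words of a single modality."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                      # x: (batch, seq, dim)
        out, _ = self.attn(x, x, x)            # query = key = value = x
        return out + x                         # residual connection

class InterModalAttention(nn.Module):
    """Cross-attention: one modality's features query the other's."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, q, kv):                  # q from one modality, kv from the other
        out, _ = self.attn(q, kv, kv)
        return out + q

# Usage: refine each modality with intra-modal attention, then let image
# features attend to text features to capture inter-modal interactions.
img = torch.randn(32, 36, 512)                 # e.g. 36 region features per image
txt = torch.randn(32, 20, 512)                 # e.g. 20 word features per sentence
intra_img, intra_txt = IntraModalAttention(), IntraModalAttention()
inter = InterModalAttention()
img_refined = inter(intra_img(img), intra_txt(txt))
```

Intra-modal attention re-weights features within one modality, while inter-modal attention lets one modality attend to the other, which is one common way to realize the semantic interactions described above; the adversarial losses would then be applied on top of these refined features.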
(2) This paper proposes a cross-modal hashing method based on deep multi-label semantic fusion. The method uses two deep neural networks to extract features from the two modalities separately and introduces a multi-label semantic fusion module that injects multi-label semantics into the cross-modal feature-learning process, so that the learned representations carry more latent label-category information. Finally, a graph regularization term preserves the semantic similarity of the cross-modal hash codes in Hamming space. The method's effectiveness and superiority are verified by performance comparisons against baseline methods on cross-modal datasets.
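As a rough illustration of how multi-label fusion and graph regularization could be combined, the sketch below fuses a multi-hot label vector into the features before hashing and penalizes Hamming-space distance between semantically similar pairs. Layer sizes, the additive fusion scheme, and the label-overlap similarity are all assumptions for illustration, not the paper's exact design.

```python
# Sketch of label-aware hashing with a graph-regularization term.
import torch
import torch.nn as nn

class LabelFusionHashNet(nn.Module):
    """Fuses multi-label embeddings into modality features, then hashes."""
    def __init__(self, feat_dim=512, num_labels=24, code_len=64):
        super().__init__()
        self.label_embed = nn.Linear(num_labels, feat_dim)   # multi-hot labels -> embedding
        self.hash_layer = nn.Linear(feat_dim, code_len)

    def forward(self, feat, labels):
        fused = feat + self.label_embed(labels)              # inject label semantics
        return torch.tanh(self.hash_layer(fused))            # relaxed codes in (-1, 1)

def graph_regularization(codes, sim):
    """Pull codes of semantically similar pairs together.
    sim[i, j] in [0, 1]: label-based similarity of samples i and j."""
    dist = torch.cdist(codes, codes, p=2) ** 2
    return (sim * dist).mean()

# Usage with random stand-in data:
net = LabelFusionHashNet()
feat = torch.randn(16, 512)                                  # image or text features
labels = torch.randint(0, 2, (16, 24)).float()               # multi-hot label vectors
codes = net(feat, labels)
sim = (labels @ labels.t() > 0).float()                      # share at least one label
loss = graph_regularization(codes, sim)
```

For ±1 codes, squared Euclidean distance is proportional to Hamming distance, so minimizing the weighted distances pulls the relaxed codes of same-label samples together in Hamming space; a full objective would also push dissimilar pairs apart and add a quantization penalty.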