
Research On Binary Hashing Methods For Cross-modal Information Retrieval

Posted on: 2022-11-22  Degree: Master  Type: Thesis
Country: China  Candidate: X Shen  Full Text: PDF
GTID: 2568307070952509  Subject: Computer application technology
Abstract/Summary:
With the advent of the big-data era, multimedia data has grown rapidly and its forms have become increasingly diverse, placing ever higher demands on the speed and accuracy of cross-modal information retrieval. Owing to their low storage consumption and high search efficiency, hash-based cross-modal retrieval techniques have received widespread attention: they encode multimedia data into a common binary hash space, which makes it possible to measure the correlation between samples from different modalities efficiently. Moreover, as the feature-extraction capability of deep neural networks keeps improving, deep cross-modal hashing methods have become increasingly influential in multimedia retrieval. However, existing methods often ignore the latent relationships between heterogeneous data when learning a common semantic subspace, and fail to retain the most important semantic information when mining deep correlations. In addition, large-scale labeled data is very expensive to obtain, especially for multi-modal data, which limits the applicability of supervised algorithms. This thesis therefore studies unsupervised and semi-supervised cross-modal retrieval methods that combine the advantages of hashing and deep learning while avoiding the cost of labeling datasets. The main contributions are as follows:

First, this thesis uses an attention mechanism that focuses on relevant features to construct an attention-aware semantic fusion matrix, which integrates important information from different modalities. It introduces a novel network (AGSH) that encodes rich, relevant features by passing the extracted features through an attention module, and generates hash codes under the self-supervision of the proposed attention-aware semantic fusion matrix. Experimental results and detailed analysis show that, compared with recent unsupervised cross-modal hashing methods, this approach achieves better retrieval performance on three popular datasets.

Second, this thesis proposes a novel end-to-end deep cross-modal retrieval framework, cluster-driven deep adversarial hashing (CDAH). CDAH learns discriminative clusters recursively through a soft clustering model: it tries to produce modality-invariant representations in the common space by confusing a modality classifier, while the classifier tries to distinguish the modalities from the generated representations. To minimize the gap between feature representations of different modalities that share a semantic label, and to maximize the distance between images and texts with different labels, CDAH constructs a fusion semantic matrix that integrates the original domain information of the different modalities as self-supervision to refine the binary codes. Finally, CDAH uses a scaled tanh function to learn binary codes adaptively, so that the relaxed codes gradually converge to a solution of the original, intractable discrete coding problem. Comprehensive experiments on four popular datasets demonstrate the superiority of CDAH over state-of-the-art methods.

Third, this thesis proposes a novel multi-view graph embedding cross-modal hashing (MGCH) framework, which guides hash-code generation in a semi-supervised manner through a graph reasoning module fed by processed multi-view graphs. Unlike traditional graph hashing methods, MGCH uses multi-view graphs as the only learning aid to connect labeled and unlabeled data in pursuit of binary embeddings. Multi-view graphs help filter multi-directional data features drawn from multiple anchor sets and yield refined features. At the core of MGCH is an intuitive graph-based reasoning network, consisting of two graph convolution layers and a graph attention layer, which convolves the anchor graph and an asymmetric graph with the input data. A comprehensive cross-modal hashing evaluation on three popular datasets shows that MGCH outperforms state-of-the-art methods when labeled data is limited.
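The scaled tanh relaxation mentioned for CDAH can be illustrated with a minimal sketch. The idea is standard in hashing: tanh(βx) is differentiable, so it can be trained with gradient descent, and as the scale β grows it approaches sign(x), the discrete code. The function name, the toy activations, and the β schedule below are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def scaled_tanh(x, beta):
    # Smooth surrogate for sign(x): as beta grows, tanh(beta * x) -> sign(x).
    # Gradients stay usable early in training (small beta), while the codes
    # become near-binary later (large beta).
    return np.tanh(beta * x)

# Toy continuous hash-layer activations for one sample (hypothetical values).
activations = np.array([0.8, -0.3, 1.5, -0.05])

# Increasing beta over training pushes the relaxed codes toward {-1, +1}.
for beta in (1.0, 10.0, 100.0):
    relaxed = scaled_tanh(activations, beta)
    print(f"beta={beta:6.1f}  relaxed codes={np.round(relaxed, 3)}")

# In the large-beta limit the relaxed codes agree in sign with the
# discrete codes sign(x), so quantization loss vanishes smoothly.
binary = np.sign(activations)
```

Because the relaxation tightens gradually, the network never faces the non-differentiable sign function directly, which is what makes the discrete coding problem tractable end to end.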
Keywords/Search Tags:cross-modal retrieval, unsupervised learning, semi-supervised learning, binarization, hashing, attention module, graph convolutional network