With the rapid development of information technology, multimedia data (images, text, video, etc.) on the Internet has grown explosively. The growing volume and heterogeneous forms of multimedia data make the cross-modal retrieval task increasingly challenging. Owing to their low storage cost and high retrieval efficiency, hash learning techniques have attracted increasing attention from researchers and have been introduced into cross-modal retrieval. Most existing cross-modal hashing methods compare the similarity between heterogeneous data by mapping features of different dimensions into a common Hamming space and then computing the Hamming distance between the resulting hash codes. To address several shortcomings of existing deep cross-modal hashing methods, namely that (1) the heterogeneity between modalities is not well eliminated, (2) the structural and semantic information of multiple modalities is not effectively modeled, and (3) modality-specific (complementary) and modality-shared (correlated) information is not fully explored and exploited, this dissertation proposes two supervised cross-modal hashing methods and one semi-supervised cross-modal hashing method, summarized as follows:

(1) To reduce the heterogeneity between modalities while preserving the neighborhood structure of the original data, this dissertation proposes Graph-guided and Inter-modal Feature Fusion for Cross-modal Hashing (GIFH). The method uses an inter-modal feature fusion module to cross-fuse the features of the image and text modalities, reducing inter-modal heterogeneity. In addition, a twin graph convolutional network module exploits the powerful graph representation capability of graph convolutional networks (GCNs) to guide the hash codes to better preserve neighbor information, thereby generating more discriminative hash codes.

(2) To effectively model multi-modal structure and semantic associations while making full use of intra-modal information and reducing modality differences, this dissertation proposes a Modality-fused Graph Network for Cross-modal Retrieval (MFGN). The method uses a graph convolutional network in a modality-fused channel to learn modality-shared representations, which guide the image and text channels in learning discriminative hash codes. In addition, a feature integration module is introduced into the image and text channels to reduce the loss of detail during learning.

(3) To fully mine and jointly exploit modality-specific (complementary) and modality-shared (correlated) information for retrieval, this dissertation proposes Semi-supervised Cross-modal Hashing via Modality-specific and Cross-modal Graph Convolutional Networks (MCGCN). The method contains two modality-specific channels and one cross-modal channel, which learn the modality-specific and shared representations of each modality, respectively. Graph convolutional networks are used in all three channels to explore intra- and inter-modal similarities and to propagate semantic information from labeled to unlabeled data. The modality-specific and shared representations of each modality are then fused through an attention scheme. To further reduce modality differences, a discriminator is designed to classify the modality representations and guide network training through adversarial learning.

The proposed methods are evaluated and analyzed on the Wikipedia, MIRFlickr-25K, and NUS-WIDE datasets, which are commonly used in cross-modal retrieval, and are compared with state-of-the-art cross-modal hashing methods. The experimental results show that all of the proposed methods are effective.
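The Hamming-space retrieval step mentioned above, binarizing real-valued network outputs into hash codes and ranking database items by Hamming distance, can be sketched as follows. This is a minimal NumPy illustration, not the dissertation's implementation; the 4-bit codes and the query are synthetic.

```python
import numpy as np

def binarize(features: np.ndarray) -> np.ndarray:
    """Map real-valued network outputs into Hamming space via the sign function."""
    return (features > 0).astype(np.uint8)

def hamming_distance(query_code: np.ndarray, db_codes: np.ndarray) -> np.ndarray:
    """Number of differing bits between a query code and each database code."""
    # Codes are {0, 1} vectors; XOR marks differing bits, summing counts them.
    return np.bitwise_xor(query_code, db_codes).sum(axis=1)

# Synthetic example: a text query retrieving from an image database.
db_codes = np.array([[0, 1, 1, 0],
                     [1, 1, 0, 0],
                     [0, 1, 1, 1]], dtype=np.uint8)
query = binarize(np.array([-0.3, 0.8, 0.5, -0.1]))  # -> [0, 1, 1, 0]
dists = hamming_distance(query, db_codes)
ranking = np.argsort(dists)  # most similar database items first
print(dists, ranking)
```

Because the XOR-and-sum operates on compact binary codes rather than high-dimensional real-valued features, this ranking step is what gives hashing-based retrieval its low storage and computation cost.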
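All three proposed methods rely on graph convolution to propagate neighborhood and semantic information across samples. A single GCN layer with the standard symmetric normalization can be sketched in plain NumPy as follows; the toy similarity graph, feature values, and dimensions are illustrative assumptions, not taken from the dissertation.

```python
import numpy as np

def gcn_layer(adj: np.ndarray, features: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph-convolution step: aggregate neighbor features, then transform."""
    a_hat = adj + np.eye(adj.shape[0])             # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt     # symmetric normalization
    return np.maximum(norm_adj @ features @ weight, 0.0)  # ReLU activation

# Toy similarity graph over 3 samples (node 1 linked to nodes 0 and 2).
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
rng = np.random.default_rng(0)
features = rng.standard_normal((3, 2))   # 2-dim input features per node
weight = rng.standard_normal((2, 4))     # learnable projection to 4 dims
out = gcn_layer(adj, features, weight)
print(out.shape)  # each node's new feature mixes its neighbors' features
```

Stacking such layers lets label information flow from labeled to unlabeled samples along the similarity graph, which is the mechanism the semi-supervised MCGCN exploits.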