| In recent years, with the rapid development of the Internet, multimedia data of many types, such as audio clips, images, and text, have become omnipresent. How to extract the rich social information and economic value contained in these data has attracted the attention of researchers in both academia and industry. Cross-modal retrieval approaches have emerged to address this problem. Considerable effort has been devoted to cross-modal retrieval, which aims to return relevant objects from one modality when given a data object from another modality as the query. The returned results can help users find useful information. A fundamental problem remains for cross-modal retrieval approaches, namely that the similarity between data of different modalities cannot be measured directly; this is referred to as the heterogeneity gap. Two strategies have been developed to bridge this gap: cross-modal subspace learning and cross-modal hashing. Considering the characteristics of multi-modal data as well as some problems in existing methods, this thesis conducts in-depth research on how to make full use of discriminative information and better preserve semantic structural information. The main contributions of this thesis are as follows. (1) To preserve the correlation among multi-modal information while fully exploiting semantic structural information, a novel framework called discriminative subspace learning for cross-modal retrieval (DSL) is proposed, which integrates feature selection and semantic structure preservation into subspace learning. A shared semantic graph is constructed to preserve the semantic structure within each modality. In addition, the Hilbert-Schmidt Independence Criterion (HSIC) is introduced to preserve the consistency between the feature similarity and the semantic similarity of samples. Finally, an angular reconstructive scheme is incorporated into the learning model to learn the feature representation of each
modality. This term compensates for the insufficient use of discriminative information and makes the learned representation more discriminative, thereby improving retrieval performance. An iterative optimization method based on the Stiefel manifold is designed to solve the resulting optimization problem; it converges well, and a rigorous theoretical convergence analysis is provided. Experimental results on two widely used datasets show that DSL achieves average improvements of 3.21% and 1.73%, respectively, over the best of the five compared baselines. (2) Cross-modal hashing methods face several challenges, e.g., how to effectively exploit discriminative label information, how to learn more discriminative hash codes, and how to avoid the high cost caused by large-scale similarity matrices. To address them, a fast discriminative cross-modal hashing method is presented. Specifically, when learning the hash codes, an angular reconstructive scheme is proposed to learn binary hash codes directly from the text and image features, which reduces the information loss caused by relaxation schemes. In addition, the semantic labels are used to guide hash code learning directly, instead of constructing an n × n pairwise similarity matrix. With these two techniques, more discriminative hash codes can be learned. When learning the hash functions, a common term is introduced to preserve valuable inter-modality information, which yields more effective hash functions. Extensive experiments are conducted on two widely used datasets to evaluate the proposed method, and the results demonstrate that it achieves better performance than several competitive approaches. |
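
As an illustration of the Hilbert-Schmidt Independence Criterion named in contribution (1), the following minimal Python sketch computes the standard biased empirical estimate, tr(KHLH)/(n-1)^2, where K and L are Gram matrices and H = I - (1/n)11^T is the centering matrix. The helper names, the choice of linear kernels, and the toy samples are assumptions made for illustration only; the abstract does not state how DSL instantiates the criterion.

```python
def linear_kernel(rows):
    """Gram matrix of inner products between row vectors (assumed kernel choice)."""
    return [[sum(a * b for a, b in zip(u, v)) for v in rows] for u in rows]


def center(gram):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - (1/n) 11^T."""
    n = len(gram)
    row_mean = [sum(row) / n for row in gram]
    col_mean = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [[gram[i][j] - row_mean[i] - col_mean[j] + total_mean
             for j in range(n)] for i in range(n)]


def hsic(gram_x, gram_y):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2.

    Because H is idempotent, tr(K H L H) equals tr((H K H)(H L H)), and for
    symmetric matrices that trace is the sum of element-wise products.
    """
    n = len(gram_x)
    kc, lc = center(gram_x), center(gram_y)
    total = sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n))
    return total / (n - 1) ** 2


# Hypothetical toy data: four samples, two features, two semantic classes.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels_aligned = [[1, 0], [1, 0], [0, 1], [0, 1]]     # classes follow the features
labels_mismatched = [[1, 0], [0, 1], [1, 0], [0, 1]]  # classes ignore the features

k_feat = linear_kernel(features)
aligned = hsic(k_feat, linear_kernel(labels_aligned))
mismatched = hsic(k_feat, linear_kernel(labels_mismatched))
print(aligned, mismatched)
```

In this toy setting the estimate is larger when the label similarity agrees with the feature similarity than when it does not, which is the sense in which such a term can encourage consistency between the two similarities.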