Research On Multimedia Retrieval Based On Hashing Representation Learning

Posted on: 2022-12-04 Degree: Master Type: Thesis
Country: China Candidate: M J Rong Full Text: PDF
GTID: 2518306755472714 Subject: Finance
Abstract/Summary:
The rapid development of information technology has driven a steep increase in the scale of multimedia resources and moved the world into an Internet-backed era of multimedia big data. This enormous volume of multimedia resources holds abundant social and economic value, which brings new opportunities for social development along with new challenges. Given the volume of multimedia data, how to process it and achieve efficient multimedia retrieval has become a prominent topic in computer science. Approximate nearest neighbor search based on hashing representation learning has attracted much interest in multimedia retrieval because of its high retrieval efficiency and low storage consumption. Existing uni-modal hashing methods, oriented to a single data type, have been extensively investigated. However, for the most widespread task, image retrieval, the high feature dimensionality and large scale of image data still leave hashing-based image retrieval with two major difficulties: it is "not fast" and "not scalable". In addition, with the advance of information technology and the diversification of data types, cross-modal retrieval has become an urgent task in multimedia retrieval. Given the rich sources and diverse types of multimedia data, much remains to be explored on how to uncover the latent semantic consistency among multi-modal data, bridge the gap between heterogeneous modalities, and design an effective hashing method for efficient cross-modal retrieval. Motivated by the above analysis, this thesis gives full consideration to three major problems of multimedia data: high dimensionality, large scale, and multi-modal heterogeneity. Three multimedia retrieval methods based on hashing representation learning are proposed, addressing three levels in turn: increasing hash encoding efficiency for high-dimensional data, improving the scalability of hashing algorithms for large-scale data, and unifying hash encoding for heterogeneous multi-modal data to preserve latent semantics.

(1) A Haar wavelet projection uni-modal hashing method is proposed for image retrieval. Most existing hashing methods reduce computational effort by embedding compact hash codes, but they struggle to achieve fast hash mapping for the original high-dimensional data. To address this problem, a hashing representation learning framework based on Haar wavelet projection is proposed. First, a random projection matrix is constructed by the Haar wavelet transform, and an adaptive Haar wavelet projection approach is built on this random Haar wavelet projection matrix. Then, an iterative update rule is adopted to perform discrete optimization of the hash codes and adaptive optimization of the projection matrix so as to minimize the reconstruction error. Experimental results on publicly available image datasets verify the superior hash encoding efficiency of the proposed method.

(2) A joint Haar wavelet projection and Nyström graph uni-modal hashing method is proposed for image retrieval. The learning process of graph-based hashing methods is computationally expensive and scales poorly to large datasets. To address this problem, a hashing representation learning framework based on a Nyström graph is proposed, built on Haar wavelet projection. First, the fast projection matrix is constructed from the Haar wavelet transform, and the Nyström method is introduced to construct the Nyström graph. Then, a Nyström graph embedding scheme is devised for the hash codes of the Nyström sampling points and the projected low-dimensional data. In addition, a Nyström graph regularization approach is introduced to preserve the local neighborhood structure of the data. Adopting the Nyström graph instead of the traditional Laplacian graph reduces the time complexity. Experimental results on publicly available image datasets verify the higher efficiency and better retrieval performance of the proposed method.

(3) A bidirectional coding and dual consistency preservation cross-modal hashing method is proposed for cross-modal retrieval. Both the distinctions between modalities and the inter-modal latent semantic consistency should be considered when dealing with heterogeneous multi-modal data. Many existing cross-modal hashing methods are pairwise similarity graph-based methods supervised by label information, and they generally suffer from two problems. First, constructing a pairwise similarity graph requires large storage space and computational overhead. Second, most studies focus on constraining pairwise semantic relations in the co-representation space by means of the pairwise similarity graph, rather than considering comprehensively the relations between modalities and how the co-representation space is associated with the data space of each modality and with the semantic labels. To address these problems, a hashing representation learning framework that leverages bidirectional coding and dual consistency preservation is proposed. First, a bidirectional coding approach with bidirectional constraints is devised. In one direction, the multi-modal data matrices are projected to obtain the co-representation matrix; in the other, the multi-modal data matrices are reconstructed through the inverse process, in which matrix factorization splits the multi-modal data matrices into the co-representation matrix and the basis vector matrices. Then, a dual consistency preservation strategy is introduced: an inter-modal pairwise consistency preservation approach bridges inter-modal differences, and a label-consistent co-representation learning model preserves the consistency between the co-representation space and the semantic labels. Experimental results on publicly available multi-modal datasets verify the significant advantages of the proposed method in retrieval performance and query efficiency.
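
The idea behind method (1), mapping high-dimensional data to hash codes with a Haar wavelet based projection, can be illustrated with a minimal sketch. This is not the thesis's algorithm: the adaptive optimization of the projection matrix and the discrete optimization of the hash codes are omitted, and the random sign flipping, the choice of keeping the first `n_bits` coefficients, and all function names (`haar_transform`, `make_signs`, `hash_code`, `hamming`) are assumptions made for illustration. The sketch only shows why such a projection can be fast (the transform costs O(d) per vector instead of a dense O(d·k) matrix product) and why binary codes are cheap to store and compare.

```python
import math
import random


def haar_transform(x):
    """Orthonormal fast Haar wavelet transform; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(x)
    scale = math.sqrt(2.0)
    length = n
    while length > 1:
        half = length // 2
        # Pairwise scaled sums (coarse part) and differences (detail part).
        avg = [(out[2 * i] + out[2 * i + 1]) / scale for i in range(half)]
        diff = [(out[2 * i] - out[2 * i + 1]) / scale for i in range(half)]
        out[:length] = avg + diff
        length = half
    return out


def make_signs(dim, seed=0):
    """Random +/-1 signs that randomize the projection (an illustrative assumption)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(dim)]


def hash_code(x, signs, n_bits):
    """Sign-flip, Haar-transform, keep n_bits coefficients, binarize into an int."""
    if len(signs) != len(x) or n_bits > len(x):
        raise ValueError("signs must match x, and n_bits must not exceed len(x)")
    coeffs = haar_transform([s * v for s, v in zip(signs, x)])
    code = 0
    for c in coeffs[:n_bits]:
        code = (code << 1) | (1 if c >= 0 else 0)
    return code


def hamming(a, b):
    """Hamming distance between two codes stored as non-negative integers."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    signs = make_signs(8, seed=1)
    database = [[random.Random(i).uniform(-1, 1) for _ in range(8)] for i in range(5)]
    codes = [hash_code(v, signs, 4) for v in database]
    query = hash_code(database[2], signs, 4)
    # Rank database items by Hamming distance to the query code.
    ranking = sorted(range(len(codes)), key=lambda i: hamming(codes[i], query))
    print(ranking)
```

Storing each code as a single integer keeps the memory footprint at `n_bits` per item, and ranking reduces to XOR and bit counting, which is the source of the retrieval efficiency and low storage consumption the abstract attributes to hashing.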
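
The time-complexity argument in method (2) rests on the Nyström method, which approximates a full n × n affinity matrix from its relation to a small set of m sampled points. The abstract does not specify how the thesis builds its Nyström graph, so the following is only a sketch of the classical Nyström approximation K ≈ C W⁻¹ Cᵀ under stated assumptions: an RBF affinity with a hypothetical `gamma`, hand-picked sampling points, and helper names (`rbf`, `solve`, `nystrom_factors`, `approx_affinity`) invented for this example.

```python
import math


def rbf(a, b, gamma=0.5):
    """RBF affinity between two points (the kernel choice is an assumption)."""
    d2 = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-gamma * d2)


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("sampled affinity matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def nystrom_factors(points, sample_idx, gamma=0.5):
    """Return C (n x m) and W^{-1} C^T (m x n) for the sampled points."""
    samples = [points[i] for i in sample_idx]
    C = [[rbf(p, s, gamma) for s in samples] for p in points]
    W = [[rbf(a, b, gamma) for b in samples] for a in samples]
    Ct = [list(col) for col in zip(*C)]
    return C, solve(W, Ct)


def approx_affinity(C, WinvCt, i, j):
    """Entry (i, j) of the Nystrom approximation C W^{-1} C^T."""
    return sum(C[i][k] * WinvCt[k][j] for k in range(len(WinvCt)))


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0), (3.0, 1.0)]
    C, WinvCt = nystrom_factors(pts, sample_idx=[0, 3])
    # Compare the approximation with the exact affinity for one pair.
    print(approx_affinity(C, WinvCt, 1, 2), rbf(pts[1], pts[2]))
```

Only n·m kernel evaluations and one m × m solve are needed, instead of the n² evaluations required for the full affinity matrix. The approximation is exact for pairs of sampled points and degrades for pairs that the samples represent poorly, so the number and placement of sampling points trade accuracy against cost.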
Keywords/Search Tags: Hashing, image retrieval, multimedia, projection, representation learning