
Research On Multi-modal Hashing Methods For Efficient Multimedia Retrieval

Posted on: 2022-04-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Q Zheng
GTID: 1488306602478304
Subject: Network and network resource management
Abstract/Summary:
With the rapid growth of mobile devices and social networks, people's demand for information retrieval is no longer limited to keyword search but extends to image-to-image and image-to-text search. As multimedia technology develops, multimedia retrieval is becoming increasingly important, and how to carry out large-scale multimedia retrieval quickly and efficiently has become a pressing concern. Multi-modal hashing encodes multi-modal features into compact binary hash codes. It has attracted growing attention because of its fast retrieval speed, low storage cost, and effective support for large-scale multimedia retrieval. Although several multi-modal hashing methods have been proposed, the area has not been fully explored. The main difficulties in multi-modal hashing are:

(1) In large-scale multimedia retrieval, existing methods still suffer from low computational efficiency because of inefficient hyper-parameter tuning and inefficient hash optimization, which makes them difficult to scale.

(2) Unsupervised multi-modal hashing methods cannot effectively capture the semantic correlations of multi-modal data because they have no labels for supervision. Supervised multi-modal hashing methods can use discriminative semantic labels to achieve higher retrieval performance, but labeled data is costly to acquire. Semi-supervised multi-modal hashing methods treat labeled and unlabeled samples equally, ignoring the fact that the two carry different levels of semantic information, which leads to suboptimal results.

(3) Missing modalities are very common in real-world social networks and produce partial-modal data. Most existing multi-modal hashing methods require fully paired multi-modal data at both the training and query stages, so they cannot handle missing modalities well.

To address these problems, this dissertation discusses and experimentally investigates how to carry out multimedia retrieval efficiently, under the research topic of multi-modal hashing methods for efficient multimedia retrieval. The specific research objectives are summarized as follows:

(1) To address the low computational efficiency of unsupervised multi-modal hashing methods at the training stage, this dissertation proposes an efficient, parameter-free, adaptive unsupervised multi-modal hashing method. Specifically, it adaptively captures modality variations and preserves the discriminative semantics of multi-modal features in the binary hash codes. It also designs a simple and efficient iterative optimization strategy for learning the binary hash codes. Although the proposed model is very simple, experiments on public multimedia retrieval datasets show that it achieves high retrieval accuracy and efficiency.

(2) To address the problems of traditional supervised multi-modal hashing methods, such as limited semantics, high computational complexity, and large relaxation-optimization error, this dissertation proposes a supervised fast discrete collaborative multi-modal hashing method. It proposes an efficient collaborative multi-modal mapping that first transforms heterogeneous multi-modal features into unified factors, exploiting the complementarity of multi-modal features and preserving the semantic correlations across modalities with linear computation and space complexity. It further develops an asymmetric hashing learning module that simultaneously correlates the learned hash codes with the low-level data distribution and high-level semantics. In particular, this design avoids the challenging symmetric semantic matrix factorization and the O(n^2) memory cost (n is the number of training samples).

(3) To address the problems that unsupervised multi-modal hashing methods do not make full use of explicit semantic labels and that obtaining high-quality labeled data for supervised multi-modal hashing methods is time-consuming, this dissertation proposes an efficient semi-supervised multi-modal hashing method with importance differentiation regression. It develops an efficient semi-supervised multi-modal hash code learning module, which learns the hash codes for labeled data in an efficient asymmetric way and simultaneously performs nonlinear regression, using the same projection matrix as for the labeled samples, to preserve the intrinsic data structure of the unlabeled data. It also proposes an importance differentiation regression strategy that learns hash functions by explicitly considering the different importance of hash codes learned from labeled and unlabeled samples. Finally, it develops an efficient discrete optimization method with guaranteed convergence to solve the hash optimization problem iteratively.

(4) Because missing social images and descriptive tags are common in social networks, this dissertation proposes an unsupervised adaptive partial multi-modal hashing method. Specifically, the shared and specific latent representations of fully paired and partial-modal images are learned separately, within the same semantic space, by an adaptive partial multi-modal matrix factorization module. In particular, instead of simply adopting fixed modality combination weights, it develops a parameter-free weight learning scheme that adaptively learns the weights to capture the modality variations and the discriminative capabilities of different modalities. With this design, the model can fully exploit the available partial-modal samples through separate hash code learning and effectively preserve the latent relations between images and tags in the hash codes through semantic space sharing.
Keywords/Search Tags:Multi-modal hashing, Multimedia retrieval, Hashing, Hash codes
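
The abstract attributes the speed and low storage cost of multi-modal hashing to compact binary hash codes. The following is a minimal sketch of that general idea only, not of any method proposed in the dissertation: it assumes the image and text features of an item are simply concatenated into one vector, and it uses random sign projections as a stand-in for learned hash functions. All names (`make_projections`, `hash_code`, `hamming`, `search`) are hypothetical.

```python
import random


def make_projections(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes (a stand-in for learned hash functions)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(features, projections):
    """Map a real-valued feature vector to an integer whose bits are the hash code."""
    code = 0
    for row in projections:
        dot = sum(w * x for w, x in zip(row, features))
        # One bit per hyperplane: 1 if the vector lies on its non-negative side.
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Hamming distance between two codes: XOR, then count the set bits."""
    return bin(a ^ b).count("1")


def search(query_code, database_codes, top_k=1):
    """Return the indices of the top_k database codes closest to the query."""
    order = sorted(range(len(database_codes)),
                   key=lambda i: hamming(query_code, database_codes[i]))
    return order[:top_k]


if __name__ == "__main__":
    # Assumption: each item is an image feature vector concatenated with a text one.
    image_feat, text_feat = [0.9, -0.2, 0.4], [0.1, 0.7]
    item = image_feat + text_feat
    other = [-0.5, 0.3, -0.8, 0.6, -0.1]
    proj = make_projections(dim=len(item), n_bits=16)
    database = [hash_code(item, proj), hash_code(other, proj)]
    print(search(hash_code(item, proj), database))
```

Each 16-bit code fits in two bytes, and comparing two codes costs one XOR plus a bit count, which is why retrieval over binary codes scales to large collections.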