Towards Compact Visual Features

Posted on: 2021-01-31  Degree: Doctor  Type: Dissertation
Country: China  Candidate: H Liu  Full Text: PDF
GTID: 1488306017955989  Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, visual big data analysis and massive multimedia content analysis have been widely studied in the computer science community. However, with the explosive growth of data, traditional visual big data analysis systems face serious challenges in both time and space complexity. Compact visual representation learning is one of the key problems in visual big data analysis, and it is also a research hotspot in computer vision, machine learning, data mining, and related fields. To this end, this thesis focuses on large-scale visual search and multimedia computing as its main applications, and carries out theoretical and technical research on compact features, especially compact representation learning.

Because information is lost during feature encoding, conventional compact feature representation models suffer from several shortcomings: compact codes lack ordering information, the relationships between multimedia data cannot be captured accurately, some modalities of multi-modal data may be missing, and complex features may lie on special manifold structures. Based on extensive investigation, this thesis considers three information retrieval tasks: nearest neighbour search, multimedia data retrieval, and set-based feature retrieval. From these tasks we abstract three scientific problems and propose three corresponding solutions.

For the first task, we introduce ordinal constraints into compact representation learning. We decompose the problem into a series of sub-problems (ordinal information construction, ordinal information pruning, ordinal information preservation, and discrete optimization) and explore each of them in depth. For multimedia data retrieval, we focus on developing a more accurate similarity measure among multi-modal data and on handling data samples whose modalities are not always complete. We approach multi-modal data search from a new perspective so that the retrieval task meets the needs of current multimedia big data search. Finally, we address the set-based feature analysis problem, building on the classical Riemannian network. After analyzing the problems of existing models, we combine the geometric network with compact representation learning and propose a new deep Riemannian network model.

The main innovations of this thesis are summarized in the following five aspects:

1) We propose an ordinal embedding hashing algorithm, which embeds given ordinal relations among data points to learn ranking-preserving binary codes. To this end, we implement ordinal embedding, ordinal graph embedding, and landmark-based ordinal graph embedding, and then propose a simple optimization algorithm.

2) We propose an ordinal constrained hashing algorithm, which also embeds the ordinal relations among data points so that rankings are preserved in the binary codes. The core idea is to construct an ordinal graph via a tensor product and an ordinal constraint projection, both of which approximate the n-pair ordinal graph by an L-pair anchor-based ordinal graph (L << n) and thereby reduce the corresponding time and space complexity.

3) We propose a Multi-modal Neighbor Set Hashing algorithm, in which a novel similarity measure is introduced, together with alternating optimization, to learn binary codes that embed this multi-modal similarity. The newly defined similarity, named Multimodal Neighbor Set Similarity, explicitly represents the relationships among multi-modal instances.

4) We propose a Dense Auto-encoder Hashing algorithm, which explicitly imputes the missing modality and produces binary codes by leveraging the relatedness among different modalities. A Dense Auto-encoder imputes the missing modality, and a variational hashing framework is then used to learn the encoding function.

5) We propose a Neural Bag-of-Matrix-Summarization (BoMS) method, combined with a Riemannian network, which handles the above issues towards highly efficient and scalable symmetric positive definite (SPD) features. The key innovation lies in summarizing data in a Riemannian geometric space instead of a vector space.

To sum up, this thesis carries out a series of studies on large-scale visual search and multimedia computing tasks and proposes a series of compact representation methods. The experimental results and theoretical analysis show that the five proposed methods improve the performance of the corresponding retrieval tasks.
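
To make the notion of ranking-preserving binary codes in contributions 1) and 2) concrete, the following is a minimal sketch of how one might measure whether a set of binary codes keeps the ordinal relations of the original features. It is not the thesis's algorithm: the encoder `sign_hash` is a plain random-hyperplane stand-in, and all names (`hamming`, `sq_dist`, `sign_hash`, `ordinal_agreement`) as well as the triplet form (q, i, j) are assumptions made for illustration only.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def sq_dist(u, v):
    """Squared Euclidean distance between two real-valued vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def sign_hash(point, planes):
    """Stand-in encoder (an assumption, not the thesis method):
    one bit per random hyperplane, set by the sign of the projection."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, point)) >= 0 else 0
        for plane in planes
    )


def ordinal_agreement(points, codes, triplets):
    """Fraction of triplets (q, i, j) for which the statement
    'i is at least as close to q as j' has the same truth value
    in the original space and in Hamming space."""
    kept = 0
    for q, i, j in triplets:
        in_space = sq_dist(points[q], points[i]) <= sq_dist(points[q], points[j])
        in_code = hamming(codes[q], codes[i]) <= hamming(codes[q], codes[j])
        kept += in_space == in_code
    return kept / len(triplets)


if __name__ == "__main__":
    rng = random.Random(0)
    points = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(50)]
    planes = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(32)]
    codes = [sign_hash(p, planes) for p in points]
    triplets = [tuple(rng.sample(range(50), 3)) for _ in range(200)]
    print(f"ordinal agreement: {ordinal_agreement(points, codes, triplets):.2f}")
```

A learned hashing method of the kind described above would aim to push this agreement score higher than a data-independent encoder achieves at the same code length.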
Keywords/Search Tags: Information Retrieval, Compact Feature, Multi-modal Data Analysis, Riemannian Network, Discrete Optimization