
Cross-modal Semantic Information Acquisition For Image Retrieval

Posted on: 2014-01-20 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: N He | Full Text: PDF
GTID: 1228330398955116 | Subject: Computer software and theory
Abstract/Summary:
As producing and sharing images becomes easier, the image databases we use grow ever larger. How to retrieve images of interest both effectively and efficiently has therefore become an important and urgent question. Although Content-Based Image Retrieval (CBIR) has been studied extensively for more than a decade, three limitations restrict its practicability. First, the precision of CBIR is usually unsatisfactory because of the semantic gap between low-level visual features and high-level semantic concepts. Second, the efficiency of CBIR is usually low due to the high dimensionality of visual features. Third, the query form of CBIR is unnatural for image search, owing to the possible absence of appropriate example images. In contrast, Text-Based Image Retrieval (TBIR) relies solely on text information for image indexing and search. Compared with visual information, text is essentially a representation of image content in terms of human concepts; it is low-dimensional and much easier to process. TBIR is therefore a straightforward way to overcome the disadvantages of CBIR, but annotating a large-scale image database manually is impossible. Recently, as social networks have become popular, more and more users share and annotate images on the web. However, such user-annotated tags are noisy and incomplete.

In this thesis, we combine the text and visual modalities to extract semantic information from images. Our major work includes:

1. We study Cross-modal Semantic Information Acquisition (CSIA) for image retrieval and propose a framework for it. Based on this framework, we implement cross-modal semantic acquisition: semantics are extracted from both text and visual content and fused together.
Compared with single-modal semantic acquisition, our framework is more effective for image retrieval.

2. We investigate the automatic image annotation problem and propose a new feature descriptor, the scale-space histogram of oriented gradients (SSHOG), for content-based image semantic acquisition. SSHOG describes images in a multi-scale way based on scale-space theory. Since objects in the real world have multi-scale properties, SSHOG is more effective than single-scale features. We test our SSHOG-based image semantic acquisition method on the INRIA Person Dataset, and the experimental results show the effectiveness of our method.

3. We investigate image distance measures for image retrieval and propose an image retrieval approach based on a Lie-group spatiogram similarity measure. The spatiogram is an extension of the ordinary histogram: whereas a histogram captures only color information and discards all spatial information, a spatiogram captures both the color distribution and the distribution of pixel locations, modeling the latter with Gaussian distributions. However, Gaussian functions do not form a vector space but a Lie group, so we adopt a Lie-group spatiogram similarity based on Lie-group analysis of the Gaussian space. We test our retrieval method on the Corel dataset; the results indicate that our method is more effective than other spatiogram-based methods.

4. We address the semantic fusion problem and propose a method to extract and fuse semantics from text and visual content. On the one hand, we automatically annotate images based on visual content and combine the resulting annotations with user annotations; on the other hand, we refine the annotations based on semantic consistency and content similarity. We model semantic fusion as a constrained optimization problem, with constraints including semantic consistency, content consistency, and error sparsity. We test our method on the NUS-WIDE and MIRFlickr-25K datasets.
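The annotation-refinement idea in contribution 4 can be illustrated with a much-simplified label-propagation sketch. This is a stand-in for the constrained optimization described above, not the thesis formulation; `refine_tags`, the similarity matrix `S`, and the parameters `alpha` and `iters` are all hypothetical names introduced here for illustration.

```python
import numpy as np

def refine_tags(T0, S, alpha=0.5, iters=20):
    """Refine a noisy image-by-tag score matrix T0 by propagating
    scores across visually similar images (rows of S).

    A simplified label-propagation stand-in for the constrained
    optimization sketched in the abstract, not the thesis method."""
    # Row-normalize the content-similarity matrix so each row sums to 1
    W = S / S.sum(axis=1, keepdims=True)
    T = T0.copy()
    for _ in range(iters):
        # Blend the original annotations (semantic consistency with the
        # given tags) with the neighbor consensus (content consistency)
        T = alpha * T0 + (1 - alpha) * W @ T
    return T
```

For example, if image 1 is visually similar to image 0 but lacks one of its tags, the refined score for that missing tag becomes positive, while all scores stay in [0, 1] because each update is a convex combination.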
The experimental results show the effectiveness of our method.

Since cross-modal semantic information acquisition avoids the disadvantages of any single modality, our work is useful for image retrieval.
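The multi-scale idea behind SSHOG (contribution 2) can be sketched as follows, assuming a grayscale image stored as a NumPy array: compute an orientation histogram of gradients at several Gaussian blur scales and concatenate them. The cell/block structure and block normalization of full HOG are omitted, and the function names (`gaussian_blur`, `hog_histogram`, `sshog`) are illustrative, not the thesis implementation.

```python
import numpy as np

def gaussian_blur(img, sigma):
    # Separable 1-D Gaussian convolution along each axis
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    blurred = np.apply_along_axis(np.convolve, 0, img, k, mode='same')
    return np.apply_along_axis(np.convolve, 1, blurred, k, mode='same')

def hog_histogram(img, bins=9):
    # Magnitude-weighted histogram of unsigned gradient orientations
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, np.pi), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist

def sshog(img, sigmas=(1.0, 2.0, 4.0), bins=9):
    # Concatenate orientation histograms computed at several Gaussian scales
    return np.concatenate([hog_histogram(gaussian_blur(img, s), bins)
                           for s in sigmas])
```

With three scales and nine orientation bins, the descriptor is a 27-dimensional vector in which each per-scale block is normalized to sum to one.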
Keywords/Search Tags: User tagging, Object semantics, Semantic acquisition, Cross-modal semantics, Feature descriptor