
Cross-modal Semantic Information Acquisition For Image Retrieval

Posted on: 2014-01-20 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: N He | Full Text: PDF
GTID: 1228330398955116 | Subject: Computer software and theory
Abstract/Summary:
As producing and sharing images becomes easier, the image databases we use grow ever larger. How to retrieve images of interest both effectively and efficiently has therefore become an important and urgent question. Although Content-Based Image Retrieval (CBIR) has been studied extensively for more than a decade, three limitations restrict its practicability. First, the precision of CBIR is usually unsatisfactory because of the semantic gap between low-level visual features and high-level semantic concepts. Second, the efficiency of CBIR is usually low due to the high dimensionality of visual features. Third, the query form of CBIR is unnatural for image search, owing to the possible absence of appropriate example images. In contrast, Text-Based Image Retrieval (TBIR) relies solely on text information for image indexing and search. Compared with visual information, text is essentially a representation of image content in terms of human concepts; it is low-dimensional and much easier to process. TBIR is therefore a straightforward way to overcome the disadvantages of CBIR, but annotating a large-scale image database manually is impossible. Recently, as social networks have become popular, more and more users share and annotate images on the web. However, such user-annotated tags are noisy and incomplete.

In this thesis, we combine the text and visual modalities to extract semantic information from images. Our major work includes:

1. We study Cross-modal Semantic Information Acquisition (CSIA) for image retrieval and propose a framework for it. Based on this framework, we implement cross-modal semantic acquisition: semantics are extracted from both text and visual content and fused together.
Compared with single-modal semantic acquisition, our framework is more effective for image retrieval.

2. We investigate the automatic image annotation problem and propose a new feature descriptor, the scale-space histogram of oriented gradients (SSHOG), for content-based image semantic acquisition. SSHOG describes images in a multi-scale way based on scale-space theory. Since objects in the real world have multi-scale properties, SSHOG is more effective than single-scale features. We test our SSHOG-based image semantic acquisition method on the INRIA Person Dataset, and the experimental results show the effectiveness of our method.

3. We investigate image distance measures for image retrieval and propose an image retrieval approach based on a Lie-group spatiogram similarity measure. The spatiogram is an extension of the ordinary histogram: whereas a histogram captures only color information and discards all spatial information, a spatiogram captures both the color distribution and the distribution of pixel locations, modeling the latter with Gaussian distributions. However, Gaussian functions do not form a vector space but a Lie group, so we adopt a Lie-group spatiogram similarity based on Lie-group analysis of the Gaussian space. We test our retrieval method on the Corel dataset; the results indicate that our method is more effective than other spatiogram-based methods.

4. We address the semantic fusion problem and propose a method to extract and fuse semantics from text and visual content. On the one hand, we automatically annotate images based on visual content and combine the resulting annotations with user annotations; on the other hand, we refine the annotations based on semantic consistency and content similarity. We model semantic fusion as a constrained optimization problem, with constraints including semantic consistency, content consistency, and error sparsity. We test our method on the NUS-WIDE and MIRFlickr-25K datasets.
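The annotation-refinement idea in contribution 4 can be illustrated with a much-simplified label-propagation sketch. This is a stand-in for the constrained optimization described above, not the thesis formulation; `refine_tags`, the similarity matrix `S`, and the parameters `alpha` and `iters` are all hypothetical names introduced here for illustration.

```python
import numpy as np

def refine_tags(T0, S, alpha=0.5, iters=20):
    """Refine a noisy image-by-tag score matrix T0 by propagating
    scores across visually similar images (rows of S).

    A simplified label-propagation stand-in for the constrained
    optimization sketched in the abstract, not the thesis method."""
    # Row-normalize the content-similarity matrix so each row sums to 1
    W = S / S.sum(axis=1, keepdims=True)
    T = T0.copy()
    for _ in range(iters):
        # Blend the original annotations (semantic consistency with the
        # given tags) with the neighbor consensus (content consistency)
        T = alpha * T0 + (1 - alpha) * W @ T
    return T
```

For example, if image 1 is visually similar to image 0 but lacks one of its tags, the refined score for that missing tag becomes positive, while all scores stay in [0, 1] because each update is a convex combination.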
The experimental results show the effectiveness of our method.

Since cross-modal semantic information acquisition avoids the disadvantages of any single modality, our work is useful for image retrieval.
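The multi-scale idea behind SSHOG (contribution 2) can be sketched as follows, assuming a grayscale image stored as a NumPy array: compute an orientation histogram of gradients at several Gaussian blur scales and concatenate them. The cell/block structure and block normalization of full HOG are omitted, and the function names (`gaussian_blur`, `hog_histogram`, `sshog`) are illustrative, not the thesis implementation.

```python
import numpy as np

def gaussian_blur(img, sigma):
    # Separable 1-D Gaussian convolution along each axis
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    blurred = np.apply_along_axis(np.convolve, 0, img, k, mode='same')
    return np.apply_along_axis(np.convolve, 1, blurred, k, mode='same')

def hog_histogram(img, bins=9):
    # Magnitude-weighted histogram of unsigned gradient orientations
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, np.pi), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist

def sshog(img, sigmas=(1.0, 2.0, 4.0), bins=9):
    # Concatenate orientation histograms computed at several Gaussian scales
    return np.concatenate([hog_histogram(gaussian_blur(img, s), bins)
                           for s in sigmas])
```

With three scales and nine orientation bins, the descriptor is a 27-dimensional vector in which each per-scale block is normalized to sum to one.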
Keywords/Search Tags: User tagging, Object semantics, Semantic acquisition, Cross-modal semantics, Feature descriptor