Research On Image Description With Multi-feature Weighted By Visual Saliency

Posted on:2019-07-05

Degree:Master

Type:Thesis

Country:China

Candidate:L S Liu

Full Text:PDF

GTID:2428330572951660

Subject:Signal and Information Processing

Abstract/Summary:

Image description is a new research topic that spans several fields of artificial intelligence, including computer vision and natural language processing. Automatically generating natural language descriptions for images enriches the semantic information attached to them, which helps with the storage and retrieval of massive image data. In recent years, image description has attracted growing attention in the artificial intelligence community and has made great progress. However, it still faces many challenges, such as how to represent images reasonably and how to accurately convert image features into text features. This thesis studies these problems. The main research contributions are summarized as follows.

(1) We propose a model that generates descriptions for images based on an RNN (Recurrent Neural Network), using multiple features to represent each image. Traditional deep-learning image description methods usually extract image features with a convolutional neural network trained on ImageNet for object classification. Such features cannot capture all of the information in the image, such as scene information. We therefore use different convolutional neural network models to extract the object feature and the scene feature of the image separately, and concatenate them to represent the image. We then train a mapping matrix that maps the multi-feature of the image and the text feature into the same embedding space, aligning them in both dimension and semantics. Finally, we train an LSTM (Long Short-Term Memory) model to translate the image feature in the embedding space into text, word by word. The experimental results on MSCOCO show that our model improves on related methods.

(2) We present a model that generates descriptions for images using multiple features weighted by object attention. Salient region detection is usually regarded as approximating eye fixations on an image, which often fall on the object-related areas noticed at first glance when describing it. Saliency weighting can therefore serve as a guide for generating the caption. We first extract the saliency map of the image and weight the image by that saliency map to highlight the features of the object regions. We then extract the object feature with the saliency-weighted image as input, and the scene feature with the original image as input. Combining the two features yields the multi-feature weighted by object attention. The experimental results on MSCOCO show that the descriptions generated by our model are more accurate and richer. Compared with related work, our model achieves good results on BLEU, METEOR, and other public metrics.
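The feature pipeline described in contribution (2) can be illustrated with a minimal sketch in plain Python. It is not the thesis implementation: `mean_pool` is a hypothetical stand-in for the convolutional feature extractors, the mapping matrix is hand-written rather than trained, and the LSTM decoder is omitted. All function names here are assumptions made for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def weight_by_saliency(image: Matrix, saliency: Matrix) -> Matrix:
    """Multiply each pixel by its saliency value (assumed to lie in [0, 1])."""
    same_rows = len(image) == len(saliency)
    if not same_rows or any(len(r) != len(s) for r, s in zip(image, saliency)):
        raise ValueError("image and saliency map must have the same shape")
    return [[p * s for p, s in zip(img_row, sal_row)]
            for img_row, sal_row in zip(image, saliency)]


def mean_pool(image: Matrix) -> Vector:
    """Hypothetical stand-in for a CNN feature extractor: one mean per row."""
    return [sum(row) / len(row) for row in image]


def multi_feature(image: Matrix, saliency: Matrix) -> Vector:
    """Concatenate the object feature (from the saliency-weighted image)
    with the scene feature (from the original image)."""
    object_feature = mean_pool(weight_by_saliency(image, saliency))
    scene_feature = mean_pool(image)
    return object_feature + scene_feature  # list concatenation


def project(features: Vector, mapping: Matrix) -> Vector:
    """Map a feature vector into the embedding space (matrix-vector product)."""
    if any(len(row) != len(features) for row in mapping):
        raise ValueError("mapping columns must match the feature length")
    return [sum(w * x for w, x in zip(row, features)) for row in mapping]


# Toy 2x2 grayscale image and saliency map (illustrative values only).
image = [[1.0, 3.0], [2.0, 4.0]]
saliency = [[1.0, 0.0], [0.5, 0.5]]

features = multi_feature(image, saliency)
# Illustrative 2x4 mapping matrix; in the thesis this matrix is trained.
embedding = project(features, [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]])
```

In this sketch the object branch sees only the saliency-weighted pixels while the scene branch sees the full image, so the concatenated vector keeps both kinds of information before it is projected into the shared embedding space.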

Keywords/Search Tags:

Salient object detection, Multi-feature, Embedding space, Feature translation

Related items

1	Research On Salient Object Detection Algorithm Based On Multi-layer Feature Fusion
2	Research On Image Salient Object Detection Based On Multi-Feature Fusion
3	Research On Salient Object Detection Method Based On Multi-Feature Fusion
4	Research On Salient Object Detection Algorithm Of Multi-source Images
5	Research On Salient Object Detection Algorithm Based On Multi-feature Aggregation
6	Salient Object Detection Via Multi-path Cascaded Deep Neural Networks
7	Image Salient Object Detection Based On Multi-Level Feature Refinement
8	Research On Visual Feature Space Perception Algorithms Based On Deep Learning
9	RGB-T Salient Object Detection Based On Feature Enhancement
10	Salient Object Detection Based On Multi-level Contextual Information Extraction