Research On Image Description With Multi-feature Weighted By Visual Saliency

Posted on: 2019-07-05
Degree: Master
Type: Thesis
Country: China
Candidate: L S Liu
Full Text: PDF
GTID: 2428330572951660
Subject: Signal and Information Processing
Abstract/Summary:
Image description is an emerging research topic that spans several areas of artificial intelligence, such as computer vision and natural language processing. Automatically generating natural language descriptions for images enriches the semantic information attached to them, which helps with the storage and retrieval of massive image collections. In recent years, image description has attracted growing attention in the artificial intelligence community and has made great progress. However, it still faces many challenges, such as how to represent images appropriately and how to convert image features into text features accurately. This thesis studies these problems. The main contributions are summarized as follows.

(1) We propose a model that generates image descriptions with an RNN (Recurrent Neural Network), using multiple features to represent each image. Traditional deep-learning approaches to image description usually extract image features with a convolutional neural network trained on ImageNet for object classification, so the features cannot capture all the information in the image, such as scene information. We therefore use different convolutional neural network models to extract the object feature and the scene feature of the image separately, and concatenate them to represent the image. We then train a mapping matrix that projects the image multi-feature and the text feature into the same embedding space, aligning them in both dimension and semantics. Finally, we train an LSTM (Long Short-Term Memory) model to translate the image feature in the embedding space into text, word by word. Experimental results on MSCOCO show that our model improves on related methods.

(2) We present a model that generates image descriptions using a multi-feature representation weighted by object attention. Salient regions are usually regarded as approximating human eye fixations on an image; they are often the object-related areas that we notice at first glance when describing it. Saliency weighting can therefore serve as a guide for generating the caption. We first extract the saliency map of the image and weight the image by that map to highlight the features of its object regions. We then extract the object feature from the saliency-weighted image and the scene feature from the original image. Combining the two features yields the multi-feature weighted by object attention. Experimental results on MSCOCO show that the descriptions generated by our model are more accurate and richer. Compared with related work, our model achieves good results on BLEU, METEOR and other public metrics.
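
The feature pipeline described in the abstract (saliency weighting, separate object and scene features, concatenation, and a learned mapping into an embedding space) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the thesis implementation: the function names are hypothetical, `toy_extractor` is a stand-in for the two convolutional networks, the image is a small grayscale grid, and the mapping matrix is fixed by hand rather than trained.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def weight_by_saliency(image: Matrix, saliency: Matrix) -> Matrix:
    """Multiply each pixel by its saliency value (assumed to lie in [0, 1])."""
    same_rows = len(image) == len(saliency)
    if not same_rows or any(len(r) != len(s) for r, s in zip(image, saliency)):
        raise ValueError("image and saliency map must have the same shape")
    return [[p * s for p, s in zip(row, srow)]
            for row, srow in zip(image, saliency)]


def toy_extractor(image: Matrix) -> Vector:
    """Hypothetical stand-in for a CNN: returns the mean of each row."""
    return [sum(row) / len(row) for row in image]


def concat_features(object_feat: Vector, scene_feat: Vector) -> Vector:
    """Concatenate the object feature and the scene feature."""
    return list(object_feat) + list(scene_feat)


def project(features: Vector, mapping: Matrix) -> Vector:
    """Apply a mapping matrix of shape (embed_dim, len(features))."""
    if any(len(row) != len(features) for row in mapping):
        raise ValueError("mapping width must match the feature length")
    return [sum(w * x for w, x in zip(row, features)) for row in mapping]


def multi_feature(image: Matrix, saliency: Matrix) -> Vector:
    """Object feature from the weighted image, scene feature from the original."""
    weighted = weight_by_saliency(image, saliency)
    return concat_features(toy_extractor(weighted), toy_extractor(image))


if __name__ == "__main__":
    image = [[1.0, 2.0], [3.0, 4.0]]
    saliency = [[1.0, 0.0], [0.5, 0.5]]
    # Hand-picked mapping for illustration; in the thesis this matrix is learned.
    mapping = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]]
    features = multi_feature(image, saliency)
    print(features)
    print(project(features, mapping))
```

In a full system the projected vector would be fed to the LSTM decoder as the image representation; that step is omitted here because the abstract does not specify how the decoder is conditioned.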
Keywords/Search Tags: Salient object detection, Multi-feature, Embedding space, Feature translation