
Image Captioning Based On Adaptive Visual Attention Mechanism

Posted on: 2020-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: X Gong
Full Text: PDF
GTID: 2428330578960245
Subject: Information and Communication Engineering
Abstract/Summary:
Image captioning is an interdisciplinary research area that connects computer vision and natural language processing, and it has become an active research topic. The task is to generate a sentence that summarizes the main content of an image, including its objects, attributes, scenes, and the relationships between them. Following the success of the encoder-decoder framework in machine translation, researchers applied it to image captioning and made significant progress. Within this framework, how to effectively interpret the visual representation and how to learn the language model are the two key tasks of a captioning algorithm. Existing work often carries out these two tasks simultaneously, and it also ignores the importance of image saliency. This thesis combines Long Short-Term Memory (LSTM) networks with visual attention mechanisms to study efficient image captioning algorithms. The main contributions are as follows:

(1) A dual-LSTM image captioning algorithm based on an adaptive visual attention mechanism is proposed. Within the encoder-decoder framework, the decoder is built from two LSTMs instead of one, giving it two sub-modules: a visual attention module and a language generation module. The visual attention module is the first LSTM; it processes the visual representation to obtain finer-grained visual information and visual sentinels. The language generation module is the second LSTM; it takes the output of the visual attention module as input and generates the caption. The effectiveness of the algorithm is verified by comparison with existing classical image captioning algorithms.

(2) Building on the work above and considering the role of image saliency, an image captioning algorithm that fuses salient prior information is proposed. The algorithm uses the saliency map of the original target image as weakly supervised information when generating the caption. So that non-salient visual information is not neglected, the algorithm fuses the original target image with its saliency map to obtain a salient prior map that reflects the visual region corresponding to the subject of the caption, which strengthens the model's attention to the salient target region.
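The abstract gives no formulas, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. `adaptive_context` assumes the commonly used visual-sentinel formulation, in which one softmax is taken jointly over the region scores and a sentinel score, and the sentinel's weight `beta` tells the decoder how much to rely on language context rather than the image. `fuse_saliency` assumes a simple multiplicative blend with a hypothetical `floor` parameter that keeps non-salient pixels from being zeroed out. All function names, parameters, and the blend rule are illustrative and are not taken from the thesis.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_context(region_feats, region_scores, sentinel, sentinel_score):
    """Mix region features and a visual sentinel with one joint softmax.

    Assumed formulation (not confirmed by the abstract): the sentinel is
    treated as an extra "region", and its weight beta is the gate that
    decides how little the image is consulted at this decoding step.
    """
    weights = softmax(list(region_scores) + [sentinel_score])
    beta = weights[-1]
    candidates = list(region_feats) + [sentinel]
    dim = len(sentinel)
    context = [
        sum(w * vec[d] for w, vec in zip(weights, candidates))
        for d in range(dim)
    ]
    return context, beta


def fuse_saliency(image, saliency, floor=0.3):
    """Blend an image with its saliency map (values in [0, 1]).

    Hypothetical rule: each pixel is scaled by floor + (1 - floor) * saliency,
    so salient regions are kept at full strength while non-salient regions
    are attenuated to `floor` instead of being discarded.
    """
    return [
        [pix * (floor + (1.0 - floor) * sal) for pix, sal in zip(img_row, sal_row)]
        for img_row, sal_row in zip(image, saliency)
    ]


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]
    context, beta = adaptive_context(regions, [0.0, 0.0], [0.5, 0.5], 0.0)
    print(context, beta)
    print(fuse_saliency([[1.0, 1.0]], [[0.0, 1.0]]))
```

In a full decoder, the region scores and the sentinel would come from the first (visual attention) LSTM, and the resulting context vector would be passed to the second (language generation) LSTM; those components are omitted here.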
Keywords/Search Tags: Image captioning, Attention mechanism, Encoder-decoder, Long Short-Term Memory, Saliency map