Image Captioning Based On Adaptive Visual Attention Mechanism

Posted on:2020-05-23

Degree:Master

Type:Thesis

Country:China

Candidate:X Gong

Full Text:PDF

GTID:2428330578960245

Subject:Information and Communication Engineering

Abstract/Summary:

Image captioning is an interdisciplinary research topic that connects computer vision and natural language processing, and it has become an active area of current research. The task is to generate a descriptive sentence that summarizes the main content of an image, mainly its objects, attributes, scenes, and the relationships between them. Given the strong results of the encoder-decoder framework in machine translation, researchers applied it to image captioning and made significant progress. Within this framework, how to effectively interpret the visual representation and learn the language model has become the key to image captioning algorithms. Existing research often carries out these two tasks simultaneously and also ignores the importance of image saliency. This thesis combines long short-term memory (LSTM) networks and visual attention mechanisms to study efficient image captioning algorithms. The main contributions are as follows:

(1) A dual-LSTM image captioning algorithm based on an adaptive visual attention mechanism is proposed. Within the encoder-decoder framework, the algorithm uses two LSTMs instead of a single LSTM to form the decoder, which has two sub-modules: a visual attention module and a language generation module. The visual attention module is the first LSTM; it processes the visual representation to obtain finer-grained visual information and visual sentinels. The language generation module is the second LSTM; it takes the output of the visual attention module as input and generates the descriptive sentence. The effectiveness of the algorithm is verified by comparison with existing classical image captioning algorithms.

(2) Building on the work above and considering the role of image saliency, an image captioning algorithm that fuses salient prior information is proposed. The algorithm uses the saliency map of the original target image as weakly supervised information to automatically generate a descriptive sentence. Without neglecting non-salient visual information, the algorithm fuses the original target image and the saliency map to obtain a salient prior map that reflects the visual region corresponding to the subject of the descriptive sentence, thereby strengthening the model's attention to the salient target region.
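
The adaptive attention step in contribution (1) can be illustrated with a minimal NumPy sketch. The abstract gives no equations, so this assumes a commonly used sentinel formulation: the visual sentinel is scored alongside the image regions, and one softmax decides how much weight goes to the image and how much to the sentinel. The function name, the weight matrices (W_v, W_s, W_g, w_h), and all dimensions are hypothetical, and the random inputs stand in for features that the first LSTM would produce.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def adaptive_attention(V, s, h, W_v, W_s, W_g, w_h):
    """Attend over k image regions plus one visual sentinel.

    V: (k, d) region features, s: (d,) visual sentinel,
    h: (d,) hidden state of the attention LSTM (assumed input).
    Returns the context vector, the region weights, and the sentinel weight.
    """
    region_scores = np.tanh(V @ W_v + h @ W_g) @ w_h    # (k,)
    sentinel_score = np.tanh(s @ W_s + h @ W_g) @ w_h   # scalar
    weights = softmax(np.append(region_scores, sentinel_score))  # (k + 1,)
    alpha, beta = weights[:-1], weights[-1]
    # beta near 1: rely on the sentinel; beta near 0: rely on the image.
    context = alpha @ V + beta * s
    return context, alpha, beta

# Hypothetical sizes and random stand-in values, for illustration only.
rng = np.random.default_rng(0)
k, d, a = 49, 8, 16
V = rng.normal(size=(k, d))
s = rng.normal(size=d)
h = rng.normal(size=d)
W_v, W_s, W_g = (rng.normal(size=(d, a)) for _ in range(3))
w_h = rng.normal(size=a)

context, alpha, beta = adaptive_attention(V, s, h, W_v, W_s, W_g, w_h)
```

In a dual-LSTM decoder of the kind described above, a context vector like this would be passed to the second (language generation) LSTM; how the thesis wires the two modules together is not specified in the abstract.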

Keywords/Search Tags:

Image captioning, Attention mechanism, Encoder-decoder, Long Short-Term Memory, Saliency map

Related items

1	Research On Image Caption Method Based On High Level Semantic Extraction And Attention Mechanism
2	Image Captioning Based On Attention Long Short-Term Memory Network
3	Research On Image Caption Based On Attention Mechanism
4	Research On Image Captioning Algorithm Based On Deep Learning
5	Research And Implementation Of Image Captioning Technology Based On Deep Learning
6	Research On Image Captioning Algorithm Based On Deep Learning
7	Research On Image Captioning Algorithm Based On Attention Mechanism
8	Research On Semantic-Attentive Deep Image Captioning Method
9	EAD-OHMER:Research On Online Handwritten Mathematical Expression Recognition Based On Encoder-Decoder
10	Visual Data Understanding Based On Deep Encoder-Decoder Framework