As a research area at the intersection of computer vision and natural language processing, image caption generation has been an active topic in recent years, contributing, in multimodal social media, to the translation of unstructured image data into structured text data. Prior work has proposed a series of image captioning methods, such as template-based, retrieval-based, and encoder-decoder approaches. Among these, the encoder-decoder framework is the most widely used: the encoder extracts image features with a Convolutional Neural Network (CNN), and the decoder generates the image description with a Recurrent Neural Network (RNN). The Neural Image Caption (NIC) model has achieved good performance in image captioning; however, some challenges remain. To tackle the lack of image information in the generated descriptions and their deviation from the core content of the image, the proposed model explores visual attention to deepen the understanding of the image, adopts textual attention to enhance the completeness of the information, and puts forward a dual attention mechanism that combines visual and textual attention to guide caption generation. To address the problem of generated sentences deviating from the core content of the image, the model builds on NIC: the encoder uses the Inception_v4 network to extract image features, while the decoder introduces a visual attention mechanism into the Long Short-Term Memory (LSTM) network. To address the lack of image information in the generated descriptions, a textual attention mechanism is proposed to enhance their information completeness. This thesis extracts image labels with a Fully Convolutional Network (FCN) and a Non-negative Matrix Factorization (NMF) topic model, and adopts the dual attention mechanism to guide caption generation, with textual attention attached to the image labels and visual attention focused on image regions. The effects of different positions of visual and textual attention on the captioning results are also explored. Experiments were conducted on the AIC-ICC dataset. The captions generated by the NICNDA model, based on the dual attention mechanism, are better than those of the benchmark model and of models with a single attention mechanism, which shows that the proposed NICNDA model is feasible. Moreover, the results obtained with different combinations of the dual attention mechanism show that this thesis's research on combining the two attention mechanisms is meaningful and effective.
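The per-step combination of visual and textual attention described above can be sketched as follows. This is a minimal, framework-free illustration, not the thesis's exact formulation: the function name `dual_attention_context`, the bilinear score matrices `Wv` and `Wt`, and the simple concatenation of the two context vectors are all assumptions made for clarity.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def dual_attention_context(h, regions, labels, Wv, Wt):
    """One decoder step of a dual-attention sketch (illustrative only).

    h       : (d,)  current LSTM hidden state
    regions : (k, dv) CNN region features (visual side)
    labels  : (m, dt) embeddings of extracted image labels (textual side)
    Wv      : (dv, d) visual score projection (hypothetical parameter)
    Wt      : (dt, d) textual score projection (hypothetical parameter)
    """
    # Visual attention: score each image region against the hidden state,
    # then form a weighted sum of region features.
    alpha_v = softmax(regions @ Wv @ h)   # (k,) attention weights over regions
    c_v = alpha_v @ regions               # (dv,) visual context vector

    # Textual attention: score each label embedding the same way.
    alpha_t = softmax(labels @ Wt @ h)    # (m,) attention weights over labels
    c_t = alpha_t @ labels                # (dt,) textual context vector

    # The combined context would feed the next LSTM step alongside the
    # previous word embedding; here we simply concatenate the two parts.
    return np.concatenate([c_v, c_t])
```

At each time step the visual part pulls the decoder toward salient image regions, while the textual part keeps label information (and hence content the caption might otherwise omit) available, which is the intuition behind combining the two mechanisms.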