
Image Captioning Based On Generative Adversarial Network With Temporal Attention

Posted on: 2022-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: H L Duan
Full Text: PDF
GTID: 2558307109469524
Subject: Software engineering

Abstract/Summary:
Image captioning is a technology in which computer models automatically generate natural language captions for given images. In recent years, image captioning has been widely applied in fields such as image review and smart medicine, attracting growing attention from scholars in related fields. In particular, the introduction of reinforcement learning and attention mechanisms has strongly promoted the development of image captioning technology. Images carry rich visual representations, and effective, rich feature information plays a vital role in image captioning tasks. However, most related methods do not make good use of image feature information, and errors also accumulate during sentence generation. In view of the above problems, the main research work of this thesis is as follows:

To exploit the rich information contained in image features to a greater extent, this thesis constructs a multi-attention mechanism that achieves more effective feature representation and reasoning in image captioning by using both local and global features. On the basis of this mechanism, a multi-attention generative adversarial image captioning network (MAGAN) is built, comprising a multi-attention generator and a multi-attention discriminator. The generator is used to produce more accurate sentences, while the discriminator judges whether a sentence was written by a human or generated by a machine. This method obtains highly competitive results on the official dataset.

To strengthen the correlation between the attention results and the hidden state at different time steps, and to alleviate the accumulated-error problem in word generation to a certain extent, this thesis proposes a temporal attention network (TAN) that extends the traditional attention mechanism within the commonly used encoder-decoder framework. TAN first attends to the hidden state and feature vector at the current time step, and then introduces the attention results of two adjacent LSTM steps into the network loop at the next time step through an "attention fusion slot" (AFS), enhancing the correlation between the attention results and the hidden state. In addition, this thesis designs a "hidden state switch" (HSS) to guide word generation; combining it with the AFS alleviates the accumulated-error problem to a certain extent. Extensive experiments on the official Microsoft COCO dataset show that the proposed model has clear advantages over the baseline models.

To further mitigate accumulated errors while making better use of rich image feature information to improve sentence diversity, this thesis first introduces the temporal attention mechanism and then proposes a generative adversarial image captioning network based on temporal attention. Quantitative and qualitative experiments verify that the generative adversarial network with temporal attention weakens the impact of accumulated errors during word generation and improves the diversity of the generated descriptions to a certain extent. The experiments show that the method has research value and promising prospects.
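As a rough illustration of the temporal-attention idea described above, the sketch below shows a single decoding step in which the attention context from the previous step is fused with the current one before updating the hidden state. This is a minimal sketch under stated assumptions: the function names, the additive attention scoring, and the convex-combination fusion rule (standing in for the thesis's "attention fusion slot") are illustrative choices, not the thesis's actual formulation.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over attention scores
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(features, hidden, W_f, W_h):
    # additive attention: score each image region against the hidden state
    scores = np.tanh(features @ W_f + hidden @ W_h) @ np.ones(W_f.shape[1])
    alpha = softmax(scores)          # weights over regions, sums to 1
    return alpha @ features          # weighted context vector

def decode_step(features, h_prev, ctx_prev, params, beta=0.5):
    """One decoding step with temporal-attention-style fusion.

    ctx_prev is the attention result from the previous step; blending it
    with the current context carries attention information across adjacent
    steps, in the spirit of the AFS (the exact fusion rule is an assumption).
    """
    W_f, W_h = params
    ctx_t = attend(features, h_prev, W_f, W_h)
    fused = beta * ctx_t + (1.0 - beta) * ctx_prev
    # a real model would feed `fused` into an LSTM cell here;
    # a plain tanh update keeps the sketch self-contained
    h_t = np.tanh(fused + h_prev)
    return h_t, ctx_t
```

In a full captioning decoder, `h_t` would feed a vocabulary softmax to emit the next word, and a gating mechanism in the role of the HSS would decide how much of `h_t` to expose at each step.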
Keywords/Search Tags: Image captioning, Attention mechanism, Generative adversarial network, Encoder-Decoder, Reinforcement learning