Research On Image Caption Generation Method Based On Attention Mechanism

Posted on:2024-09-22

Degree:Master

Type:Thesis

Country:China

Candidate:H L Zhang

Full Text:PDF

GTID:2568307091988009

Subject:Computer Science and Technology

Abstract/Summary:

Image captioning is a cross-domain, multimodal research field in which a model must generate natural language descriptions that match the content of a given image and are semantically coherent. Automatic image captioning has high research significance and practical value in areas such as image-text retrieval and human-machine interaction. In recent years, attention mechanisms have played an increasingly important role in image captioning, and attention-based feature matching and efficient feature utilization have become a focus of research. This dissertation takes the attention mechanism as its core, starting from improving the attention mechanism and enhancing the efficiency of feature utilization.

(1) Image captioning method based on semantic attention. The fluency of an image description depends on the semantic relationships within its context. To extract more semantic features to guide the generation of descriptive sentences, we propose a semantic attention-based image captioning model. A word embedding module encodes text features into word vectors, turning semantic relationships between context words into relationships between features. The semantic attention module extracts visual and semantic features and uses a sentinel mechanism to weight the two types of features. Experiments on the MS COCO and Flickr30k datasets show that the proposed method is feasible and improves the evaluation metrics.

(2) Image captioning method based on spatial correlation attention. Correlations between objects in an image can effectively improve the quality of image descriptions, but representing the relationships between objects with global features cannot capture these correlations accurately. To capture them accurately and improve the accuracy of image descriptions, we propose an image caption generation model based on spatial correlation attention. This method uses object detection algorithms to extract visual features and object spatial position information from the image. After the visual features are fused with the spatial position information in a high-dimensional space, the spatial correlation attention features of the image are captured. Finally, the visual features and the spatial correlation attention features are fed to the visual attention module and the spatial correlation attention module, respectively, to guide the generation of word sequences. Experiments on the MS COCO dataset validate the effectiveness of this method.
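
The sentinel weighting mentioned in method (1) can be illustrated with a small sketch. The abstract does not give the formulation, so everything below is an assumption: the function name sentinel_gated_context, the use of plain dot-product scores in place of learned projections, and the choice to treat the semantic feature as one extra "sentinel" candidate in the attention softmax. It is meant only to show how a single gate value can split attention between visual and semantic features, not to reproduce the thesis model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sentinel_gated_context(query, visual_feats, semantic_feat):
    """Mix visual region features with one semantic sentinel vector.

    Hypothetical helper: scores are plain dot products with the query,
    whereas a trained model would use learned projections.
    """
    # Score each visual region, then the semantic sentinel as an extra candidate.
    scores = [dot(query, v) for v in visual_feats]
    scores.append(dot(query, semantic_feat))
    weights = softmax(scores)
    beta = weights[-1]  # share of attention assigned to the sentinel
    # The region weights sum to 1 - beta, so this equals
    # beta * semantic + (1 - beta) * (renormalised visual context).
    context = [
        sum(w * v[i] for w, v in zip(weights[:-1], visual_feats))
        + beta * semantic_feat[i]
        for i in range(len(query))
    ]
    return context, beta


# Toy inputs (made up for illustration): two image regions, one semantic vector.
query = [1.0, 0.0]
visual = [[1.0, 0.0], [0.0, 1.0]]
semantic = [0.0, 1.0]
context, beta = sentinel_gated_context(query, visual, semantic)
```

With this design, a gate value beta close to 1 means the decoder relies mostly on the semantic feature for the next word, while a value close to 0 means it relies mostly on the image regions.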

Keywords/Search Tags:

Image captioning, Attention mechanisms, Semantic features, Spatial correlation features

Related items

1	Research On Image Captioning Algorithm Based On Deep Neural Networks
2	Research On Image Captioning Using Semantic Enhanced Features And Negative Examples Mining
3	Research On Image Captioning Methods Based On Deep Learning
4	Research On Image Captioning Algorithm Based On Encoding And Decoding
5	Research On Image Description Based On Multimodal Recurrent Network
6	Research And Application Of Image Steganography Based On GAN And Attention Mechanisms
7	Research On Key Technologies Of Image Captioning Based On Semantic Relation Enhancement
8	Research On Image Feature Extraction Method For Design Patent Image Retrieval
9	Medical Image Retrieval Based On Low Level Features And Semantic Features
10	Research On Semantic-Attentive Deep Image Captioning Method