
Research On Image Caption Generation Method Based On Attention Mechanism

Posted on: 2024-09-22
Degree: Master
Type: Thesis
Country: China
Candidate: H L Zhang
Full Text: PDF
GTID: 2568307091988009
Subject: Computer Science and Technology
Abstract/Summary:
Image captioning is a cross-domain, multimodal research field that requires models to generate natural language descriptions that match the content of a given image and are semantically coherent. Automatic image captioning has high research significance and practical value in areas such as image-text retrieval and human-machine interaction. In recent years, attention mechanisms have played an increasingly important role in image captioning, and attention-based feature matching and efficient feature utilization have become a focus of research. This dissertation takes the attention mechanism as its core and studies two directions: improving the attention mechanism and making more efficient use of features.

(1) Image captioning method based on semantic attention. The fluency of an image description depends on the semantic relationships between contexts. To extract more semantic features to guide sentence generation, we propose a semantic attention-based image captioning model. A word embedding module encodes text features into word vectors, transforming semantic relationships between contexts into relationships between features. The semantic attention module extracts visual and semantic features and uses a sentinel mechanism to weight the two types of features. Experiments on the MS COCO and Flickr30k datasets show that the proposed method is feasible and improves the evaluation metrics.

(2) Image captioning method based on spatial correlation attention. The correlation between objects in an image can effectively improve the quality of image descriptions, but global features cannot capture that correlation accurately. To capture it accurately and improve the accuracy of image descriptions, we propose an image caption generation model based on spatial correlation attention. The method uses an object detection algorithm to extract visual features and object spatial position information from the image. After the visual features are fused with the spatial position information in a high-dimensional space, the spatial correlation attention feature of the image is captured. Finally, the visual features and the spatial correlation attention features are fed to the visual attention module and the spatial correlation attention module, respectively, to guide the generation of the word sequence. Experiments on the MS COCO dataset validate the effectiveness of this method.
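The sentinel weighting described in method (1) can be illustrated with a small sketch. The abstract does not give the thesis's actual formulation, so everything here is an assumption made for illustration: the function name `sentinel_attention`, dot-product scoring against a decoder query vector, and a single semantic vector acting as the sentinel. The idea shown is only that one softmax over the visual regions plus the sentinel yields a weight `beta` that decides how much the semantic feature contributes relative to the visual features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sentinel_attention(query, visual_feats, semantic_feat):
    """Blend visual region features with one semantic (sentinel) vector.

    Hypothetical sketch, not the thesis's model. Returns (context, beta),
    where beta is the attention weight assigned to the semantic vector.
    """
    # Score every visual region and the sentinel against the query.
    # Dot-product scoring is an assumption; a learned scorer could be used.
    scores = [dot(query, v) for v in visual_feats]
    scores.append(dot(query, semantic_feat))

    # One softmax over k visual regions plus the sentinel.
    weights = softmax(scores)
    beta = weights[-1]

    # The context vector is the weighted sum of all candidates.
    candidates = list(visual_feats) + [semantic_feat]
    dim = len(query)
    context = [
        sum(w * c[d] for w, c in zip(weights, candidates))
        for d in range(dim)
    ]
    return context, beta


# Usage: two visual regions and one semantic vector in a 2-D feature space.
context, beta = sentinel_attention(
    query=[1.0, 0.0],
    visual_feats=[[1.0, 0.0], [0.0, 1.0]],
    semantic_feat=[1.0, 0.0],
)
print(context, beta)
```

Because the sentinel shares the softmax with the visual regions, the visual weights sum to `1 - beta`, so a larger `beta` shifts the context toward the semantic feature.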
Keywords/Search Tags: Image captioning, Attention mechanisms, Semantic features, Spatial correlation features