
Research on Fine-Grained, Semantically Rich Image Caption Generation Methods Based on Deep Learning

Posted on: 2024-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: C J Shi
GTID: 2568307091965329
Subject: Computer Science and Technology
Abstract/Summary:
Images and text are the most common information carriers in daily life. Image captioning can be applied in fields such as navigation aids for blind people, assistance for people with disabilities, multimedia education, and assisted medical care, so it has significant research value. Image captioning is a cross-modal generative task that combines key techniques from computer vision and natural language processing. The task is to parse an input image and generate a text description of its content. How to generate fine-grained, semantically rich text and so improve caption quality has become both a focus and a difficulty of current research. This thesis uses deep learning methods to study three questions in image caption generation: how to use the entity detail information in an image, how to fully mine the latent relationships between entities in an image, and how to generate diverse, semantically rich text. The main research contents are as follows:

1. To address how an image captioning model can capture and use the details of visual entities in an image, this thesis proposes a captioning method based on a linear visual feature sequence. The method represents the global and local visual semantic information of an image as a linear sequence of visual features, encodes that sequence with a deep semantic encoder-decoder, and generates fine-grained text that contains detailed entity information. Experimental results show that the model takes more of the visual target entities in the image into account when generating text, adds textual detail, and improves performance on public datasets.

2. To address how an image captioning model can mine and use the latent relationships between entities in an image, this thesis proposes a captioning method based on spatial scene graph analysis. The method abstracts the semantic information in the image into a scene graph, encodes and parses it with an encoder-decoder based on a graph convolutional neural network, and finally generates a fine-grained text description. Experiments show that the model generates more fine-grained captions that include the relationships between entities, and that it achieves some performance improvement on public datasets.

3. To address how an image captioning model can enrich its generated captions and improve their quality, this thesis proposes a captioning method based on generative adversarial training. Following the core idea of generative adversarial networks, the method treats the training of the captioning model as an adversarial training process, which strengthens the generator's text generation ability and yields more realistic and vivid image descriptions. Experiments show that the adversarially trained model generates more specific and vivid sentences, and more diverse and semantically rich image captions.
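The scene-graph encoding step in the second method can be illustrated with a minimal sketch. This is not the thesis's model. It assumes a hypothetical scene graph given as (subject, relation, object) triples, toy two-dimensional node features, and a simplified graph-convolution step that averages each node's features with its neighbors' features and then applies a linear map and ReLU. All names (`triples`, `build_adjacency`, `graph_conv`) are illustrative, and relation labels are ignored here, whereas a real model would typically embed them as well.

```python
# Hypothetical scene graph as (subject, relation, object) triples.
triples = [("man", "riding", "horse"), ("horse", "on", "grass")]


def build_adjacency(triples):
    """Return an undirected neighbor map; every node is its own neighbor (self-loop)."""
    nodes = {name for subj, _, obj in triples for name in (subj, obj)}
    neighbors = {name: {name} for name in nodes}
    for subj, _, obj in triples:
        # Relation labels are ignored in this simplified sketch.
        neighbors[subj].add(obj)
        neighbors[obj].add(subj)
    return neighbors


def graph_conv(features, neighbors, weight):
    """One simplified graph-convolution step: mean-aggregate, linear map, ReLU."""
    out_dim = len(weight[0])
    updated = {}
    for node, nbrs in neighbors.items():
        in_dim = len(features[node])
        # Average the feature vectors of the node and its neighbors.
        mean = [sum(features[m][i] for m in nbrs) / len(nbrs) for i in range(in_dim)]
        # Linear projection followed by ReLU.
        updated[node] = [
            max(0.0, sum(mean[i] * weight[i][j] for i in range(in_dim)))
            for j in range(out_dim)
        ]
    return updated


# Toy node features and an identity weight matrix (assumptions for illustration).
features = {"man": [1.0, 0.0], "horse": [0.0, 1.0], "grass": [1.0, 1.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]

neighbors = build_adjacency(triples)
updated = graph_conv(features, neighbors, identity)
```

After one step, each entity's vector mixes in information from the entities it is related to, which is the property a graph-based encoder relies on when a decoder later generates text about entity relationships.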
Keywords/Search Tags:image caption, deep learning, object detection, scene graph, generative adversarial networks