Research On Fine-Grained, Semantically Rich Image Caption Generation Methods Based On Deep Learning

Posted on:2024-06-02

Degree:Master

Type:Thesis

Country:China

Candidate:C J Shi

Full Text:PDF

GTID:2568307091965329

Subject:Computer Science and Technology

Abstract/Summary:

Images and text are the most common information carriers in daily life. Image captioning can be applied in fields such as guiding the blind and assisting people with disabilities, multimedia education, and assisted medical care, and therefore has important research value. Image captioning is a cross-modal generative task that combines key techniques from computer vision and natural language processing: the model parses an input image and generates a text description of its content. How to generate fine-grained, semantically rich text and thereby improve caption quality has become both a focus and a difficulty of research. This thesis uses deep learning to study how to exploit the entity-level detail in images, how to fully mine the latent relationships among image entities, and how to generate diverse, semantically rich text in the image captioning task. The main research contents are as follows:

1. To capture and use the visual entity details of an image in a caption generation model, this thesis proposes an image captioning method based on linear visual feature sequences. The method represents the global and local visual semantic information of an image as a linear sequence of visual features, encodes that sequence with a deep semantic encoder-decoder, and generates fine-grained text containing detailed entity information. The experimental results show that the model takes more visual target entities into account when generating text, adds textual detail, and improves performance on public datasets.

2. To mine and use the latent relationships between entities in an image, this thesis proposes an image captioning method based on spatial scene graph analysis. The method abstracts the semantic information of the image into a scene graph, uses an encoder-decoder based on a graph convolutional neural network for semantic encoding and parsing, and finally generates a fine-grained text description. Experiments show that the model generates more fine-grained captions that include the relationships between entities, and that it yields some performance improvement on public datasets.

3. To enrich the content and improve the quality of the captions produced by a caption generation model, this thesis proposes an image captioning method based on generative adversarial training. Following the core idea of generative adversarial networks, the method casts the training of the caption generation model as an adversarial training process, which strengthens the generator's text generation ability and produces more realistic and vivid image descriptions. Experiments show that the adversarially trained model generates more specific and vivid sentences, and more diverse and semantically rich image captions.
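As an illustration of the graph-convolution idea behind the second method, the following is a minimal sketch of one message-passing step over a toy scene graph. It is not the thesis's model: the entities, triples, feature vectors, and weight matrix are made-up values, the function names are hypothetical, relation labels are ignored, and plain mean aggregation over neighbours is assumed in place of whatever normalization and learned weights the author actually used.

```python
# Toy scene graph: entities are nodes, (subject, relation, object) triples are edges.
# All values below are illustrative assumptions, not data from the thesis.
entities = ["man", "horse", "field"]
triples = [("man", "riding", "horse"), ("horse", "on", "field")]


def normalized_adjacency(names, relations):
    """Build a row-normalized adjacency matrix with self-loops (mean aggregation)."""
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for subject, _relation, obj in relations:  # relation labels are ignored here
        adj[index[subject]][index[obj]] = 1.0
        adj[index[obj]][index[subject]] = 1.0
    return [[value / sum(row) for value in row] for row in adj]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, features, weight):
    """One graph-convolution step: ReLU(A_norm @ X @ W)."""
    mixed = matmul(matmul(adj, features), weight)
    return [[max(0.0, value) for value in row] for row in mixed]


# Hypothetical 2-dimensional entity features and an identity weight matrix.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]

adjacency = normalized_adjacency(entities, triples)
hidden = gcn_layer(adjacency, features, weight)

for name, vector in zip(entities, hidden):
    print(name, vector)
```

After this step each entity's vector is the average of its own features and those of the entities it is related to, which is the sense in which a graph-based encoder can carry relationship information into the caption decoder.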

Keywords/Search Tags:

image caption, deep learning, object detection, scene graph, generative adversarial networks

Related items

1	Research And Application Of Image Caption Method Based On GAN And GRU
2	Object Detection Method In Complicated Environment Based On Generative Adversarial Networks
3	Image Caption Method Based On Deep Learning
4	Research On Image Caption Algorithm Based On Graph Convolution Network
5	Research On Object Detection Method And Applications Of Ground Penetrating Radar Based On Generative Adversarial Networks
6	Research On Image Captioning Algorithms Based On Deep Learning
7	Research On Object Detection In The Wild Based On Deep Convolutional Neural Network
8	Study On Object Detection And Recognition Methods For Complex Scene Images
9	Scene Image Text Detection Research Based On Deep-Learning And Its Application
10	Research And Application Of Intelligent Image Caption