
Research On Image Caption Generation Based On Global And Multilevel Feature Extraction

Posted on: 2023-09-01    Degree: Master    Type: Thesis
Country: China    Candidate: X D Han    Full Text: PDF
GTID: 2558306845490844    Subject: Computer technology
Abstract/Summary:
Image caption generation is an interdisciplinary research topic spanning computer vision and natural language processing. It usually adopts an encoder-decoder framework: the encoder extracts image features from the input image, and the decoder uses those features to generate the corresponding text description, so image feature extraction is the basis for generating captions. Current image caption generation suffers from insufficient image feature extraction, which limits the model's ability to reason about image content. This thesis addresses these problems by applying Faster RCNN and Transformer models to global and multi-level image feature extraction. The specific work is as follows.

(1) A GE-FTRAN model for image caption generation based on global image feature extraction is designed. GE-FTRAN builds on the Basic-FTRAN model, whose encoder combines Faster RCNN with a Transformer encoder for image feature extraction and whose decoder is a Transformer decoder that generates the text description. The GE-FTRAN encoder produces a global image feature by average-pooling the region features extracted by Faster RCNN, then feeds each region feature together with the global feature into the Transformer encoder, outputting the image region features as well as a more comprehensive global feature. An adaptive extraction module is designed to extract the global feature from each encoder layer and perform a weighted fusion. In the GE-FTRAN decoder, a global-feature adaptive guidance module is designed on top of the Transformer decoder so that the global and region features jointly guide the model in generating text descriptions. With two-stage training (cross-entropy loss followed by reinforcement learning), GE-FTRAN improves the BLEU-1, METEOR, ROUGE, and SPICE evaluation scores on the Microsoft COCO Caption dataset by 0.8%, 1.1%, 1.6%, and 1.3%, respectively.

(2) An MLE-FTRAN model for image caption generation based on multi-level image feature extraction is designed. MLE-FTRAN also builds on the Basic-FTRAN model. Its encoder feeds each image region feature into the Transformer encoder and, by using multiple encoder layers, outputs region features that carry multi-level region-relationship information. In the decoder, a multi-level cross-attention mechanism is designed on top of the multi-head cross attention of the Transformer decoder, so that multi-level image region features jointly guide the model in generating text descriptions. Under the same two-stage training, MLE-FTRAN improves the BLEU-1, METEOR, ROUGE, and SPICE scores on the above dataset by 0.5%, 0.8%, 1.4%, and 1.2%, respectively, over Basic-FTRAN. Finally, the encoder and decoder modules of GE-FTRAN and MLE-FTRAN are linearly combined into a model denoted GMLE-FTRAN. Two-stage training shows that GMLE-FTRAN outperforms GE-FTRAN on the same dataset, improving the BLEU-1, METEOR, ROUGE, and SPICE scores by 0.1%, 0.3%, 0.1%, and 0.2%, respectively.
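The multi-level cross attention described above can be illustrated with a minimal NumPy sketch: decoder queries attend separately to the region features output by each Transformer encoder layer, and the per-layer results are fused with learned weights. All names, shapes, and the softmax-weighted fusion below are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, k, v):
    # Scaled dot-product attention: queries come from the decoder,
    # keys/values from one encoder layer's region features.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    return softmax(scores) @ v

def multilevel_cross_attention(queries, encoder_layers, fusion_w):
    # Attend to the region features of *every* encoder layer, then
    # fuse the per-layer attention outputs with a softmax over
    # `fusion_w` (a hypothetical learned weight vector).
    per_layer = [cross_attention(queries, feats, feats)
                 for feats in encoder_layers]
    w = softmax(np.asarray(fusion_w, dtype=float))
    return sum(wi * out for wi, out in zip(w, per_layer))

rng = np.random.default_rng(0)
queries = rng.normal(size=(5, 64))           # 5 partial-caption tokens
encoder_layers = [rng.normal(size=(36, 64))  # 36 image regions per layer
                  for _ in range(3)]         # 3 encoder layers
out = multilevel_cross_attention(queries, encoder_layers, [0.2, 0.5, 0.3])
print(out.shape)  # (5, 64)
```

In a real model the queries, keys, and values would each pass through learned linear projections and multiple attention heads; the sketch keeps only the part that distinguishes multi-level cross attention from ordinary cross attention, namely attending to every encoder layer rather than only the last one.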
Keywords/Search Tags: Image Caption Generation, Transformer Model, Image Global Feature, Image Multilevel Feature