Research Of Image Caption Based On Encoder-Decoder

Posted on:2023-01-04

Degree:Master

Type:Thesis

Country:China

Candidate:P F Shi

Full Text:PDF

GTID:2568306914456464

Subject:Cyberspace security

Abstract/Summary:

Image captioning is the task in which a machine automatically generates a text caption for an image from the semantic information of the objects it contains. With the rapid development of deep learning, models built on the encoder-decoder architecture have become the main generation framework for image captioning. At present, this framework faces two problems. First, captions generated for complex scenes contain low-level cognitive errors. Second, image captioning is a cross-modal task, but current mature attention mechanisms cannot fully carry the feature information of the visual modality into the text generation process. To address these problems across different image scenes, this thesis introduces visual commonsense knowledge and high-order interaction features.

(Ⅰ) The VC R-CNN image caption algorithm model is improved using visual commonsense based on causal intervention, and its performance as measured by BLEU, ROUGE-L, METEOR, CIDEr and other metrics improves significantly. Unlike the traditional algorithm, this thesis introduces visual commonsense from second-order causal intervention and integrates it into the VC R-CNN model. The resulting image caption algorithm handles complex image scenes better and reduces the amount of computation in training by 5%.

(Ⅱ) An image caption algorithm based on the fusion of high-order interaction features is proposed. It extracts higher-order interaction features through linear fusion of attention mechanisms. This addresses the problem that the modal conversion process cannot make full use of the feature information of the original modality, which leaves the generated captions incomplete and unspecific. Because linear fusion makes the algorithm's computational complexity higher than that of the original model, this thesis also optimizes the performance of the attention mechanism. The experimental results show that the captions generated by the proposed algorithm express the semantic information in the images more specifically and completely, while the overall time cost (covering both training and prediction) changes little.

(Ⅲ) An image caption generation system based on the encoder-decoder framework is designed and implemented. The system includes user management, algorithm implementation, result output and other modules. Its performance is tested and verified on real data. The main indicators are caption generation speed and BLEU, ROUGE-L, METEOR, CIDEr, etc. The test results show that the system has some practical value.
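For readers unfamiliar with the metrics named above, the following is a minimal sketch of how a sentence-level BLEU score can be computed from clipped n-gram precision and a brevity penalty. It is not the evaluation code used in the thesis, which the abstract does not describe. The function names (`ngrams`, `clipped_precision`, `brevity_penalty`, `bleu`) are hypothetical, no smoothing is applied, and published results are normally computed at corpus level with a standard toolkit.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n):
    """Modified n-gram precision: each candidate n-gram count is clipped
    to the maximum count of that n-gram in any single reference."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def brevity_penalty(candidate, references):
    """Penalize candidates shorter than the closest reference length."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Closest reference length; ties go to the shorter reference.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights."""
    precisions = [clipped_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_mean)
```

A caption identical to its reference scores 1.0, while a caption that repeats one reference word many times is held down by the clipping step. ROUGE-L, METEOR and CIDEr use different matching rules and are not covered by this sketch.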

Keywords/Search Tags:

Image Caption, Encoder-Decoder, Attention, Visual commonsense, High-level interaction characteristics

Related items

1	Research On Image Caption Method Based On High Level Semantic Extraction And Attention Mechanism
2	Image Caption Model Based On Feature Extraction Via Dense Convolutional Neural Network
3	Research On Image Caption Based On Attention Mechanism
4	Research On Image Caption Algorithm Based On Attention Mechanism
5	Image Caption Technology Based On Deep Semantic Information
6	Research On Key Technologies Of Image Caption Based On Multimodal Feature Understanding
7	Visual And Text Feature Alignment Algorithm For Video Caption
8	Research On Video Caption Based On Deep Learning Sequence Model
9	Image Chinese Caption Generation Method Based On Attention Mechanism
10	Research On Image Caption Method Based On High-level Image Semantic And Attention