
Research Of Image Caption Based On Encoder-Decoder

Posted on: 2023-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: P F Shi
GTID: 2568306914456464
Subject: Cyberspace security
Abstract/Summary:
Image captioning is the task of having a machine automatically generate a text description of an image from the semantic information of the objects it contains. With the rapid development of deep learning, encoder-decoder models have become the dominant framework for caption generation. This framework currently faces two problems. First, captions generated for complex scenes contain low-level cognitive errors. Second, image captioning is a cross-modal task, yet existing attention mechanisms cannot fully carry the feature information of the visual modality into the text generation process. To address these problems across different image scenes, this thesis introduces visual commonsense knowledge and high-order interaction features. The main contributions are as follows.

(Ⅰ) An improved VC R-CNN image captioning model based on visual commonsense from causal intervention. Compared with the traditional algorithm, this thesis introduces visual commonsense derived from second-order causal intervention and integrates it into the VC R-CNN model. The resulting captioning algorithm handles complex image scenes better, improves significantly on metrics such as BLEU, ROUGE-L, METEOR, and CIDEr, and reduces the computation of the training process by 5%.

(Ⅱ) An image captioning algorithm based on the fusion of high-order interaction features. The algorithm extracts higher-order interaction features through linear fusion of attention mechanisms. This addresses the problem that the modality conversion process cannot make full use of the feature information of the original modality, which leaves the generated sentences incomplete and unspecific. Because linear fusion raises the computational complexity above that of the original model, the thesis also optimizes the attention mechanism. Experimental results show that the sentences generated by the proposed algorithm express the semantic information in the images more specifically and completely, while the overall time cost (training and prediction combined) changes little.

(Ⅲ) An image caption generation system based on the encoder-decoder framework is designed and implemented. The system includes modules for user management, algorithm implementation, and result output. Its performance is tested and verified on real data; the main indicators are the caption generation rate and BLEU, ROUGE-L, METEOR, and CIDEr. The test results show that the system has some practical value.
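The abstract reports results on BLEU among other metrics. As a rough illustration of what that score measures, the sketch below computes the clipped (modified) n-gram precision at the core of BLEU. It is a simplified example and not the evaluation code used in the thesis: it omits the brevity penalty and the geometric mean over n-gram orders, and the function names `ngrams` and `modified_precision` are made up for this illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of one candidate caption against references.

    Hypothetical helper for illustration only; full BLEU also applies a
    brevity penalty and combines several n-gram orders.
    """
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, record the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    # A candidate n-gram is credited at most as often as it occurs in a reference.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


candidate = "a dog runs on the grass".split()
references = [
    "a dog is running on the grass".split(),
    "the dog runs across a field".split(),
]
unigram_p = modified_precision(candidate, references, 1)
bigram_p = modified_precision(candidate, references, 2)
```

Clipping is what keeps a degenerate caption such as "the the the the" from scoring well: each repeated word is credited only as many times as it appears in a reference caption.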
Keywords/Search Tags:Image Caption, Encoder-Decoder, Attention, Visual commonsense, High-level interaction characteristics