Research On Video Caption Algorithm Based On Encoder-Decoder Model

Posted on:2022-05-01

Degree:Master

Type:Thesis

Country:China

Candidate:G Xiong

GTID:2518306512471924

Subject:Pattern Recognition and Intelligent Systems

Abstract/Summary:

With the development of deep learning, artificial intelligence has brought great convenience to daily life. As an important branch of video content analysis, video captioning supports further progress in video retrieval and personalized video recommendation. The task is to describe the visual content of a video in natural language, and the generated sentence must be accurate, readable, and fluent.

In current research on video captioning algorithms based on the encoder-decoder model, high-level semantic information extracted from the video is used as video semantic features, which help the decoder convert visual features into captions more accurately. The quality of these semantic features therefore has an important impact on the accuracy of the generated captions. To address the low accuracy of the semantic features extracted by existing video semantic detectors, this thesis constructs, in the encoding stage, a video semantic feature enhancement encoder. The encoder enhances the encoded features through a highway layer structure and adds a video semantic word difference amplification module, which amplifies the differences between semantic words in the semantic features and improves their accuracy. Experimental results show that the semantic features generated by the proposed algorithm are of higher quality and assist the decoder more effectively in improving caption accuracy.

To further improve the accuracy of the generated captions, the decoding stage addresses two problems: during learning, the decoder fails to pay sufficient attention to the words that matter most to the video content, and the differences between word features are small. This thesis combines a word attention mechanism with a word difference enhancement structure to build a word feature enhanced text decoder, which makes the word features both salient and distinct and improves decoder performance.

Comparative experiments on standard datasets show that the captions generated by the proposed algorithm match the video content more closely: they are accurate and also reflect details in the video. Compared with other algorithms in the same field, the proposed algorithm also achieves significantly better scores on the caption evaluation metrics.
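
The abstract names a highway layer structure but gives no configuration, so the following is only a minimal sketch of a generic highway layer, y = T(x) * H(x) + (1 - T(x)) * x, and not the thesis's actual encoder. The ReLU transform, the feature dimension, the weight initialization, and every function and variable name are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, x):
    """Return W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """Generic highway layer: y = T(x) * H(x) + (1 - T(x)) * x, element-wise."""
    h = [max(0.0, z) for z in affine(w_h, b_h, x)]  # candidate transform H(x), ReLU (assumed)
    t = [sigmoid(z) for z in affine(w_t, b_t, x)]   # transform gate T(x), each value in (0, 1)
    # The gate blends the transformed features with the untouched input.
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


def random_matrix(rng, dim):
    """Small random square matrix (hypothetical initialization)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]


rng = random.Random(0)
dim = 4
x = [0.5, -1.0, 2.0, 0.0]  # stand-in for one encoded video feature vector
w_h, w_t = random_matrix(rng, dim), random_matrix(rng, dim)
b_h = [0.0] * dim
b_t = [-2.0] * dim         # negative gate bias: the layer starts close to the identity

y = highway_layer(x, w_h, b_h, w_t, b_t)
print(y)
```

With a strongly negative gate bias the layer passes its input through almost unchanged, and with a strongly positive one it returns almost only the transformed features. This gating is what lets a highway layer refine encoded features without discarding the original signal.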

Keywords/Search Tags:

Video caption, Highway layer structure, Video semantic word difference amplification module, Word attention mechanism, Word feature enhancement

Related items

1	Image Semantic Understanding Introducing Word Embedding And Attention Augmentation Mechanisms
2	Research On Word Spotting Technology In Handwritten Historical Document Images
3	Web Content Extraction Research Based On DOM Structure Tree And Feature Word
4	Research On Image Caption Method Based On Attention Mechanism
5	Research On Cross-domain Chinese Word Segmentation Method Based On New Word Discovery
6	Research On Video Description Method Based On Feature Enhancement And Fusion Strategy
7	Research On Chinese Named Entity Recognition Based On Feature Enhancement
8	Semantic Similarity Measurement Of Short Text By Convolutional Neural Network Based On Multi-Dimensional Attention On Word Vector
9	Image Caption Generation Based On Attention Mechanism
10	Research On Multi-granularity Chinese Word Embedding Based On Glyph Structure