
Research On Video Description Method Based On Feature Enhancement And Fusion Strategy

Posted on: 2024-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Bai
Full Text: PDF
GTID: 2568307097457504
Subject: Communication and Information System
Abstract/Summary:
Video is one of the main carriers of information in today's society. As a multimodal medium, it conveys richer and more diverse information than images or text. Video description aims to convert a video into natural-language sentences that describe its content, a technology with broad application prospects in human-computer interaction, assistance for visually impaired people, and video retrieval. Existing video description methods suffer from inaccurate localization and recognition of features in key regions of the video, insufficient feature fusion, and weak connections between words, so the generated sentences often fail to describe the video content correctly. To address these problems, this thesis proposes a video description method based on feature enhancement and a fusion strategy. The main research contents are as follows:

(1) To improve the model's ability to locate key regions of the video and the quality of extracted static object features, this thesis proposes VFE-4, an encoder based on feature enhancement. VFE-4 uses a dual attention module, built from channel attention and spatial attention, to model correlations between channels, improving the static feature extraction network's ability to capture important regional features. A feature enhancement module is also integrated, which uses local and global features to provide correct detail guidance for the model, amplifying the feature differences between similar objects and improving the accuracy of the encoded features of the target subject. Experimental results show that the quality of the static video features extracted by VFE-4 is significantly improved, which helps the decoding network generate more accurate sentences. Compared with the baseline model, VFE-4 improves the average score by 1.1% and 0.6% on the MSVD and MSR-VTT datasets, respectively.

(2) To address insufficient feature fusion and weak connections between words, this thesis builds on the feature-enhancement encoder and adopts three fusion strategies, using a spatial module and a temporal module to fully integrate features from different modalities, establish the correlation between the target subject and its actions, and improve the fusion quality of the overall features. In addition, so that the decoder not only attends more to important words but also makes full use of the feature information of previously generated words, this thesis integrates a text attention mechanism (TA) into the decoder of the STC model, enabling the model to predict important words that better represent the video context. Experimental results show that the proposed STC1-TA model achieves fuller feature fusion, a clearer relationship between the target subject and its actions, predicted words that better represent the video context, and generated descriptions closer to the reference sentences. Compared with recent optimized versions of the baseline model, STC1-TA improves the average score by 1.2% and 1.5% on the MSVD and MSR-VTT datasets, and it outperforms most mainstream models in the same field on the evaluation metrics.
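The abstract does not give the internal structure of the dual attention module in (1). Assuming an SE-style channel gate and a CBAM-style spatial gate (common designs for channel and spatial attention; the weights and shapes here are illustrative, not taken from the thesis), a minimal NumPy sketch of how such gating reweights a feature map might look like:

```python
import numpy as np

def channel_attention(x, reduction=4):
    # x: (C, H, W) feature map. SE-style: squeeze spatially, excite per channel.
    C = x.shape[0]
    z = x.mean(axis=(1, 2))                                   # squeeze: global average pool -> (C,)
    rng = np.random.default_rng(0)
    W1 = rng.standard_normal((C // reduction, C)) * 0.1       # excitation MLP (illustrative random weights)
    W2 = rng.standard_normal((C, C // reduction)) * 0.1
    s = 1.0 / (1.0 + np.exp(-(W2 @ np.maximum(W1 @ z, 0))))   # sigmoid gate per channel, in (0, 1)
    return x * s[:, None, None]                               # rescale each channel

def spatial_attention(x):
    # CBAM-style: pool over channels, then gate each spatial location.
    avg = x.mean(axis=0)                                      # (H, W) average-pooled map
    mx = x.max(axis=0)                                        # (H, W) max-pooled map
    m = 1.0 / (1.0 + np.exp(-(avg + mx)))                     # simplified gate (the usual conv is omitted)
    return x * m[None, :, :]                                  # rescale each spatial position

x = np.random.default_rng(1).standard_normal((8, 4, 4))       # toy feature map: 8 channels, 4x4
y = spatial_attention(channel_attention(x))                   # dual attention: channel gate, then spatial gate
print(y.shape)  # (8, 4, 4) -- same shape, features reweighted
```

Because both gates lie in (0, 1), the module can only emphasize or suppress existing responses; it never changes the shape of the feature map, which is what lets it drop into an existing extraction backbone.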
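The thesis does not specify how the TA mechanism in (2) scores previously generated words. As one plausible reading, a dot-product attention over the embeddings of the word history would let the decoder reuse earlier word features when predicting the next word; the function name and shapes below are assumptions for illustration:

```python
import numpy as np

def text_attention(h_t, prev_embs):
    # h_t: (d,) current decoder hidden state
    # prev_embs: (T, d) embeddings of the words generated so far
    scores = prev_embs @ h_t                  # relevance of each past word to the current step
    w = np.exp(scores - scores.max())
    w = w / w.sum()                           # softmax: attention weights over the word history
    ctx = w @ prev_embs                       # (d,) weighted summary of previously generated words
    return ctx, w

d = 6
rng = np.random.default_rng(0)
h_t = rng.standard_normal(d)                  # toy hidden state
prev = rng.standard_normal((3, d))            # three words generated so far
ctx, w = text_attention(h_t, prev)
print(round(float(w.sum()), 6))  # 1.0 (weights form a distribution over past words)
```

The context vector `ctx` would then be fused with `h_t` before the output projection, so the next-word prediction is conditioned on both the video features and the word history rather than on the hidden state alone.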
Keywords/Search Tags: Video Caption, Encoder-Decoder, Feature Enhancement, Fusion Strategy, Text Attention Mechanism