With the rapid development of Internet technology and the rising level of public consumption, short video platforms have grown at an unprecedented pace. A huge volume of videos is produced every day, and the content is diverse and of uneven quality, so the demand for content auditing on these platforms is becoming increasingly pressing. Manual review is time-consuming and labor-intensive, and must therefore be combined with video description technology to analyze videos automatically and intelligently. Most existing short video description methods are based on fusing static and dynamic video features and fail to mine the rich information a video contains. To address these issues, this paper proposes a multi-view feature extraction method that interprets a video from multiple perspectives and extracts the key information that is effective for video description models. A fusion method based on attribute semantic information is also proposed to fuse the extracted multimodal information into a joint representation while reducing the interference between modalities. Together, these methods can improve the review efficiency of short video platforms and facilitate video content management. The specific contributions are as follows.

(1) A video multi-view feature extraction method is proposed. Starting from a global-local view, entities, actions, and the logic connecting them are treated as the local, short-term, and long-term perspectives, respectively, and the scene features, object features, action features, and key-frame text semantic features of the video are extracted so that its rich information is fully considered. Experimental results show that, compared with the baseline model, the method improves the CIDEr metric by 4.3%, demonstrating stronger mining of video information than comparable models.

(2) Building on these multi-view features, a multimodal fusion method based on attribute semantic information is proposed. The method applies an attention mechanism to combinations of the different modal features to generate noun and verb attribute semantic information, builds an attribute detector that converts this information into higher-order attribute semantics, and then embeds the higher-order attribute semantics into the LSTM weight matrix during decoding to guide the generation of description sentences and improve the accuracy of video description. Experimental results show that the model improves the CIDEr metric by 8.8% and 3.6% compared with the conventional multimodal feature concatenation method and the attention-based feature fusion method, respectively.
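To make contribution (1) concrete, the following is a minimal sketch of a multi-view extractor. The abstract does not name specific backbones, so the choices here are assumptions: ResNet-50 for per-frame scene and object appearance, R3D-18 for clip-level action features, and a simple embedding-plus-GRU encoder standing in for the key-frame text (e.g., OCR token) semantics. Deriving object features from the same global frame vector is likewise a simplification of whatever detector the thesis actually uses.

```python
# Hypothetical sketch of multi-view feature extraction (contribution 1).
# Backbones and dimensions are illustrative assumptions, not the thesis's exact models.
import torch
import torch.nn as nn
from torchvision.models import resnet50
from torchvision.models.video import r3d_18

class MultiViewExtractor(nn.Module):
    def __init__(self, dim=512, vocab=10000):
        super().__init__()
        # Local view: per-frame scene/object appearance from a 2D CNN.
        cnn = resnet50(weights=None)
        self.frame_encoder = nn.Sequential(*list(cnn.children())[:-1])  # -> 2048-d
        self.scene_proj = nn.Linear(2048, dim)
        self.object_proj = nn.Linear(2048, dim)   # stand-in for detector features
        # Short-term view: clip-level motion/action from a 3D CNN.
        r3d = r3d_18(weights=None)
        self.action_encoder = nn.Sequential(*list(r3d.children())[:-1])  # -> 512-d
        self.action_proj = nn.Linear(512, dim)
        # Long-term view: key-frame text semantics (e.g., OCR tokens),
        # encoded with an embedding + GRU as a stand-in text encoder.
        self.token_emb = nn.Embedding(vocab, dim)
        self.text_encoder = nn.GRU(dim, dim, batch_first=True)

    def forward(self, frames, clips, ocr_tokens):
        # frames: (B, T, 3, H, W)   clips: (B, 3, T', H', W')   ocr_tokens: (B, L)
        B, T = frames.shape[:2]
        f = self.frame_encoder(frames.flatten(0, 1)).flatten(1)  # (B*T, 2048)
        f = f.view(B, T, -1)
        scene = self.scene_proj(f)                               # (B, T, dim)
        objects = self.object_proj(f)                            # (B, T, dim)
        a = self.action_encoder(clips).flatten(1)                # (B, 512)
        action = self.action_proj(a)                             # (B, dim)
        _, h = self.text_encoder(self.token_emb(ocr_tokens))
        text = h.squeeze(0)                                      # (B, dim)
        return scene, objects, action, text
```

The four returned tensors correspond to the scene, object, action, and key-frame text views; any of the stand-in encoders can be swapped for stronger pretrained models without changing the interface.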
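Contribution (2) can likewise be sketched under stated assumptions. Below, multi-head attention fuses the stacked modal features, an attribute detector predicts noun/verb attribute probabilities, and those probabilities are projected into a gating vector that rescales the decoder LSTM's input at every step. This gating is one plausible reading of "embedding higher-order attribute semantics into the LSTM weight matrix" (in the spirit of semantic compositional networks); the attribute vocabulary size, dimensions, and class name AttributeFusionDecoder are all hypothetical.

```python
# Hypothetical sketch of attribute-semantic multimodal fusion (contribution 2).
import torch
import torch.nn as nn

class AttributeFusionDecoder(nn.Module):
    def __init__(self, dim=512, n_attrs=300, vocab=10000):
        super().__init__()
        # Attention over the stacked scene/object/action/text features.
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        # Attribute detector: probabilities over noun/verb attributes.
        self.detector = nn.Sequential(nn.Linear(dim, n_attrs), nn.Sigmoid())
        # Higher-order attribute semantics as a gate on the LSTM input
        # (an assumed stand-in for factoring the LSTM weight matrix).
        self.attr_gate = nn.Linear(n_attrs, dim)
        self.word_emb = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTMCell(2 * dim, dim)
        self.out = nn.Linear(dim, vocab)

    def forward(self, modal_feats, captions):
        # modal_feats: (B, M, dim) stacked multi-view features
        # captions:    (B, L) ground-truth token ids (teacher forcing)
        B = modal_feats.size(0)
        query = modal_feats.mean(dim=1, keepdim=True)
        fused, _ = self.attn(query, modal_feats, modal_feats)    # (B, 1, dim)
        fused = fused.squeeze(1)                                 # (B, dim)
        attrs = self.detector(fused)                             # (B, n_attrs)
        gate = torch.sigmoid(self.attr_gate(attrs))              # (B, dim)
        h = torch.zeros(B, fused.size(1), device=fused.device)
        c = torch.zeros_like(h)
        logits = []
        for t in range(captions.size(1)):
            w = self.word_emb(captions[:, t]) * gate  # attribute-modulated input
            h, c = self.lstm(torch.cat([w, fused], dim=-1), (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1), attrs  # (B, L, vocab), (B, n_attrs)
```

Returning the attribute probabilities alongside the caption logits allows an auxiliary attribute-detection loss to be added during training, which is a common design choice for this kind of semantics-guided decoder.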