Research And Application Of Video Classification Based On Multimodal Feature Fusion

Posted on: 2023-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y R Feng
Full Text: PDF
GTID: 2568306830481384
Subject: Software engineering
Abstract/Summary:
With the rapid growth of video production, the demand for video understanding technology and its applications is becoming increasingly urgent. Video classification, one of the basic tasks of video understanding, identifies the objects, actions, or events in a video from its available multimodal data. At present, the main approach to video classification is to build a network based on multimodal feature fusion. To capture the interactions between heterogeneous data during multimodal fusion, this thesis takes video classification based on audio-visual fusion as its main research topic and applies the resulting algorithms in a working system, combining theory with practice. The main contributions are as follows.

First, the low-rank multimodal fusion (LMF) method is improved so that attention weights for the video and audio features are computed during fusion and applied to the multimodal features, yielding frame-level fused features with modality attention. The frame-level features are then aggregated into video-level features by NeXtVLAD, an attention-based feature clustering algorithm, and a squeeze-and-excitation context gating unit is used to suppress useless information and amplify valuable features. Experiments show that, compared with simple fusion, the proposed method captures the interactions between modalities and achieves 87.8% accuracy on the validation set of Kinetics-400, a large-scale video dataset.

Second, multimodal factorized bilinear pooling (MFB) is applied to frame-level multimodal feature fusion. On this basis, a self-attention mechanism, which can be computed in parallel, is introduced and applied to the input modalities to improve the interaction between them. In comparison, the proposed method outperforms other video classification models based on convolutional neural networks.

Finally, using the model trained by the algorithm on the dataset, a short-video classification platform based on the Flask and Bootstrap frameworks is designed, automating the generation of short-video type labels.
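For readers unfamiliar with low-rank multimodal fusion, the following is a minimal NumPy sketch of the plain LMF idea only, not of the improved, attention-weighted variant described in this abstract. It assumes two modalities (video and audio), random untrained factors, and hypothetical names and dimensions (`low_rank_fusion`, `w_video`, `w_audio`, `T`, `r`) chosen purely for illustration. Each modality is projected by `r` rank-specific factors, the projections are multiplied element-wise, and the results are summed over the rank dimension, so the full bilinear weight tensor never has to be built.

```python
import numpy as np

def low_rank_fusion(video, audio, w_video, w_audio):
    """Fuse frame-level video and audio features with rank-r factors.

    video:   (T, dv) frame-level video features
    audio:   (T, da) frame-level audio features
    w_video: (r, dv + 1, d_out) modality-specific factors (assumed, untrained)
    w_audio: (r, da + 1, d_out) modality-specific factors (assumed, untrained)
    returns: (T, d_out) fused frame-level features
    """
    # Append a constant 1 so unimodal terms survive the product.
    ones = np.ones((video.shape[0], 1))
    zv = np.concatenate([video, ones], axis=1)   # (T, dv + 1)
    za = np.concatenate([audio, ones], axis=1)   # (T, da + 1)

    # Project each modality once per rank: shape (r, T, d_out).
    pv = np.einsum("ta,rao->rto", zv, w_video)
    pa = np.einsum("tb,rbo->rto", za, w_audio)

    # Element-wise product across modalities, then sum over the rank axis.
    return (pv * pa).sum(axis=0)

# Illustrative sizes only; none of these come from the thesis.
rng = np.random.default_rng(0)
T, dv, da, r, d_out = 4, 6, 5, 3, 8
video = rng.normal(size=(T, dv))
audio = rng.normal(size=(T, da))
w_video = rng.normal(size=(r, dv + 1, d_out))
w_audio = rng.normal(size=(r, da + 1, d_out))

fused = low_rank_fusion(video, audio, w_video, w_audio)
print(fused.shape)
```

The fused output equals what an explicit outer-product fusion would give with the weight tensor reconstructed from the same factors, but the cost grows with the rank `r` rather than with the product of the feature dimensions.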
Keywords/Search Tags: audio and video feature fusion, video classification, feature clustering