Research And Application Of Video Classification Based On Multimodal Feature Fusion

Posted on:2023-09-19

Degree:Master

Type:Thesis

Country:China

Candidate:Y R Feng

GTID:2568306830481384

Subject:Software engineering

Abstract/Summary:

With the rapid growth of video production, the demand for video understanding technology and its applications is becoming increasingly urgent. Video classification, one of the basic tasks of video understanding, identifies the objects, actions, or events in a video from its available multimodal data. At present, the main approach to video classification is to build a network based on multimodal feature fusion. To capture the interaction between heterogeneous data during multimodal fusion, this thesis takes video classification based on audio-visual fusion as its main research topic and applies the resulting algorithms in a practical system, combining theory with practice. The main contributions are as follows. First, the low-rank multimodal fusion (LMF) method is improved to compute attention weights for the video and audio features during fusion; these weights are then applied to the multimodal features to obtain frame-level fusion features with modal attention. The frame-level features are then aggregated into video-level features by NeXtVLAD, an attention-based feature clustering algorithm, and a squeeze-and-excitation context gating unit is used to suppress useless information and amplify valuable features. Experiments show that, compared with simple fusion methods, the proposed method can capture the interaction between different modalities, and it reaches an accuracy of 87.8% on the validation set of the large-scale video dataset Kinetics400. Second, multimodal factorized bilinear pooling (MFB) is applied to frame-level multimodal feature fusion. On this basis, a self-attention mechanism that can be computed in parallel is introduced and applied to the input modalities to improve the interaction between them. The proposed method outperforms other video classification models based on convolutional neural networks. Finally, a short video classification platform is built on the Flask and Bootstrap frameworks using the model trained on the dataset, automating the generation of type labels for short videos.
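The two frame-level fusion operators named in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation: the abstract does not say how the modal attention weights are computed, so `modal_attention` (a softmax over one learned score per modality) and the way the weights scale each projection are assumptions, as are all function names, parameter names, and shapes. `lmf_fuse` follows the usual low-rank formulation (sum over rank of the element-wise product of per-modality projections), and `mfb_fuse` follows the usual factorized bilinear pooling recipe (element-wise product, sum pooling over windows of size `k`, power and L2 normalization).

```python
import numpy as np


def append_one(z):
    """Append a constant 1 to each row so unimodal terms survive the product."""
    return np.concatenate([z, np.ones((z.shape[0], 1))], axis=1)


def modal_attention(audio, video, q_audio, q_video):
    """Hypothetical modal attention: softmax over one score per modality.

    audio: (frames, d_a), video: (frames, d_v); q_audio: (d_a,), q_video: (d_v,).
    Returns (frames, 2) weights that sum to 1 for every frame.
    """
    scores = np.stack([audio @ q_audio, video @ q_video], axis=1)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=1, keepdims=True)


def lmf_fuse(audio, video, w_audio, w_video, modal_weights=None):
    """Low-rank multimodal fusion of two modalities.

    w_audio: (rank, d_a + 1, d_out), w_video: (rank, d_v + 1, d_out).
    modal_weights: optional (frames, 2) attention weights (assumed scheme).
    Returns frame-level fused features of shape (frames, d_out).
    """
    proj_a = np.einsum("fd,rdo->rfo", append_one(audio), w_audio)
    proj_v = np.einsum("fd,rdo->rfo", append_one(video), w_video)
    if modal_weights is not None:
        # Assumption: each modality's projection is scaled by its weight.
        proj_a = proj_a * modal_weights[None, :, 0:1]
        proj_v = proj_v * modal_weights[None, :, 1:2]
    # Sum over the rank axis of the element-wise product across modalities.
    return (proj_a * proj_v).sum(axis=0)


def mfb_fuse(x, y, u, v, k):
    """Multimodal factorized bilinear pooling.

    x: (frames, d_x), y: (frames, d_y); u: (d_x, k * d_out), v: (d_y, k * d_out).
    Returns L2-normalized features of shape (frames, d_out).
    """
    joint = (x @ u) * (y @ v)                                  # (frames, k * d_out)
    pooled = joint.reshape(joint.shape[0], -1, k).sum(axis=2)  # sum pooling, window k
    pooled = np.sign(pooled) * np.sqrt(np.abs(pooled))         # power normalization
    norm = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.maximum(norm, 1e-12)                    # L2 normalization
```

Without attention weights, `lmf_fuse` gives the same result as contracting the outer product of the two augmented feature vectors with the full weight tensor rebuilt from its rank factors, but it never materializes that tensor. In the pipeline the abstract describes, the fused frame-level features would then be aggregated into a video-level feature by NeXtVLAD, which this sketch does not cover.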

Keywords/Search Tags:

audio and video feature fusion, video classification, feature clustering

Related items

1	An Audio Classification Algorithm For News Video Retrieval
2	Research On Audio And Video Data Recognition Technology In High Speed Environment
3	Research On Automatic Video Classification Algorithm Based On Audio-visual Features And SVM
4	A 3D Convolutional Neural Network Video Classification Based On Image And Audio Fusion
5	Sports Video Analysis And Highlight Ranking Based On Audio/visual Fusion
6	Research On Violent Video Detection Algorithm Based On Bag Of Audio Words And MPEG-7 Features
7	Multi-speaker Recognition Based On Audio-video Feature Fusion In Smart Environment
8	Video Classification Method Based On Clustering
9	Video Classification Technology Based On Deep Learning From An Audio Perspective