Video Classification Technology Based On Deep Learning From An Audio Perspective

Posted on:2022-01-21

Degree:Master

Type:Thesis

Country:China

Candidate:Y Sun

GTID:2518306323460274

Subject:Software engineering

Abstract/Summary:

Video classification has long been an active and challenging research area in computer vision. Methods based on deep learning are attracting growing attention, mainly because they can extract image features from video frames and use specialized network structures to train and optimize the parameters needed to classify videos from those features. In recent years deep learning has been adopted by many researchers and has proven effective for video classification. As communication technology and networks have advanced, self-media applications have also grown rapidly, and people are increasingly willing to publish interesting moments from daily life on the Internet. This trend raises new research directions for video classification, video screening, and video retrieval.

Against this background, this thesis studies the short video classification problem using deep learning algorithms. It first introduces the data sets used and the related theory, mainly audio feature processing and deep learning. It then describes several commonly used video classification methods in detail, chiefly convolutional neural networks and recurrent neural networks, and elaborates on the structures of several well-known classification networks. The fourth chapter presents the hybrid neural network model designed in this thesis. The network combines the C3D model with the LSTM model, departs from the traditional network structure, and uses a multi-modal hybrid neural network to solve the video classification problem.

The thesis focuses on how to use hybrid neural networks to improve the accuracy of short video classification. The core of the network is a hybrid of a long short-term memory (LSTM) model and a three-dimensional convolutional neural network. The new model is designed to address the categories of the UCF101 data set that are classified with low accuracy. It integrates multiple complementary cues to capture the semantic features of each frame. The LSTM network and the spatio-temporal features of the three-dimensional convolutional network are used to extract motion features for classifying short videos, and the trained model is refined iteratively. Finally, experiments were conducted based on the design ideas and network structure. The experimental results show that the proposed hybrid neural network is accurate and effective.
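
As a rough illustration of the hybrid idea described above, the following minimal sketch runs an LSTM over a sequence of per-step feature vectors, concatenates its final hidden state with a clip-level feature vector, and feeds the result to a softmax classifier. It is not the network from the thesis: the abstract does not specify the fusion method, so concatenation is an assumption, the clip-level vector is only a stand-in for the output of a C3D branch, bias terms are omitted, weights are random and untrained, and all names and sizes (`LSTMCell`, `classify`, `CLIP_DIM`, and so on) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """One LSTM cell: input, forget, and output gates plus a candidate state.

    Bias terms are left out to keep the sketch short.
    """

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One weight matrix per gate, applied to the concatenation [x_t, h_prev].
        self.weights = {
            gate: random_matrix(hidden_size, input_size + hidden_size, rng)
            for gate in ("i", "f", "o", "g")
        }

    def step(self, x, h_prev, c_prev):
        z = x + h_prev  # list concatenation of input and previous hidden state
        i = [sigmoid(v) for v in matvec(self.weights["i"], z)]
        f = [sigmoid(v) for v in matvec(self.weights["f"], z)]
        o = [sigmoid(v) for v in matvec(self.weights["o"], z)]
        g = [math.tanh(v) for v in matvec(self.weights["g"], z)]
        # New cell state: keep part of the old state, add part of the candidate.
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def run(self, sequence):
        """Process the whole sequence and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            h, c = self.step(x, h, c)
        return h


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(clip_feature, step_sequence, lstm, classifier_weights):
    """Fuse the two branches by concatenation (an assumption) and classify."""
    temporal_feature = lstm.run(step_sequence)
    fused = clip_feature + temporal_feature
    return softmax(matvec(classifier_weights, fused))


# Hypothetical sizes and random inputs, purely for illustration.
rng = random.Random(0)
CLIP_DIM, STEP_DIM, HIDDEN, NUM_CLASSES, NUM_STEPS = 8, 6, 4, 3, 5
lstm = LSTMCell(STEP_DIM, HIDDEN, rng)
classifier_weights = random_matrix(NUM_CLASSES, CLIP_DIM + HIDDEN, rng)
clip_feature = [rng.random() for _ in range(CLIP_DIM)]  # stand-in for a C3D output
step_sequence = [[rng.random() for _ in range(STEP_DIM)] for _ in range(NUM_STEPS)]
probabilities = classify(clip_feature, step_sequence, lstm, classifier_weights)
print(probabilities)
```

The per-step vectors could be frame-level or audio-frame features; the sketch treats them as generic inputs. In practice both branches and the classifier would be trained jointly in a deep learning framework rather than written by hand.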

Keywords/Search Tags:

video classification, multi-modal features, convolutional neural network, LSTM

Related items

1	Multi-modal Video Scene Segmentation Optimization Algorithm Based On Convolutional Neural Network
2	Research On Key Technologies Of Human Action Recognition Based On 3D Skeleton
3	Research On Video Behavior Classification Technology Based On Spatio-Temporal Features
4	Research On Natural Language Description Generation For Short Video In Self Media
5	Research On Short Text Classification Algorithm Based On Graph Neural Network And Fusion Of External Features
6	Study On Event Image Classification By Fusing Multiple CNNs Based On LSTM
7	Research On Multi-modal Pornographic Video Detection Algorithm Based On Unsupervised Features And System Implementation
8	3D Model Classification Based On Shape Feature And CNN-LSTM Network
9	News Video Structure Analysis Based On Multi-modal Features
10	Research On Video Classification Based On Cross-Modal