Research On Action Recognition Method Based On Spatiotemporal Feature Fusion And Knowledge Distillation Technology

Posted on: 2023-07-30

Degree: Master

Type: Thesis

Country: China

Candidate: W Liang

GTID: 2568306833989189

Subject: Engineering

Abstract/Summary:

Action recognition is an important research topic in computer vision, with a wide range of application scenarios, including behavior analysis, video retrieval, human-computer interaction, and gaming and entertainment. Existing video-based action recognition models usually face two issues: 1) Because a video is a sequence of frames, its temporal and spatial dimensions are not equally important. 2) Although many schemes for extracting temporal features from video operate at a fine granularity, they still fail to distinguish the visual rhythm of an action; moreover, if spatial and temporal features are fed into the classifier in equal proportions, the spatiotemporal features become imbalanced, which degrades the classification results.

To address these two difficulties, we present a spatiotemporal feature pyramid network based on 3D convolution. The spatiotemporal feature pyramid enlarges the receptive field in both the spatial and temporal dimensions, addressing the model's inability to differentiate the visual rhythm of actions. The proposed multilayer feature extraction module ensures that the spatiotemporal features fed into the classifier are reasonably balanced, addressing the imbalance of video spatiotemporal features. On the public Kinetics-400 dataset, the 3D-convolution-based spatiotemporal feature pyramid network reaches a best top-1 accuracy of 76.68% and a top-5 accuracy of 93.18%, a substantial advantage over other techniques.

To be used in real-world applications, most algorithmic models must be deployed on resource-constrained devices. However, large model size makes deployment difficult, a prevalent issue in deep learning, so model compression is an important part of model optimization. We apply knowledge distillation, a model compression approach, to optimize video-based action recognition models. To address the particular characteristics of the action recognition task, we design a layered feature distillation module. This module splits features along the temporal and spatial dimensions and compares each part separately, so that the output features of each layer of the student model are as close as possible to those of the corresponding layer of the teacher model. Its core is a spatiotemporal feature transfer loss function, which explicitly accounts for the transfer of both temporal and spatial information during knowledge distillation. In experiments on the public UCF101 dataset, we use 3D ResNets of different depths as the feature extraction networks. The results show that training with the multilayer feature distillation module not only improves the training efficiency of the model but also improves its recognition accuracy, by up to 4.4%.
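The abstract does not give the exact form of the spatiotemporal feature transfer loss, so the following is only a minimal NumPy sketch of one plausible reading, not the thesis's implementation. It assumes per-layer feature maps of shape (N, C, T, H, W); a temporal descriptor obtained by averaging out the spatial axes; a spatial descriptor obtained by averaging out the time axis; mean squared error for each comparison; and hypothetical weights `alpha` and `beta`. All function names are illustrative, and matching student and teacher layers are assumed to already have identical shapes (when channel counts differ, a projection layer would normally be needed, which is not shown).

```python
import numpy as np


def temporal_descriptor(feat: np.ndarray) -> np.ndarray:
    """Average out the spatial axes: (N, C, T, H, W) -> (N, C, T)."""
    return feat.mean(axis=(3, 4))


def spatial_descriptor(feat: np.ndarray) -> np.ndarray:
    """Average out the time axis: (N, C, T, H, W) -> (N, C, H, W)."""
    return feat.mean(axis=2)


def spatiotemporal_transfer_loss(student_feats, teacher_feats, alpha=1.0, beta=1.0):
    """Hypothetical layer-wise spatiotemporal feature distillation loss.

    student_feats / teacher_feats: lists of per-layer feature maps, each of
    shape (N, C, T, H, W); matching layers must have identical shapes.
    alpha weights the temporal term and beta the spatial term (assumed values).
    """
    if len(student_feats) != len(teacher_feats) or not student_feats:
        raise ValueError("need the same non-zero number of student and teacher layers")
    total = 0.0
    for s, t in zip(student_feats, teacher_feats):
        if s.shape != t.shape:
            raise ValueError(f"shape mismatch: {s.shape} vs {t.shape}")
        # Temporal term: compare how each channel's response evolves over time.
        l_time = np.mean((temporal_descriptor(s) - temporal_descriptor(t)) ** 2)
        # Spatial term: compare each channel's time-averaged spatial layout.
        l_space = np.mean((spatial_descriptor(s) - spatial_descriptor(t)) ** 2)
        total += alpha * l_time + beta * l_space
    # Average over layers so the loss scale does not grow with network depth.
    return float(total / len(student_feats))


# Usage with random stand-ins for two teacher layers and a shifted "student".
rng = np.random.default_rng(0)
teacher = [rng.standard_normal((2, 64, 8, 28, 28)), rng.standard_normal((2, 128, 4, 14, 14))]
student = [f + 0.1 for f in teacher]
# A constant offset of 0.1 gives about 0.01 per term, so roughly 0.02 overall.
print(spatiotemporal_transfer_loss(student, teacher))
```

Splitting the comparison into a temporal and a spatial term lets the two kinds of information be weighted independently, which mirrors the abstract's emphasis on not treating the two dimensions as equally important. During training, such a term would typically be added to the ordinary classification loss with some weighting; the weighting used in the thesis is not stated in the abstract.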

Keywords/Search Tags:

action recognition, spatiotemporal features, knowledge distillation, spatiotemporal feature transfer loss

Related items

1	Research On Action Recognition Algorithm Based On Spatiotemporal Modeling And Its Application
2	Spatiotemporal Deep Neural Network For Video Salient Object Detection
3	Research On Video Action Recognition Model Based On Convolutional Neural Network With Attention Mechanism
4	Action Recognition Based On Spatiotemporal Features Of Human Skeleton
5	Video Action Recognition Based On Spatiotemporal Grouping And Cooperative Network
6	Research On Action Perception Method For Service Robot Based On Local Spatiotemporal Features
7	Research On Methods Of Spatiotemporal Feature Modeling Based Activity Recognition
8	Human Action Recognition Based On Spatiotemporal Two Stream Convolution Network
9	Action Recognition Based On Spatiotemporal Convolution Networks
10	Spatiotemporal Feature Learning For Video Action Recognition