
Skeleton Action Recognition Study Based on Collaborative Spatiotemporal Attention

Posted on: 2023-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Ren
Full Text: PDF
GTID: 2568306788955299
Subject: Computer technology
Abstract/Summary:
Human action recognition is a popular research topic in computer vision. Its main task is to classify human actions, which helps a system understand the information an action conveys and supports further processing. It has a wide range of applications, including intelligent surveillance, unmanned security, patient monitoring, and motion-sensing games. Depending on the form of the model input, human action recognition can be divided into video-based and skeleton-based action recognition. The former takes consecutive image frames as input, while the latter takes skeleton sequences obtained from motion capture devices or 3D human pose estimation algorithms. Compared with images and video, skeleton data is more robust because it is not affected by illumination or background. This thesis studies skeleton-based human action recognition. Its main work is as follows:

(1) The thesis surveys deep-learning-based skeleton action recognition methods in three categories: methods based on convolutional neural networks, methods based on recurrent neural networks, and methods based on graph convolutional neural networks.

(2) Modeling spatiotemporal features is the key to skeleton action recognition. Typically, a two-stream structure is used to model temporal and spatial features separately. Among such methods, the hierarchical co-occurrence network achieves state-of-the-art performance by learning skeleton co-occurrence information. However, in its two-stream structure the spatial stream and the temporal stream are independent of each other, which limits its performance. Inspired by the squeeze-and-excitation mechanism, this thesis proposes a spatiotemporal excitation module to improve the hierarchical co-occurrence network. The module not only recalibrates the spatial and temporal features by explicitly modeling the interdependence between channels, but also lets the two kinds of features excite each other by explicitly modeling the interdependence between spatial and temporal features. As a result, the spatial and temporal streams can learn from each other effectively. Extensive experiments are carried out on the NTU RGB+D benchmark for skeleton-based action recognition. The results show that the proposed module achieves good performance without introducing additional parameters.

(3) Algorithms based on graph convolutional neural networks have become the mainstream approach to skeleton action recognition. However, in the spatiotemporal graph convolutional network, the structure of the spatial graph convolution layer and the topology of the temporal convolution layer are fixed, which greatly limits the network's ability to model spatiotemporal features. Meanwhile, attention mechanisms are widely used in computer vision tasks because of their effectiveness and interpretability. This thesis therefore proposes a collaborative spatiotemporal attention module that jointly learns attention over the spatial and temporal dimensions through parameter sharing. The module can be inserted between the spatial graph convolution layer and the temporal graph convolution layer. The features learned by the spatial graph convolution layer are first calibrated along both the spatial and temporal dimensions, and the calibrated features are then passed into the temporal graph convolution layer to learn temporal features, which strengthens the model's spatiotemporal feature learning. Because the module shares parameters, it is very lightweight and can be inserted into mainstream graph-convolution-based methods at very low computational cost while yielding a relatively large performance improvement. The spatiotemporal graph convolutional network is chosen as the backbone, and extensive experiments are conducted on two large-scale skeleton action recognition data sets. The results show that, compared with other mainstream algorithms, the proposed model achieves comparable or even better performance with fewer parameters.
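The spatiotemporal excitation idea in item (2) can be illustrated with a minimal NumPy sketch. The abstract does not give the module's exact design, so everything here is an assumption made for illustration: the function names `squeeze` and `cross_excite`, the `(C, T, V)` feature layout (channels, frames, joints), and the choice of a parameter-free sigmoid gate in place of the learned layers of a standard squeeze-and-excitation block. It shows only the general pattern: each stream is rescaled channel-wise by its own gate and by the gate of the other stream.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def squeeze(features):
    """Global-average-pool a (C, T, V) feature map into a (C,) channel descriptor."""
    return features.mean(axis=(1, 2))


def cross_excite(spatial, temporal):
    """Hypothetical parameter-free mutual excitation of two streams.

    Both inputs have shape (C, T, V). Each stream is recalibrated by its own
    channel gate and additionally scaled by the other stream's channel gate.
    This is an illustrative assumption, not the thesis's actual module.
    """
    gate_s = sigmoid(squeeze(spatial))    # (C,) gate from the spatial stream
    gate_t = sigmoid(squeeze(temporal))   # (C,) gate from the temporal stream
    joint_gate = (gate_s * gate_t)[:, None, None]  # broadcast over T and V
    return spatial * joint_gate, temporal * joint_gate


# Example with 4 channels, 6 frames, and 5 joints (arbitrary sizes).
rng = np.random.default_rng(0)
spatial_feat = rng.standard_normal((4, 6, 5))
temporal_feat = rng.standard_normal((4, 6, 5))
spatial_out, temporal_out = cross_excite(spatial_feat, temporal_feat)
print(spatial_out.shape, temporal_out.shape)
```

Because the gates are computed directly from pooled activations, this variant adds no trainable weights, which is one way to read the "no additional parameters" claim; a learned excitation would need extra weights.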
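The collaborative spatiotemporal attention described in item (3) can likewise be sketched in NumPy. This is a minimal illustration under stated assumptions, not the thesis's implementation: the names `shared_scores` and `collaborative_st_attention`, the two-layer shared projection (`w1`, `w2`), average pooling as the descriptor, and the `(C, T, V)` layout are all hypothetical. The point it shows is parameter sharing: the same weights score both joints and frames, and the resulting attention rescales the spatial-layer output before it reaches the temporal layer.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def shared_scores(desc, w1, w2):
    """Map a (C, N) descriptor to N attention weights in (0, 1).

    w1 has shape (H, C) and w2 has shape (H,). The same pair is reused for
    the spatial and the temporal descriptor (the parameter sharing).
    """
    hidden = np.maximum(w1 @ desc, 0.0)  # (H, N), ReLU activation
    return sigmoid(w2 @ hidden)          # (N,)


def collaborative_st_attention(x, w1, w2):
    """Rescale features x of shape (C, T, V) along frames and joints.

    Hypothetical sketch: x stands for the output of a spatial graph
    convolution layer; the result would feed the temporal convolution layer.
    """
    joint_desc = x.mean(axis=1)  # (C, V): pooled over frames
    frame_desc = x.mean(axis=2)  # (C, T): pooled over joints
    joint_att = shared_scores(joint_desc, w1, w2)  # (V,)
    frame_att = shared_scores(frame_desc, w1, w2)  # (T,)
    out = x * frame_att[None, :, None] * joint_att[None, None, :]
    return out, frame_att, joint_att


# Example: 8 channels, 10 frames, 25 joints (NTU RGB+D skeletons have 25 joints).
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 10, 25))
w1 = rng.standard_normal((4, 8)) * 0.1  # hidden size 4 is an arbitrary choice
w2 = rng.standard_normal(4) * 0.1
out, frame_att, joint_att = collaborative_st_attention(x, w1, w2)
print(out.shape, frame_att.shape, joint_att.shape)
```

In this sketch the only weights are `w1` and `w2`, so the parameter count does not depend on the number of frames or joints, which is consistent with the lightweight design the abstract describes.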
Keywords/Search Tags: Skeleton action recognition, Convolutional neural network, Graph convolutional neural network, Attention mechanism