Research On Human Action Recognition Algorithm Based On Spatio-temporal Graph Convolutional Network

Posted on:2024-01-30

Degree:Master

Type:Thesis

Country:China

Candidate:J C Wei

GTID:2568306944454934

Subject:Information and Communication Engineering

Abstract/Summary:

With the rapid development of computing and deep learning technology, human action recognition, an important research topic in deep learning, has broad application prospects in many fields. Compared with RGB video, human action recognition based on skeletal data has the advantage of being unaffected by factors such as light intensity and video background noise. Graph convolutional neural networks can effectively model the human skeleton using its natural topology. However, traditional spatio-temporal graph convolutional network algorithms for human action recognition still have several problems. First, the fixed topology of the predefined adjacency matrix is limiting for different action tasks and for multi-layer graph convolutional networks. Second, most action types involve the spatio-temporal motion of only some of the joints, and the spatio-temporal features of the other, unrelated nodes are redundant, which impairs the network's ability to discriminate between similar actions. In addition, most existing improved algorithms concentrate on modeling spatial features, which leads to unbalanced spatio-temporal feature modeling and insufficient temporal feature extraction. To improve the ability of spatio-temporal graph convolutional networks to extract spatio-temporal features from skeletal data, this thesis builds on the spatial temporal graph convolutional networks model, addresses the shortcomings of existing spatio-temporal graph convolutional networks, and carries out the following research work:

1. A traditional spatio-temporal graph convolutional network method for human action recognition based on skeletal data is first constructed, and an adaptive data-driven graph convolution method is proposed to address the limitations of the predefined adjacency matrix in traditional graph convolution. Adaptive data-driven graph convolution changes the adjacency relationships between nodes during training according to the action task or data samples being learned, creating virtual connections between interacting nodes that are not physically adjacent and assigning them different connection weights, which makes the model more flexible in extracting features for different action types. To address the problem of redundant features in traditional spatio-temporal graph convolutional networks, a multidimensional attention mechanism is designed to guide the model to allocate weights appropriately across three dimensions (space, time, and channel), so that it focuses on the main moving nodes and reduces the influence of the spatio-temporal features of redundant nodes. Training and testing on the NTU RGB+D, NTU RGB+D 120, and Kinetics-Skeleton datasets verified the effectiveness and stability of the algorithms. Experimental analysis shows that adaptive data-driven graph convolution achieves high recognition accuracy for action types that involve interaction between non-adjacent nodes, and that the attention mechanism guides the model to pay more attention to the moving nodes themselves, further improving recognition accuracy for human actions.

2. To address the imbalance in spatio-temporal feature modeling in existing spatio-temporal graph convolutional networks, a temporal channel aggregation graph convolutional network is proposed. It uses adaptive temporal convolution and channel topology modeling to reduce the modeling imbalance between the spatial and temporal dimensions and to enhance the model's spatio-temporal feature extraction ability. To address the insufficient temporal feature extraction ability of the temporal convolutional network used in spatio-temporal graph convolutional networks, a multi-scale temporal convolution module for skeleton sequences is built from dilated convolutions with different dilation rates, which aggregates the temporal features of the skeleton across different time steps and improves the temporal feature extraction ability of the network. The effectiveness of the algorithm is verified on the NTU RGB+D, NTU RGB+D 120, and Kinetics-Skeleton datasets. The results show that the temporal channel aggregation graph convolution and multi-scale temporal convolution modules capture fine-grained features of actions and improve the network's ability to distinguish difficult actions.
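The two central ideas in the abstract, an adjacency matrix that gains learned "virtual" connections and a temporal module built from dilated convolutions at several rates, can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation. It assumes the learned connections enter as an additive matrix `B_learned` on top of the normalized skeleton adjacency, that each temporal branch uses one shared 1-D kernel, and that branch outputs are concatenated along the channel axis. All function and parameter names are hypothetical.

```python
import numpy as np


def normalize_adjacency(A):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def adaptive_graph_conv(X, A_fixed, B_learned, W):
    """Graph convolution over joints with an adaptive adjacency.

    X: (T, V, C_in) skeleton sequence (frames, joints, channels).
    A_fixed: (V, V) predefined skeleton adjacency.
    B_learned: (V, V) trainable offsets (assumed additive); nonzero entries
        act as virtual connections between physically non-adjacent joints.
    W: (C_in, C_out) channel projection.
    """
    A = normalize_adjacency(A_fixed) + B_learned
    # out[t, v, c] = sum_u A[v, u] * X[t, u, c]
    aggregated = np.einsum("vu,tuc->tvc", A, X)
    return aggregated @ W


def dilated_temporal_conv(X, kernel, dilation):
    """1-D convolution along time with zero padding so the length is kept.

    kernel: (K,) weights shared across joints and channels (a simplification).
    """
    K = kernel.shape[0]
    if K % 2 == 0:
        raise ValueError("kernel length must be odd for symmetric padding")
    pad = dilation * (K - 1) // 2
    T = X.shape[0]
    Xp = np.pad(X, ((pad, pad), (0, 0), (0, 0)))
    out = np.zeros(X.shape, dtype=float)
    for k in range(K):
        # Tap k reads the frame offset by (k - (K - 1) / 2) * dilation.
        out += kernel[k] * Xp[k * dilation : k * dilation + T]
    return out


def multiscale_temporal_conv(X, kernel, dilations=(1, 2, 3)):
    """Run one dilated branch per rate and concatenate along channels."""
    branches = [dilated_temporal_conv(X, kernel, d) for d in dilations]
    return np.concatenate(branches, axis=-1)
```

With a three-joint chain (0-1-2), setting `B_learned[0, 2]` to a nonzero value lets joint 0 receive features from joint 2 in a single layer, which the fixed chain adjacency alone does not allow. In the temporal module, larger dilation rates widen the receptive field along the time axis without adding kernel weights.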

Keywords/Search Tags:

Action recognition, Spatio-temporal graph convolution, Adjacency matrix, Attention mechanism, Dilated convolution

Related items

1	Research On Action Recognition Algorithm Based On 3D Convolution
2	Research On Human Skeleton Action Recognition Method Based On Graph Convolutional Network
3	Research On Action Recognition Based On Deep Network Learning Of Spatio-temporal Features
4	Video Action Recognition Based On 2D Convolution Network Under Spatio-Temporal Feature Enhancement Mechanism
5	Research On Human Skeleton Action Recognition Based On Graph Convolutional Network
6	Human Action Recognition Based On Spatio-temporal Graph Convolution Network
7	Research On Human Action Recognition Based On Spatio-temporal Graph Convolutional Neural Network
8	Research On Object Detection Method Based On Key Points And Graph Spatio-temporal Attention Mechanism
9	Action Recognition Based On Human Skeleton Graph Convolution And Image Convolution Fusion
10	Research On Deep Learning Algorithms For Human Action Recognition