| With the rapid development of computing and deep learning technology, human action recognition, an important research topic in deep learning, has broad application prospects in many fields. Compared with RGB video, human action recognition based on skeletal data has the advantage of being unaffected by factors such as light intensity and video background noise. Graph convolutional neural networks can model the human skeleton effectively by exploiting its natural topology. However, traditional spatio-temporal graph convolutional network algorithms for human action recognition still have several problems. Firstly, the fixed topology of the predefined adjacency matrix is a limitation for different action tasks and for multi-layer graph convolutional neural networks. Secondly, most action types involve the spatio-temporal motion of only some of the joints, so the spatio-temporal features of the unrelated nodes are redundant and impair the network's ability to discriminate between similar actions. In addition, most existing improved algorithms concentrate on modeling spatial features, which leads to unbalanced spatio-temporal feature modeling and insufficient temporal feature extraction. To improve the ability of spatio-temporal graph convolutional neural networks to extract spatio-temporal features from skeletal data, this paper builds on the spatial temporal graph convolutional networks model, targets the shortcomings of existing spatio-temporal graph convolutional neural networks, and carries out the following research work: 1. A traditional spatio-temporal graph convolutional network method for human action recognition based on skeletal data is first constructed, and an adaptive data-driven graph convolution method is proposed to address the limitations of the predefined adjacency matrix in traditional graph convolution. Adaptive data-driven graph convolution adaptively changes the adjacency 
relationships between nodes during training according to the learned action task or the data samples, creating virtual connections between interacting nodes that are not physically adjacent and assigning them different connection weights, thus making the model more flexible in extracting features for different action types. To address the problem of redundant features in traditional spatio-temporal graph convolutional networks, a multidimensional attention mechanism is designed to guide the model to allocate weights sensibly along three dimensions (space, time, and channel), so that it focuses on the main moving nodes and reduces the influence of the spatio-temporal features of redundant nodes. Training and testing on the NTU RGB+D, NTU RGB+D 120, and Kinetics-Skeleton datasets verify the effectiveness and stability of the algorithms. Experimental analysis shows that adaptive data-driven graph convolution achieves high recognition accuracy for action types involving interaction between non-adjacent nodes, and that the attention mechanism guides the model to pay more attention to the moving nodes themselves, further improving the recognition accuracy of human actions. 2. To solve the imbalance in spatio-temporal feature modeling in existing spatio-temporal graph convolutional networks, a temporal channel aggregation graph convolutional network is proposed; it uses adaptive temporal convolution and channel topology modeling to reduce the modeling imbalance between the spatial and temporal dimensions and to enhance the spatio-temporal feature extraction ability of the model. To address the insufficient temporal feature extraction ability of the temporal convolutional network used in spatio-temporal graph convolutional networks, a multi-scale temporal convolution module for skeleton sequences is built from dilated convolutions with different dilation rates, which aggregate the temporal features of the skeleton over different time steps and improve the 
temporal feature extraction ability of the network. The effectiveness of the algorithm is verified on the NTU RGB+D, NTU RGB+D 120, and Kinetics-Skeleton datasets. The results show that the temporal channel aggregation graph convolution and multi-scale temporal convolution modules capture fine-grained action features and improve the network's ability to distinguish hard-to-recognize actions. |
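
The adaptive data-driven graph convolution described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so this assumes one plausible form: a learnable matrix is added to the normalized skeleton adjacency, which lets physically non-adjacent joints exchange features. The function names, the five-joint toy skeleton, and the hand-set "learned" weights are all hypothetical and stand in for values that would be obtained by training.

```python
import numpy as np


def normalize_adjacency(a):
    """Symmetrically normalize A + I as D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = a + np.eye(a.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def adaptive_graph_conv(x, a_fixed, b_learned, w):
    """One spatial graph convolution with an adaptive adjacency term.

    x:         (T, V, C_in) joint features over T frames and V joints
    a_fixed:   (V, V) predefined skeleton adjacency (physical bones)
    b_learned: (V, V) learnable offsets (assumed form; trained in practice)
    w:         (C_in, C_out) channel-mixing weights
    """
    a_total = normalize_adjacency(a_fixed) + b_learned
    # Aggregate neighbour features per frame, then mix channels.
    return np.einsum("uv,tvc,cd->tud", a_total, x, w)


# Toy skeleton: joint 1 is a hub; joints 3 and 4 (say, two hands)
# are not physically connected to each other.
V = 5
a_fixed = np.zeros((V, V))
for i, j in [(0, 1), (1, 2), (1, 3), (1, 4)]:
    a_fixed[i, j] = a_fixed[j, i] = 1.0

# Hypothetical learned virtual connection between joints 3 and 4.
b_learned = np.zeros((V, V))
b_learned[3, 4] = b_learned[4, 3] = 0.5

rng = np.random.default_rng(0)
x = rng.normal(size=(4, V, 3))                # 4 frames, 3 input channels
w = rng.normal(size=(3, 2))                   # 2 output channels

out = adaptive_graph_conv(x, a_fixed, b_learned, w)
print(out.shape)
```

With `b_learned` set to zero, the output at joint 3 depends only on joints 1 and 3; the nonzero entry at `(3, 4)` is what allows features from joint 4 to reach joint 3 in a single layer.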