| Human action recognition is a challenging task in computer vision with broad application prospects and practical value in many fields. Skeleton-based human action recognition has received much attention because skeletal features are robust and easy to obtain. Current deep learning-based research on human action recognition faces several issues. Convolutional neural networks are not well suited to the time sequences in skeletal data and require additional processing to handle temporal information. Recurrent neural networks are prone to vanishing or exploding gradients when processing long skeletal motion sequences. In temporal convolutional networks, the modeling of skeletal temporal information is constrained by fixed convolution kernels. To address these issues, this thesis improves the spatio-temporal graph convolutional network to enhance its performance on human action recognition tasks. The main work is as follows: (1) A spatio-temporal graph Shift convolutional network with an attention mechanism is proposed. Shift convolution replaces the conventional convolution operation in the spatio-temporal graph network to capture dependencies between distant human nodes, achieving a larger receptive field without changing the convolution kernel size. A mixed attention mechanism is embedded between the spatial graph convolutional layer and the temporal graph convolutional layer to strengthen the network’s ability to focus on key feature information. Residual connections are introduced between the spatio-temporal graph modules to enable cross-channel information interaction and integration, improving the network’s feature representation. Experiments on the large-scale NTU RGB+D dataset show that the modified network model achieves good recognition performance. (2) An improved multi-path attention adaptive spatio-temporal graph convolutional network is proposed. To address the inflexibility of manually set convolution kernels, an adaptive network layer is added to the attention-based spatio-temporal graph Shift convolutional network, allowing the network to learn suitable convolution kernels adaptively from the features of the input data and thereby improving the model’s performance and generalization ability. Node and skeleton motion information is added to the skeletal input data of the spatio-temporal graph convolutional network to enrich the input features. Experimental results show that the improved network model achieves good accuracy in action recognition. |
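The enrichment of the skeletal input with node and skeleton motion information can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes that "skeleton" features are bone vectors (child joint minus parent joint) and that "motion" is the frame-to-frame difference of a feature, which is a common convention in skeleton-based recognition. The edge list `TOY_EDGES`, the function name `enrich_skeleton_features`, and the array layout `(frames, joints, channels)` are hypothetical choices made for illustration only.

```python
import numpy as np

# Hypothetical 5-joint toy skeleton given as (child, parent) index pairs.
# Joint 0 is treated as the root and has no parent (assumption).
TOY_EDGES = [(1, 0), (2, 1), (3, 2), (4, 1)]


def temporal_motion(x):
    """Frame-to-frame difference x[t+1] - x[t]; the last frame is zero-padded."""
    motion = np.zeros_like(x)
    motion[:-1] = x[1:] - x[:-1]
    return motion


def enrich_skeleton_features(joints, edges):
    """Build enriched input features from raw joint coordinates.

    joints: float array of shape (T, V, C) = (frames, joints, coordinates).
    edges:  iterable of (child, parent) joint index pairs.
    Returns an array of shape (T, V, 4 * C) holding, along the last axis:
    joint positions, bone vectors, joint motion, bone motion.
    """
    # Bone vector of each child joint; the root keeps a zero vector.
    bones = np.zeros_like(joints)
    for child, parent in edges:
        bones[:, child] = joints[:, child] - joints[:, parent]

    return np.concatenate(
        [joints, bones, temporal_motion(joints), temporal_motion(bones)],
        axis=-1,
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_sequence = rng.normal(size=(6, 5, 3))  # 6 frames, 5 joints, 3-D coordinates
    features = enrich_skeleton_features(toy_sequence, TOY_EDGES)
    print(features.shape)
```

Concatenating the four streams along the channel axis is only one option; multi-stream models may instead feed each stream to a separate branch and fuse the predictions, and the abstract does not state which approach the thesis uses.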