| Human action recognition is an important prerequisite for behavior analysis and understanding and a key research topic within it, and it is widely used in short video, security detection, intelligent medical care, human-computer interaction, and other fields. Compared with image and video data, human skeleton data is more robust to factors such as the shooting environment, background, lighting, and the people being recorded. In-depth research on action recognition methods for human skeleton data is therefore necessary, and this paper focuses on human action recognition based on such data. The specific work of this paper is as follows. (1) To address the large intra-class differences in human skeleton data, a dual-attention graph convolutional network method for human action recognition is proposed. First, a channel attention mechanism extracts the channel features of the corresponding layer and computes the corresponding channel feature weights to establish global channel correlations. Second, a spatiotemporal attention mechanism extracts the global spatiotemporal features of actions to suppress interference from intra-class difference features. Experimental results show that, compared with the baseline two-stream adaptive graph convolutional network (2s-AGCN), the improved model increases recognition accuracy by 1.5% and 0.7% under the X-Sub and X-View settings of the NTU-RGB+D 60 dataset, respectively, and by 0.2% and 0.7% under the X-Sub and X-Set settings of the NTU-RGB+D 120 dataset, respectively. (2) To address the small inter-class differences in action recognition from human skeleton data, a spatiotemporal excitation multi-scale graph convolutional network method for human action recognition is proposed. First, a multi-scale time-domain convolutional network increases the width of the time-domain network and the size of the time-domain receptive field, which strengthens the extraction of time-domain difference features, improves recognition accuracy, and reduces the number of model parameters. Second, a spatiotemporal excitation network is introduced that excites the local spatiotemporal information of important nodes, prompting the network to learn the spatiotemporal features of local nodes and to obtain the local difference features of similar actions. Experimental results show that, compared with the baseline model, accuracy under the X-Sub and X-View settings of the NTU-RGB+D 60 dataset improves by 1.7% and 0.4%, respectively, and accuracy under the X-Sub and X-Set settings of the NTU-RGB+D 120 dataset improves by 1.0% and 1.2%, respectively. Among individual actions, accuracy for reading, writing, clapping, and rubbing two hands increases by 12.0%, 11.0%, 6.0%, and 4.0%, respectively, showing that the method effectively improves recognition accuracy for similar human actions. |
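
As a rough illustration of how a multi-scale time-domain convolution can enlarge the receptive field while using fewer parameters, the sketch below compares one wide temporal convolution with a set of parallel dilated branches. The branch layout (a 1x1 channel reduction followed by a kernel-3 dilated convolution in each branch), the channel count of 64, the dilation rates 1 to 4, and the kernel-9 baseline are all assumptions made for this example; the abstract does not specify the actual architecture.

```python
# Hypothetical layer sizes chosen only for illustration (not taken from the paper).

def receptive_field(kernel_size: int, dilation: int) -> int:
    """Number of frames spanned by one dilated 1D temporal convolution."""
    return (kernel_size - 1) * dilation + 1


def single_conv_params(channels: int, kernel_size: int) -> int:
    """Weights of one temporal conv mapping channels -> channels (bias ignored)."""
    return kernel_size * channels * channels


def multi_scale_params(channels: int, kernel_size: int, dilations: list) -> int:
    """Weights of parallel branches (bias ignored).

    Each branch is assumed to reduce `channels` to `channels // n` with a
    1x1 convolution and then apply one dilated temporal convolution.
    Dilation changes the receptive field but not the weight count.
    """
    n = len(dilations)
    width = channels // n
    per_branch = channels * width + kernel_size * width * width
    return n * per_branch


channels = 64
dilations = [1, 2, 3, 4]

fields = [receptive_field(3, d) for d in dilations]
baseline = single_conv_params(channels, 9)
multi = multi_scale_params(channels, 3, dilations)

print(fields)           # [3, 5, 7, 9]
print(baseline, multi)  # 36864 7168
```

Under these assumed sizes, the branches jointly cover receptive fields from 3 to 9 frames, matching the reach of the kernel-9 baseline with about one fifth of its weights. The exact savings depend entirely on the branch design, so the numbers should be read as an example of the mechanism rather than as figures for the proposed model.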