| Human action recognition has a wide range of applications, such as intelligent surveillance, healthcare, and assisted training for athletes, and has become an active research direction in recent years. Compared with RGB and depth images, 3D skeleton sequences have gradually become the mainstream input modality because their feature representation is simple and they are not easily affected by the external environment. Existing skeleton-based action recognition methods still have shortcomings, such as poor recognition performance. Based on graph convolutional networks, this thesis studies in depth the extraction of spatial dependencies from 3D skeletons, the expression of temporal features, and the complex co-occurrence in the spatial domain during construction, and carries out experiments on the NTU-RGBD60 and NTU-RGBD120 datasets. First, a graph convolutional network model based on skeleton subgraphs is proposed to address the motion differences between body parts. In this model, the skeleton graph is divided into four parts according to the physical structure of the human body, and a spatial graph convolution is performed on each part to capture the motion differences between parts. The dependencies among the parts are then aggregated by a feature fusion function to complete feature extraction in the spatial dimension. Second, considering the uneven distribution of features across joints, a dual attention mechanism is proposed. By assigning different attention weights to features in the channel domain and the spatial domain, this mechanism enables the model to focus on the key regions that are effective for action recognition. The channel attention submodule and the spatial attention submodule are connected in series to recalibrate the original features. A residual connection is also added to prevent network performance from degrading when the channel attention weights are zero. Finally, building on the models of the preceding two chapters, an adaptive graph convolutional network model is proposed. The original graph in a graph convolutional network is a fixed structure based on the physical connections of the human body, so it ignores the correlation between joints that are not physically connected. By adding learnable parameters to the graph convolution, the graph structure can be adjusted adaptively for different samples. In addition, because most methods use only joint information and lack a high-level representation of skeleton features, a multi-branch input model is proposed, in which the semantic features of joint information, bone information, and joint motion information are modeled and trained in separate branches, and the branch results are finally fused to further strengthen the representation of skeletal motion information. |
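
Two of the ideas summarized above, the multi-branch inputs and the graph convolution with a learnable graph structure, can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the 5-joint toy skeleton `PARENTS`, the function names, the row normalization, and the definitions of the bone branch (joint minus parent joint) and the motion branch (difference between consecutive frames). The learnable part of the graph is reduced to a single offset matrix shared by all samples, so the sample-dependent adjustment described in the abstract is not modeled here.

```python
import numpy as np

# Hypothetical 5-joint toy skeleton: PARENTS[j] is the joint that joint j
# attaches to. Joint 0 is the root and is listed as its own parent, so its
# bone vector is zero.
PARENTS = np.array([0, 0, 1, 1, 3])


def bone_stream(joints, parents=PARENTS):
    """Bone vectors: each joint's coordinates minus its parent's coordinates.

    joints has shape (T, V, 3): frames, joints, xyz coordinates.
    """
    return joints - joints[:, parents, :]


def motion_stream(joints):
    """Joint motion: frame-to-frame displacement, zero-padded at the last frame."""
    motion = np.zeros_like(joints)
    motion[:-1] = joints[1:] - joints[:-1]
    return motion


def skeleton_adjacency(parents=PARENTS):
    """Symmetric 0/1 adjacency built from the physical parent-child bones."""
    v = len(parents)
    child = np.arange(v)
    adj = np.zeros((v, v))
    adj[child, parents] = 1.0
    adj[parents, child] = 1.0
    np.fill_diagonal(adj, 0.0)  # remove the root's self-reference
    return adj


def normalize_adjacency(adj):
    """Add self-loops, then scale each row to sum to one."""
    adj = adj + np.eye(adj.shape[0])
    return adj / adj.sum(axis=1, keepdims=True)


def adaptive_graph_conv(x, fixed_adj, learned_offset, weight):
    """Spatial graph convolution with adjacency = fixed graph + learnable offset.

    x: (T, V, C_in); fixed_adj, learned_offset: (V, V); weight: (C_in, C_out).
    The offset can give nonzero weight to joint pairs with no physical bone.
    """
    adj = normalize_adjacency(fixed_adj) + learned_offset
    # Aggregate features over neighboring joints per frame, then mix channels.
    return np.einsum("vu,tuc,cd->tvd", adj, x, weight)
```

In a setup like this, each branch (`joints`, `bone_stream(joints)`, `motion_stream(joints)`) would be fed to its own network and the per-branch class scores combined at the end, for example by summation. In training, `learned_offset` and `weight` would be parameters updated by gradient descent rather than fixed arrays.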