| With the advent of the Internet of Everything era, human action recognition based on computer vision has become an important research topic. Traditional action recognition models mostly use RGB image data and depth data as input features, but these data usually contain considerable external noise, which makes the models less robust. In recent years, graph convolutional networks for analyzing graph-structured data have opened a new research direction in computer vision, and using them to model and analyze 3D human skeleton joints, a typical form of graph-structured data, has become a new hotspot in human action recognition. This thesis studies skeleton-based human action recognition built on the spatial-temporal graph convolutional network (ST-GCN) model. The adjacency matrix of the graph convolutional network is used to capture the relationships between adjacent skeleton joints, and an attention mechanism is introduced to adaptively assign weights across spatio-temporal channels and spatio-temporal frames. A series of improvements to the ST-GCN model is proposed. The main research contents are as follows. Firstly, a spatial-temporal multi-residual graph convolutional neural network model is proposed to make full use of the local and global information in the human skeleton spatial-temporal graph. To address the problem that the ST-GCN model cannot accurately distinguish similar actions, this thesis builds on the graph-subset partition of the human skeleton spatial-temporal graph and designs a residual network model that learns feature information between non-adjacent joints, so that joint features are used more comprehensively. In the spatial convolution over the skeleton spatial-temporal graph, the model learns two feature branches in parallel. One branch learns the adjacent-joint data (local features) covered by the graph-subset partition; the other uses a residual network to learn the non-adjacent-joint data (global features) not covered by the partition. The shortcut connections of the residual network then fuse the local and global features of the two branches, improving the expressive power of the model. Experimental results on the NTU-RGB+D and Kinetics datasets show that the proposed spatial-temporal multi-residual graph convolutional neural network model outperforms traditional skeleton-based action recognition models. Secondly, a spatial-temporal multi-residual graph convolutional neural network model based on the attention mechanism is proposed to extract and exploit important features more effectively. Following the channel attention mechanism used in traditional convolutional neural networks, the last three dimensions (C, V, T) of the skeleton-joint feature map fed to the spatial-temporal convolutional network are treated as counterparts of the three dimensions (C, W, H) of image features. By introducing a channel attention module and a frame attention module, the skeleton feature information is compressed and aggregated, and the weights across channels and frames are adaptively assigned, enabling the model to selectively emphasize useful features and suppress useless ones. Finally, test results on the NTU-RGB+D and Kinetics datasets demonstrate the effectiveness of the model. |
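
The channel and frame attention described above can be illustrated with a minimal NumPy sketch on a single (C, V, T) feature map. The abstract does not specify the internal design of the two modules, so this sketch assumes a squeeze-and-excitation-style gate: global average pooling compresses the features, and a two-layer bottleneck followed by a sigmoid produces one weight per channel or per frame. The function names, the reduction ratio, and the random weights are hypothetical and only stand in for learned parameters.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function, mapping values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def init_params(channels, frames, reduction, rng):
    """Random bottleneck weights (hypothetical stand-ins for learned ones)."""
    c_hidden = max(channels // reduction, 1)
    t_hidden = max(frames // reduction, 1)
    return {
        "c1": 0.1 * rng.standard_normal((channels, c_hidden)),
        "c2": 0.1 * rng.standard_normal((c_hidden, channels)),
        "t1": 0.1 * rng.standard_normal((frames, t_hidden)),
        "t2": 0.1 * rng.standard_normal((t_hidden, frames)),
    }


def gate(descriptor, w1, w2):
    """Two-layer bottleneck gate: ReLU, then sigmoid."""
    hidden = np.maximum(descriptor @ w1, 0.0)
    return sigmoid(hidden @ w2)


def channel_frame_attention(x, params):
    """Reweight a (C, V, T) feature map along channels, then frames.

    C = feature channels, V = skeleton joints, T = frames.
    """
    # Channel attention: average over joints and frames -> one value per channel.
    channel_desc = x.mean(axis=(1, 2))                       # shape (C,)
    channel_w = gate(channel_desc, params["c1"], params["c2"])
    x = x * channel_w[:, None, None]

    # Frame attention: average over channels and joints -> one value per frame.
    frame_desc = x.mean(axis=(0, 1))                         # shape (T,)
    frame_w = gate(frame_desc, params["t1"], params["t2"])
    return x * frame_w[None, None, :], channel_w, frame_w


rng = np.random.default_rng(0)
C, V, T = 64, 25, 30  # assumed sizes, chosen only for the example
features = rng.standard_normal((C, V, T))
params = init_params(C, T, reduction=4, rng=rng)
reweighted, channel_w, frame_w = channel_frame_attention(features, params)
print(reweighted.shape, channel_w.shape, frame_w.shape)
```

Because every weight lies strictly between 0 and 1, the modules can only attenuate features relative to one another; in a trained network the gate parameters would be learned jointly with the graph convolution layers so that informative channels and frames keep weights close to 1.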