| Human behavior recognition, an important research area in video understanding, has a wide range of applications in scenarios such as intelligent surveillance, intelligent caregiving, two-person interaction, and robot control. In research on human behavior recognition, RGB video data are easily affected by factors such as background diversity, lighting changes, and changes in the actor's clothing, whereas human skeleton data are a high-level abstraction of the human body and are relatively robust to such interference; skeleton-based behavior recognition has therefore become a research hotspot. Among current skeleton-based behavior recognition methods, those based on graph convolutional networks can accurately model skeleton sequences in space and time. However, little research has built network models around two-person interaction features: existing models do not let the global and local features of two-person interaction skeletons complement each other well, and they cannot recognize interaction actions from minimal skeleton information. The accuracy of two-person interaction recognition therefore still needs to be improved. To address these problems, this thesis proposes a two-person interaction recognition method that combines 2s-AGCN with a relational inference network. The main work of this thesis is as follows:
(1) Improved feature extraction for two-person interaction. Based on the spatio-temporal dynamic relationships between joints, this thesis constructs a different relation graph for each frame, turning the original static graph into a dynamic relation graph, so that dynamic two-person interaction features are extracted by the Relational Adjacency Matrix (RAM) module. To further extract two-person interaction features, this thesis proposes an improved method for generating the RAM module that refines its encoder-decoder structure to extract long-distance interaction features and geometric relationship features. Ablation experiments show that the proposed method achieves recognition rates of 94.91% and 98.12% on the CS and CV benchmarks of the NTU RGB+D 60 interaction subset, respectively, which are 0.26 and 0.93 percentage points higher than the baseline network. It achieves 90.66% and 92.30% on the C-Sub and C-Set benchmarks of the NTU RGB+D 120 interaction subset, respectively, which are 0.1 and 1.87 percentage points higher than the baseline network. It also outperforms existing methods on the CV and C-Set benchmarks.
(2) Two-person interaction recognition based on attention mechanisms. Graph convolutional networks, which mine non-Euclidean data, can automatically capture the spatial structure and temporal dynamics embedded in the joints. When extracting two-person interaction features, however, they attend only to local key information, so other local information is not fully utilized. To solve this problem, this thesis introduces attention modules into the convolution block of the binary (two-person) relation graph and proposes the AGC-2SE module and the DRAGCN-SGE network model. The AGC-2SE module adaptively extracts the features attached to each joint and, for single-person skeleton features, extracts complementary information from more pairs of interaction features. The DRAGCN-SGE model uses the similarity between local and global features to generate importance coefficients that guide and enhance the spatial distribution of semantic features. The method obtains recognition rates of 96.10% and 98.90% on the CS and CV benchmarks of the NTU RGB+D 60 interaction subset, respectively, which are 1.42 and 1.71 percentage points higher than the baseline network. It obtains 91.02% and 92.73% on the C-Sub and C-Set benchmarks of the NTU RGB+D 120 interaction subset, respectively, which are 0.46 and 2.3 percentage points higher than the baseline network. It achieves state-of-the-art results on the CV and C-Sub benchmarks.
(3) Fusion with a relational reasoning network. To better recognize interaction actions from minimal skeleton information, this thesis proposes a model fused with a pairwise relational inference network. The fused model combines the classification results of the attention-based graph convolutional network and the pairwise relational inference network. The pairwise relational inference network infers the interaction relationships between joints from minimal skeleton information. It feeds the coordinates of joint pairs into the IRNh-TCN network, treats the joint information of different body parts as independent objects, and models the relationships between them. The method achieves recognition rates of 96.72% and 97.69% on the CS and CV benchmarks of the NTU RGB+D 60 interaction subset, respectively, which are 6.22 and 4.19 percentage points higher than the baseline network. It achieves state-of-the-art results on the CS benchmark. |
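The per-frame dynamic graph in contribution (1) can be illustrated with a minimal pure-Python sketch. Purely for illustration, it assumes that the relation between two joints is a softmax over negative squared Euclidean distances, computed independently in every frame. The thesis generates its RAM module with a learned encoder-decoder, which this sketch does not reproduce. The helper names `frame_adjacency`, `dynamic_graph_sequence` and `aggregate` are hypothetical.

```python
import math
from typing import List

Vec = List[float]
Matrix = List[List[float]]


def softmax(row: Vec) -> Vec:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def frame_adjacency(joints: List[Vec]) -> Matrix:
    """Row-normalised adjacency for a single frame (hypothetical relation measure).

    `joints` holds the coordinates of every joint of BOTH persons in this frame,
    so edges between the two skeletons are included.
    A[i][j] = softmax_j(-||p_i - p_j||^2): nearer joints get larger weights.
    """
    adj = []
    for p in joints:
        scores = [-sum((a - b) ** 2 for a, b in zip(p, q)) for q in joints]
        adj.append(softmax(scores))
    return adj


def dynamic_graph_sequence(sequence: List[List[Vec]]) -> List[Matrix]:
    """Build one adjacency matrix per frame instead of a single static graph."""
    return [frame_adjacency(frame) for frame in sequence]


def aggregate(adj: Matrix, feats: Matrix) -> Matrix:
    """Graph aggregation X' = A X for one frame (learned weight matrix omitted)."""
    n, c = len(feats), len(feats[0])
    return [[sum(adj[i][j] * feats[j][k] for j in range(n)) for k in range(c)]
            for i in range(n)]


# Two frames, three joints in 2-D; joint 2 moves towards joint 1 in the second frame.
seq = [
    [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]],
    [[0.0, 0.0], [1.0, 0.0], [1.5, 0.0]],
]
graphs = dynamic_graph_sequence(seq)
print(round(graphs[0][1][2], 4), round(graphs[1][1][2], 4))
```

The adjacency is recomputed from each frame's joint positions. A joint that moves closer to another joint, such as one person's hand approaching the other person, therefore receives a larger edge weight in that frame. A single static skeleton graph cannot express this.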
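Contribution (2) adds attention to the graph convolution block. The abstract does not spell out what "SE" stands for in AGC-2SE. Assuming it refers to squeeze-and-excitation channel attention, the following pure-Python sketch shows that mechanism on a feature map stored as channels by positions. Global average pooling squeezes each channel to one number, and a small bottleneck layer with ReLU and sigmoid then produces one gate per channel. Finally, each channel is rescaled by its gate. The weight matrices `w1` and `w2` are hypothetical placeholders for learned parameters.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def squeeze_excite(x: Matrix, w1: Matrix, w2: Matrix) -> Tuple[Matrix, List[float]]:
    """Squeeze-and-excitation over x[c][n] (C channels, N positions such as frames x joints).

    w1 has shape (C // r) x C and w2 has shape C x (C // r), where r is the
    reduction ratio; both stand in for learned weights.
    Returns the rescaled features and the per-channel gates.
    """
    # Squeeze: one global descriptor per channel (average over positions).
    z = [sum(ch) / len(ch) for ch in x]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: every value in channel c is multiplied by gates[c], which lies in (0, 1).
    out = [[g * v for v in ch] for g, ch in zip(gates, x)]
    return out, gates


# Two channels over three positions, reduction ratio r = 2.
features = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
w1 = [[1.0, 1.0]]        # 1 x 2
w2 = [[1.0], [-1.0]]     # 2 x 1
scaled, gates = squeeze_excite(features, w1, w2)
print([round(g, 3) for g in gates])
```

In this toy example the first channel is kept almost intact while the second is suppressed. This is how the gates let a network emphasise informative channels instead of weighting all of them equally.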
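The DRAGCN-SGE model generates importance coefficients from the similarity between local and global features. This description matches Spatial Group-wise Enhance (SGE) attention. Assuming SGE is the module meant, a pure-Python sketch of its forward pass follows. Within each channel group, it averages the features over all positions to obtain a global descriptor and takes the dot product of that descriptor with every local feature vector. It then normalises these similarities over positions. A learned affine transform and a sigmoid turn each normalised similarity into one coefficient per position. The `gamma` and `beta` values here are hypothetical stand-ins for that transform's learned parameters.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def sge(x: Matrix, groups: int, gamma: Sequence[float], beta: Sequence[float],
        eps: float = 1e-5) -> Matrix:
    """Spatial group-wise enhancement of x[c][n] (C channels, N positions)."""
    num_c, num_n = len(x), len(x[0])
    if num_c % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    size = num_c // groups
    out = [row[:] for row in x]
    for g in range(groups):
        chans = range(g * size, (g + 1) * size)
        # Global descriptor of the group: mean feature vector over all positions.
        glob = [sum(x[c]) / num_n for c in chans]
        # Similarity between the global descriptor and each local feature vector.
        sim = [sum(glob[k] * x[c][n] for k, c in enumerate(chans)) for n in range(num_n)]
        # Normalise the similarities over positions (zero mean, unit variance).
        mean = sum(sim) / num_n
        std = math.sqrt(sum((s - mean) ** 2 for s in sim) / num_n)
        for n in range(num_n):
            a = gamma[g] * (sim[n] - mean) / (std + eps) + beta[g]
            coeff = 1.0 / (1.0 + math.exp(-a))  # importance coefficient in (0, 1)
            for c in chans:
                out[c][n] = x[c][n] * coeff
    return out


# One group of two channels over two positions; position 0 is more similar
# to the group's global descriptor than position 1.
feats = [[2.0, 1.0], [2.0, 1.0]]
enhanced = sge(feats, groups=1, gamma=[1.0], beta=[0.0])
print([[round(v, 3) for v in row] for row in enhanced])
```

Positions whose features agree with the group's overall pattern are amplified relative to the others. This is the sense in which the coefficients "guide and enhance the spatial distribution of semantic features".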
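Contribution (3) treats parts of the two skeletons as independent objects and models the relations between them. A common way to express this is the Relation Network form RN(O) = f(Σ g(o_i, o_j)), on which interaction relational networks (IRN) for mutual actions build. The pure-Python sketch below applies a shared relation function `g` to every (person A joint, person B joint) pair. It pools the pair features by averaging (summing is also common) and maps the result to class logits with `f`. This is an illustration under assumptions, not the IRNh-TCN architecture: the single-layer `g` and `f`, their weights, and the use of raw 2-D coordinates as objects are all hypothetical.

```python
from typing import List

Vec = List[float]
Matrix = List[List[float]]


def linear(w: Matrix, b: Vec, x: Vec) -> Vec:
    """Affine map w @ x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v: Vec) -> Vec:
    return [max(0.0, x) for x in v]


def relation_logits(person_a: List[Vec], person_b: List[Vec],
                    g_w: Matrix, g_b: Vec, f_w: Matrix, f_b: Vec) -> Vec:
    """f(mean over pairs of g([a_i, b_j])): pairwise inter-person relation reasoning."""
    pair_feats = []
    for joint_a in person_a:
        for joint_b in person_b:
            # The relation function g sees both joints' coordinates concatenated.
            pair_feats.append(relu(linear(g_w, g_b, joint_a + joint_b)))
    dim = len(pair_feats[0])
    # Averaging over all pairs makes the result independent of joint ordering.
    pooled = [sum(p[k] for p in pair_feats) / len(pair_feats) for k in range(dim)]
    return linear(f_w, f_b, pooled)


# g computes the offset from joint a to joint b (then ReLU); f is the identity.
g_w = [[-1.0, 0.0, 1.0, 0.0], [0.0, -1.0, 0.0, 1.0]]
g_b = [0.0, 0.0]
f_w = [[1.0, 0.0], [0.0, 1.0]]
f_b = [0.0, 0.0]
a = [[0.0, 0.0], [1.0, 0.0]]
b = [[2.0, 0.0], [3.0, 1.0]]
print(relation_logits(a, b, g_w, g_b, f_w, f_b))  # -> [2.0, 0.5]
```

Reordering the joints of either person leaves the logits unchanged. The module therefore reasons about which joints relate to each other rather than about their index order. In a trained model, `g` and `f` would be learned networks rather than fixed matrices.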
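Contribution (3) combines the classification results of the attention-based graph convolutional network and the pairwise relational inference network. One standard way to do this is late fusion, which 2s-AGCN also uses to merge its joint and bone streams. Each stream's logits are converted to softmax scores, the scores are added with per-stream weights, and the arg-max gives the prediction. The sketch below assumes this weighted-sum rule, because the abstract does not state which fusion rule the thesis uses. The weights and logits are hypothetical toy values.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(stream_logits: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Weighted sum of per-stream softmax scores (late fusion)."""
    if len(stream_logits) != len(weights):
        raise ValueError("need one weight per stream")
    fused = [0.0] * len(stream_logits[0])
    for logits, w in zip(stream_logits, weights):
        for k, p in enumerate(softmax(logits)):
            fused[k] += w * p
    return fused


def predict(scores: Sequence[float]) -> int:
    """Index of the highest score."""
    return max(range(len(scores)), key=lambda k: scores[k])


gcn_logits = [2.0, 1.0, 0.0]        # attention-based GCN stream (toy values)
relation_logits = [0.0, 3.0, 0.0]   # relational inference stream (toy values)
fused = fuse_scores([gcn_logits, relation_logits], weights=[1.0, 1.0])
print(predict(softmax(gcn_logits)), predict(fused))  # -> 0 1
```

Here the graph convolutional stream alone would pick class 0. The relational stream is confident about class 1, and that confidence overturns the decision after fusion. Tuning the per-stream weights controls how much each network is trusted.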
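Each gain reported in the abstract is the difference between the proposed method's accuracy and the baseline network's accuracy. The implied baseline for each benchmark can therefore be recovered by subtraction. The short script below does this for the figures quoted in the abstract; the dictionary labels are shorthand introduced here.

```python
from typing import Dict, Tuple

# (reported accuracy in %, gain over the baseline network in percentage points)
REPORTED: Dict[str, Tuple[float, float]] = {
    "(1) NTU60 CS": (94.91, 0.26),
    "(1) NTU60 CV": (98.12, 0.93),
    "(1) NTU120 C-Sub": (90.66, 0.10),
    "(1) NTU120 C-Set": (92.30, 1.87),
    "(2) NTU60 CS": (96.10, 1.42),
    "(2) NTU60 CV": (98.90, 1.71),
    "(2) NTU120 C-Sub": (91.02, 0.46),
    "(2) NTU120 C-Set": (92.73, 2.30),
    "(3) NTU60 CS": (96.72, 6.22),
    "(3) NTU60 CV": (97.69, 4.19),
}


def implied_baselines(reported: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Baseline accuracy = reported accuracy - reported gain, rounded to 2 decimals."""
    return {name: round(acc - gain, 2) for name, (acc, gain) in reported.items()}


for name, base in implied_baselines(REPORTED).items():
    print(f"{name}: {base:.2f}%")
```

The implied baselines for contributions (1) and (2) are identical on CV (97.19%), C-Sub (90.56%) and C-Set (90.43%), and close on CS (94.65% versus 94.68%). Contribution (3) is measured against a lower baseline (90.50% on CS and 93.50% on CV).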