
Graph Neural Network Based Method For Skeleton Based Human Action Recognition

Posted on: 2023-05-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Bian
Full Text: PDF
GTID: 2568307025962949
Subject: Computer technology
Abstract/Summary:
Human action recognition is an important task in computer vision. The related technologies are maturing and have been widely applied. Depending on the input source, human action recognition can be divided into RGB-based recognition, skeleton-based recognition, and so on. This thesis is based on skeleton data. Skeleton data can alleviate problems such as target occlusion, viewpoint variation, and background noise, so skeleton-based action recognition has received increasing attention from researchers.

However, current research on skeleton data still has bottlenecks. First, existing work focuses on extracting spatial features and mostly ignores global temporal dependencies. Without global temporal dependencies, the features of atomic actions remain only weakly correlated semantically. In addition, because of feature redundancy, model accuracy tends to saturate quickly as modules are stacked. Second, existing models offer few methods for extracting multi-person interaction information, and these methods may confuse single-person action categories with multi-person action categories. Finally, there is a lack of solutions for extracting multi-scale temporal features: the predefined small convolution kernels used in many methods can only capture dynamic patterns with a fast rate of change.

Addressing these three points, the main contributions of the thesis are as follows:

(1) A skeleton-based action recognition method that models temporal dependence with a Transformer. The method designs a feature extraction scheme that includes a self-attention feature encoding module (transformer encoder, TE). This module obtains long-term temporal dependencies and correlates global frame-level semantic information by mining the similarity of upstream temporal features. In addition, the method defines feature redundancy in the temporal dimension. Based on a quantitative analysis of the efficiency of different structures, the network is designed as a parallel structure with lateral connections (LaC), which reduces the number of model parameters without reducing accuracy. A residual structure is also introduced to reduce the risk of network degradation.

(2) A method for recognizing interactive actions between people. The method defines an M-layer normalization (M-LN) operation on the person dimension, which avoids the noise introduced in the preprocessing phase and makes samples with different numbers of people easier to distinguish. An extended transformer encoder (E-TE) module is constructed to extract interaction features by modeling the behavioral correlation between people. A gradual multi-task learning (GMTL) scheme is designed to strengthen the network's ability to identify the number of people in a sample.

(3) A skeleton-based action recognition method based on multi-scale temporal information. Multi-scale temporal embedding modules (MT-EMs) are designed with a multi-branch structure. Each branch extracts information with convolution kernels of a different size and attends to a different dynamic pattern. A complete skeleton-based action recognition template scheme for extracting multi-scale temporal dependence is proposed.

Ablation experiments are conducted to verify the effectiveness of the proposed methods, and the hyperparameters of the network design are tuned to select a reasonable network structure. Experimental results show that the proposed methods are general and effective, and achieve state-of-the-art results on three large datasets: NTU-RGBD 60, NTU-RGBD 120, and Kinetics-Skeleton 400.
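
As an illustrative aside, the two temporal ideas in contributions (1) and (3) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the thesis implementation: the function names `temporal_self_attention` and `multi_scale_temporal_embedding` are hypothetical, the attention uses the frame features directly as queries, keys, and values with no learned projections, and each branch applies a fixed averaging kernel where a real model would use a learned temporal convolution.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(frames):
    """Scaled dot-product self-attention across frames.

    frames: list of T feature vectors (one per frame), each of length D.
    Assumption: queries, keys, and values are the raw features themselves
    (no learned projections), so every frame attends to every other frame
    according to feature similarity.
    Returns (outputs, weights): T mixed vectors and the T x T attention rows.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in frames:
        # Similarity of this frame to every frame in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in frames]
        attn = softmax(scores)
        # Weighted sum of all frames: a global temporal summary.
        mixed = [sum(a * frame[d] for a, frame in zip(attn, frames))
                 for d in range(dim)]
        weights.append(attn)
        outputs.append(mixed)
    return outputs, weights


def multi_scale_temporal_embedding(frames, kernel_sizes=(3, 5, 7)):
    """Multi-branch temporal filtering with different kernel sizes.

    Assumption: each branch is a uniform (averaging) kernel with zero
    padding, standing in for a learned temporal convolution. Branch
    outputs are concatenated along the channel axis, so the result has
    T rows of length D * len(kernel_sizes).
    """
    steps, dim = len(frames), len(frames[0])
    embedded = [[] for _ in range(steps)]
    for size in kernel_sizes:
        half = size // 2
        for t in range(steps):
            # Frames outside the sequence count as zeros (zero padding).
            window = [frames[i] for i in range(t - half, t + half + 1)
                      if 0 <= i < steps]
            embedded[t].extend(sum(f[d] for f in window) / size
                               for d in range(dim))
    return embedded


# Toy sequence: 4 frames, 2 features per frame (made-up values).
demo_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
demo_attended, demo_weights = temporal_self_attention(demo_frames)
demo_embedded = multi_scale_temporal_embedding(demo_attended)
print(len(demo_embedded), len(demo_embedded[0]))
```

Small kernels respond to rapid frame-to-frame changes, while larger kernels smooth over longer spans, which is why combining several branches covers more than one dynamic pattern.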
Keywords/Search Tags: skeleton-based action recognition, transformer, lateral connection, multi-task learning, multi-scale temporal embedding