
Study On Skeleton Hand Gesture Recognition Based On Spatial-Temporal Transformer

Posted on: 2024-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: H L Li
Full Text: PDF
GTID: 2558307079493284
Subject: Information and Communication Engineering
Abstract/Summary:
Dynamic hand gesture recognition is a popular research topic in computer vision because of its broad application prospects and its interdisciplinary character. Because skeleton data is adaptable and robust to dynamic environments and complex backgrounds, skeleton-based dynamic hand gesture recognition has gradually become a research focus in the field. In recent years, a growing number of researchers have applied deep learning to skeleton-based hand gesture recognition and have made progress. However, effectively extracting and fully exploiting the spatial-temporal features and temporal dynamics of hand gesture skeleton sequences remains a challenging task. The flexibility of the Transformer and its strong performance in modeling global correlations among the elements of a sequence make it well suited to skeleton-based hand gesture recognition. This thesis therefore builds on the visual Transformer to conduct in-depth research on skeleton-based hand gesture recognition. Three recognition algorithms are proposed, each achieving excellent results on current mainstream dynamic hand gesture datasets. They are summarized as follows.

(1) To address the problem of effectively extracting spatial-temporal features from skeleton data, a decoupled spatial-temporal Transformer model incorporating fingertip features is proposed to model the spatial-temporal correlation of hand gesture skeleton sequences simply and efficiently. The spatial and temporal dimensions of the skeleton data are decoupled at different levels of the Transformer, and the effects of different spatial-temporal Transformer structures on recognition performance are explored. In addition, fingertip information, which can express subtle finger movements, is introduced as a compensation cue. Finally, using a two-stream framework, the model achieves superior recognition performance by appropriately fusing the spatial-temporal features extracted from the joint data and from the fingertip information.

(2) Because the correlation among different joints in successive frames of complex gestures is especially critical for accurate recognition, a novel multi-stream spatial-temporal synchronous Transformer algorithm is proposed to model the spatial and temporal features of skeleton data synchronously. It consists mainly of three modules: a spatial-temporal chunks embedding module, a spatial-temporal chunks transformer module, and an inter-chunk transformer module. In addition, joint data, skeletal data, and fingertip information are used as inputs to the multi-stream framework; training each stream separately and fusing the results afterwards yields a further, more significant performance improvement.

(3) To address the negative impact of redundant joints in skeleton data on recognition performance and the simplicity of the temporal motion features extracted by existing methods, a novel local and global spatial-temporal Transformer model is proposed. It learns the spatial-temporal correlation among key informative joints more effectively and achieves a more comprehensive understanding of local and global temporal dynamics. Extensive ablation and comparison experiments validate the effectiveness of the individual components of the proposed model and the superior recognition performance of the overall model. In addition, generalization studies show that this skeleton-based hand gesture recognition algorithm generalizes well to human action recognition tasks.
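
The decoupling idea in contribution (1) can be illustrated with a minimal sketch. This is not the thesis's model: it assumes one spatial attention step within each frame followed by one temporal attention step along each joint's trajectory, uses unparameterised attention (queries, keys, and values are the raw joint features, with no learned projections), and fuses two streams by a weighted average of class scores. The names `self_attention`, `decoupled_st_attention`, `fuse_scores`, and the weight `alpha` are hypothetical and chosen for illustration only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens.

    Assumption: no learned projection matrices, so this only shows
    how attention weights mix the tokens.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out


def decoupled_st_attention(seq):
    """seq[t][j] is the feature vector of joint j in frame t."""
    # Spatial step: joints attend to each other within the same frame.
    spatial = [self_attention(frame) for frame in seq]
    n_frames, n_joints = len(spatial), len(spatial[0])
    # Temporal step: each joint attends to itself across all frames.
    out = [[None] * n_joints for _ in range(n_frames)]
    for j in range(n_joints):
        track = self_attention([spatial[t][j] for t in range(n_frames)])
        for t in range(n_frames):
            out[t][j] = track[t]
    return out


def fuse_scores(joint_scores, fingertip_scores, alpha=0.5):
    """Late fusion: weighted average of per-class scores from two streams."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(joint_scores, fingertip_scores)]


if __name__ == "__main__":
    # Toy sequence: 5 frames, 4 joints, 3 coordinates per joint.
    seq = [[[float(t + j + c) for c in range(3)] for j in range(4)] for t in range(5)]
    features = decoupled_st_attention(seq)
    print(len(features), len(features[0]), len(features[0][0]))
    print(fuse_scores([0.2, 0.8], [0.6, 0.4]))
```

Splitting attention this way keeps each attention call small (joints per frame, then frames per joint) instead of attending over every joint in every frame at once, which is one plausible reading of "simply and efficiently" in the abstract; the thesis itself may arrange the two steps differently.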
Keywords/Search Tags: Dynamic hand gesture recognition, Transformer, Spatial-temporal joints, Multi-stream network