
Research On Computer Vision Based Detection And Recognition Of Dynamic Human Gestures

Posted on: 2024-05-24    Degree: Doctor    Type: Dissertation
Country: China    Candidate: C Zhang    Full Text: PDF
GTID: 1528307316480174    Subject: Software engineering
Abstract/Summary:
Gesture recognition is an important research direction in the field of human-computer interaction. Its purpose is to enable computers to understand the meaning expressed by human body gestures in specific scenarios and to provide people with accurate and efficient services. Gesture recognition technology is now applied in daily life, entertainment, industry, medicine, and many other areas. Vision-based gesture recognition is natural and low in cost and has received extensive attention in recent years. However, visual information is complex and diverse, and detecting gestures in visual sequences and recognizing them accurately remains a difficult problem. Starting from the spatio-temporal correlations between body parts in gesture actions, this dissertation studies the description, extraction, and fusion of visual spatio-temporal context features of gestures to improve the performance of gesture detection and recognition. The research covers three aspects: skeleton-based gesture recognition, image-based gesture recognition, and gesture detection, with the following innovative contributions:

(1) To address the limited ability of existing methods to fuse multi-modal skeleton data, a graph convolutional gesture recognition method based on a multi-modal skeleton graph is proposed. First, gestures are described by the joint positions, rotation angles, skeleton vectors, and root rotation angles of the body and hands, and a graph structure with interconnected multi-modal skeletons is introduced to describe the spatio-temporal features and correlations of gestures. Second, to counter the degradation of the traditional spatial-configuration partitioning strategy, a height-layering partitioning strategy is proposed to obtain well-balanced partitions. Finally, a graph convolutional network is designed to learn the strength of three types of connection relations: fixed, data-driven, and layer data-driven. The method extracts and fuses correlated inter-modality features while preserving the topological structure, enabling accurate skeleton-based gesture recognition (a minimal sketch of partition-wise graph convolution is given below).
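The abstract does not give the network details, so the following is only a minimal sketch of the general idea of partition-wise spatial graph convolution with learnable edge strengths; the class name, the number of partitions, and the way the partition adjacency matrices are built (e.g., by height layering) are assumptions for illustration.

```python
import torch
import torch.nn as nn

class PartitionedGraphConv(nn.Module):
    """Hypothetical sketch: spatial graph convolution over K adjacency partitions.

    `A` is a stack of K pre-computed partition adjacency matrices over V joints
    (standing in for the dissertation's height-layering partitions); a learnable
    per-edge mask plays the role of the learned connection strengths.
    """
    def __init__(self, in_channels, out_channels, A):
        super().__init__()
        self.register_buffer("A", A)                        # (K, V, V) fixed partitions
        self.edge_importance = nn.Parameter(torch.ones_like(A))  # learned edge strengths
        self.K, self.out_channels = A.size(0), out_channels
        # 1x1 conv produces K groups of output channels, one group per partition
        self.conv = nn.Conv2d(in_channels, out_channels * self.K, kernel_size=1)

    def forward(self, x):
        # x: (N, C, T, V) = batch, channels, frames, joints
        N, C, T, V = x.shape
        x = self.conv(x).view(N, self.K, self.out_channels, T, V)
        A = self.A * self.edge_importance                   # weighted partition adjacencies
        # aggregate neighbours partition by partition, then sum over partitions
        return torch.einsum("nkctv,kvw->nctw", x, A)

# usage: 3 partitions over 25 joints, 64 frames, 6 input channels per joint
A = torch.rand(3, 25, 25)
layer = PartitionedGraphConv(6, 64, A)
out = layer(torch.randn(8, 6, 64, 25))                      # -> (8, 64, 64, 25)
```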
(2) To recognize gestures performed with both limbs and fine-grained hand motions, an image focus fusion based gesture recognition method is proposed. First, body parts are combined at multiple levels according to their natural and logical connections, and a hierarchical part-combination method is proposed to achieve multi-focus coverage of different gesture actions. Second, the surface modality of the body is extracted and fused with the color and optical-flow modalities at the feature and decision levels to strengthen the representation ability of the focuses. Finally, the classification contribution of each focus is learned per gesture class, yielding a focus fusion scheme that suppresses interference from weakly correlated focuses. The method addresses multi-level focus selection and complementary fusion, achieving accurate recognition of gestures performed with limbs and fine-grained hand motions.

(3) To address detection in dynamic gesture sequences, a part affinity field based detection method is proposed. First, a 3D heatmap-volume representation is proposed to capture the spatio-temporal correlations of the skeleton, depicting joint positions and confidences, spatio-temporal skeleton articulation relationships, and part motion trajectories. Second, a Gaussian-based progressive boundary-probability construction method is proposed to reduce the difficulty of boundary fitting (a minimal sketch is given after this abstract). Finally, a temporal-field non-degradation network architecture is proposed to improve the accuracy of boundary prediction. The method solves the spatio-temporal context description of volume features, enabling accurate detection of continuous dynamic gestures.

(4) To address traffic-commanding gesture recognition in autonomous-driving scenarios, the skeleton-based and image-based detection and recognition methods above are integrated, and a multi-modal ensemble experiment on traffic gesture detection and recognition is conducted. First, limb features are represented with the multi-modal skeleton graph, global and fine-grained body-part features are represented with image-based focuses, gestures are detected from the part affinity field representations, and recognition is performed with fused skeleton and image features. Then, a sliding-window scheme for stream data is designed to improve bandwidth efficiency through early detection and batch inference. Finally, a traffic-commanding gesture dataset with four commanding directions is collected to validate the method, and an online experiment is performed in real outdoor environments.

In summary, this dissertation proposes a multi-modal skeleton graph based graph convolutional gesture recognition method, an image focus fusion based gesture recognition method, and a part affinity field based gesture detection method, and integrates them into a multi-modal detection and recognition ensemble that solves continuous dynamic gesture detection and recognition problems involving coordinated limb and hand motion, achieving good detection and recognition results. The methods can be applied to natural human-computer interaction, autonomous driving, and many other scenarios, and have significant theoretical and practical value.
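The abstract only names a "Gaussian based progressive boundary probability construction"; the sketch below shows one plausible reading of that idea, in which each annotated gesture boundary frame is spread into a soft Gaussian probability curve so a network can fit a smooth target rather than a hard 0/1 boundary label. The function name, the `sigma` value, and the max-merging of overlapping boundaries are assumptions for illustration.

```python
import numpy as np

def boundary_probability(num_frames, boundaries, sigma=3.0):
    """Hypothetical sketch of a Gaussian-smoothed boundary target.

    boundaries: frame indices of annotated gesture start/end points.
    Returns a per-frame probability curve peaking at 1 on each boundary.
    """
    t = np.arange(num_frames, dtype=np.float32)
    prob = np.zeros(num_frames, dtype=np.float32)
    for b in boundaries:
        # spread the hard boundary label into a Gaussian bump centered at frame b
        prob = np.maximum(prob, np.exp(-((t - b) ** 2) / (2 * sigma ** 2)))
    return prob

# usage: a 100-frame clip with gesture boundaries at frames 20 and 63
target = boundary_probability(100, boundaries=[20, 63], sigma=3.0)
```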
Keywords/Search Tags:Computer vision, Deep learning, Human-computer interaction, Gesture recognition, Gesture detection, Multi-modal fusion