Human-computer interaction refers to the process of information exchange between humans and computers, carried out in a conversational language and through a defined interaction mode to accomplish a given task. Hand pose estimation and gesture recognition technology have broad application prospects in this field. Traditional input devices can no longer meet people's needs for natural and intuitive interaction. As a nimble and effective effector, the hand plays an important role in daily life. Gesture estimation and recognition technology can recognize and understand human hand movements and convert them into computer instructions, thereby achieving natural interaction with the computer. It has been widely applied and developed in fields such as entertainment, consumer goods, smart homes, medical care, industrial design, intelligent driving, and space applications, and has had a profound impact. In recent years, approaches based on deep learning have been able to achieve gesture recognition through ordinary RGB cameras, greatly reducing costs and providing freer and more natural interaction. With the continuous development of deep learning and neural network technology, the accuracy and real-time performance of gesture estimation and recognition are also constantly improving, bringing more possibilities and convenience to human-computer interaction.

To improve the accuracy of hand pose estimation and to address the problems of complex hand segmentation masks and the difficulty of recognizing gestures at different scales, we propose a novel neural network model called CH-HandNet. CH-HandNet consists of three modules: hand segmentation mask, preliminary 2D hand pose estimation, and
hierarchical estimation. The hand segmentation mask module consists of upper and lower branches and uses a hand mask label to guide the learning of hand segmentation. The hierarchical estimation module estimates the poses of the palm and of the individual fingers separately, optimizing the estimation of hand poses at different scales. Hierarchical estimation is the main optimization strategy and is based on hand topology: we first merge the palm and thumb, then merge the remaining fingers, and finally merge these two branches together. This step-by-step hierarchical approach further improves the performance of the model. Experimental results show that the proposed method has significant advantages in hand pose estimation and prediction accuracy, and that it effectively alleviates the problems of complex hand segmentation masks and the difficulty of recognizing gestures at different scales.

To overcome the complexity and limited adaptability of CH-HandNet, as well as the loss of feature information caused by downsampling in hand pose estimation, the low utilization of pose information in gesture recognition applications, and the low accuracy of keypoint localization, we propose a convolutional neural network with a simple framework named the Fishbone Skeleton Convolutional Neural Network (FS-HandNet). The model consists of three parts: the fish head adopts an efficient bidirectional pyramid structure (BiPS) to alleviate the information loss caused by feature downsampling and to improve small-target feature extraction; the fish body uses a high-resolution preservation structure with asymmetric convolution (HRACS) to maintain high resolution and to enhance its feature extraction ability and robustness to image flipping; and the fish tail adopts a simple deconvolution head structure (DcHS). To implement a gesture recognition application based on hand pose information, we use the fishbone skeleton network to predict hand pose information and
recognize multiple gestures using a convex hull algorithm together with the predicted hand pose information. The experimental results show that our method achieves the best performance. By using the efficient BiPS and the asymmetric-convolution HRACS structure, we successfully address the information loss caused by downsampling and the difficulty of small-target feature extraction, thereby improving the model's adaptability and performance. In addition, the model can also be applied to the recognition of multiple hand gestures.

To address the challenges FS-HandNet faces in terms of parameter count, computational complexity, network complexity, the speed-accuracy trade-off, and deployment, we propose a novel method called MSIPA-HandNet. This method uses a Multi-Scale Information Perception structure based on an Attention mechanism (MSIPA) to extract multi-scale information during downsampling while limiting the growth of model parameters. Next, we use upsampling to restore the resolution and locate the keypoints. We then adjust the keypoint positions with the Distribution-Aware coordinate Representation of Keypoints (DARK) algorithm to improve model accuracy, and we propose speed-accuracy trade-off (SAT) metrics to evaluate model performance based on the constructed model and the DARK algorithm. Finally, we use the keypoint positions obtained from hand pose estimation in real-world applications. We conducted experiments on a public hand pose dataset, and the results show that the proposed method outperforms state-of-the-art methods in several respects. The approach not only reduces model complexity but also improves estimation accuracy, enabling various applications on the computer side.

To address the low accuracy of FS-HandNet in gesture recognition tasks, as well as its excessive parameter count and the resulting difficulties in deployment and application, we propose a lightweight gesture
recognition network (LHGR-Net). LHGR-Net consists of three parts: a basic network structure, a multi-scale structure (MSS), and a lightweight attention structure (LAS). The motivation for this design is that MSS and LAS can enhance the network's representational power: MSS captures both global information and local details, while LAS models long-range dependencies and makes the network more attentive to useful contextual information. We combine these structures to exploit their strengths and compensate for their weaknesses, realizing a complete pipeline of the gesture recognition algorithm, model deployment, and application. Experimental results show that, compared with state-of-the-art methods, LHGR-Net achieves higher accuracy and faster inference speed, and it can be successfully deployed on a Raspberry Pi.
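To make the DARK coordinate-adjustment step used in MSIPA-HandNet concrete, the following is a minimal sketch of its core idea: refining the integer argmax of a keypoint heatmap to sub-pixel precision via a second-order Taylor expansion of the log-heatmap. The function name and the Gaussian test grid are illustrative, not part of the proposed models.

```python
import numpy as np

def dark_refine(heatmap):
    """Sub-pixel refinement of a 2D heatmap peak (DARK-style).

    Takes one Newton step on the log-heatmap around the integer
    argmax: offset = -H^{-1} * gradient, where H is the 2x2 Hessian
    estimated by finite differences.
    """
    h, w = heatmap.shape
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    # Keep away from the border so central differences are valid.
    if not (1 <= x < w - 1 and 1 <= y < h - 1):
        return float(x), float(y)
    p = np.log(np.maximum(heatmap, 1e-10))
    # First derivatives (central differences).
    dx = 0.5 * (p[y, x + 1] - p[y, x - 1])
    dy = 0.5 * (p[y + 1, x] - p[y - 1, x])
    # Second derivatives (Hessian entries).
    dxx = p[y, x + 1] - 2.0 * p[y, x] + p[y, x - 1]
    dyy = p[y + 1, x] - 2.0 * p[y, x] + p[y - 1, x]
    dxy = 0.25 * (p[y + 1, x + 1] - p[y + 1, x - 1]
                  - p[y - 1, x + 1] + p[y - 1, x - 1])
    det = dxx * dyy - dxy * dxy
    if abs(det) < 1e-10:
        return float(x), float(y)
    # Newton step: offset = -H^{-1} * gradient.
    ox = -(dyy * dx - dxy * dy) / det
    oy = -(dxx * dy - dxy * dx) / det
    return float(x + ox), float(y + oy)
```

For a Gaussian-shaped heatmap the log is exactly quadratic, so this refinement recovers the true continuous peak rather than the nearest grid cell, which is what allows DARK to reduce the quantization error of heatmap-based keypoint localization.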