| With the development of virtual reality and augmented reality technologies, 3D hand pose estimation has become an important means of human-computer interaction in virtual scenes. It presents the interaction process more naturally and intuitively, which directly improves the user experience. Before the emergence of deep learning methods, researchers commonly used sensors or manual extraction to capture pose information, and later deep learning-based methods mostly relied on RGB-D cameras to capture depth images to assist pose estimation. Although these sensor-based and RGB-D-based methods achieve good accuracy, they are difficult to popularize and their high equipment cost cannot be ignored. For these reasons, 3D hand pose estimation from monocular RGB images has attracted growing attention from researchers. Such methods are inexpensive to develop but face many challenges: self-occlusion, the high flexibility of the hand, and the lack of depth information are particularly problematic, and methods that predict 3D hand poses directly from the original images are complex and their networks are difficult to train. Taking these factors into account, this paper implements a multistage method for hand pose estimation and three-dimensional model reconstruction. The method implements and optimizes 2D hand pose estimation and 3D hand pose estimation separately. In the 2D hand pose estimation stage, a high-resolution feature-preserving network is used as the backbone, and a spatial attention mechanism and a channel attention mechanism are incorporated to assist keypoint detection by exploiting the correlation between finger keypoints; the 2D coordinate predictions serve as the precondition for 3D spatial estimation. In the 3D hand pose estimation stage, an hourglass network with dual decoder branches is used to build a model that simultaneously predicts 3D heatmaps and palm depth information, both of which work in coordination with the 3D hand pose estimation task to improve accuracy. To verify the effectiveness of the proposed method, ablation experiments and performance comparison experiments were conducted on the COCO-WholeBody dataset for the 2D pose estimation task and on the RHD and STB datasets for the 3D pose estimation task. |
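
The abstract does not say how the 3D heatmap and the palm depth are combined into final coordinates. The following is only a minimal sketch of one common way to do it, under assumptions that are not taken from the paper: the heatmap is a `[D][H][W]` grid for a single keypoint, its depth axis encodes an offset relative to the palm, and the names `decode_keypoint`, `to_absolute_depth`, `depth_bins`, and `depth_range` are hypothetical.

```python
def decode_keypoint(heatmap):
    """Return the (x, y, z) grid indices of the peak of a [D][H][W] heatmap.

    Assumption: the heatmap is a nested list for ONE keypoint; a real model
    would emit one such volume per hand keypoint.
    """
    best, best_val = None, float("-inf")
    for z, plane in enumerate(heatmap):
        for y, row in enumerate(plane):
            for x, val in enumerate(row):
                if val > best_val:
                    best_val, best = val, (x, y, z)
    return best


def to_absolute_depth(z_index, depth_bins, depth_range, palm_depth):
    """Convert a depth-bin index into an absolute depth.

    Assumption: bin centres are spread evenly over
    [-depth_range / 2, +depth_range / 2] around the palm, so the predicted
    palm depth only has to be added to the decoded offset.
    """
    offset = ((z_index + 0.5) / depth_bins - 0.5) * depth_range
    return palm_depth + offset


# Toy volume with 2 depth bins, 2 rows, 3 columns; the peak is at x=2, y=0, z=1.
toy_heatmap = [
    [[0.0, 0.1, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.2, 0.9], [0.1, 0.0, 0.0]],
]
x, y, z = decode_keypoint(toy_heatmap)
depth = to_absolute_depth(z, depth_bins=2, depth_range=0.2, palm_depth=0.5)
```

Under these assumptions the heatmap branch only has to localize each keypoint relative to the palm, while the second branch supplies the single absolute quantity (palm depth) that a monocular RGB image does not provide directly.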