| As human society enters the information age, computers and their derivatives, the carriers of modern artificial intelligence, have permeated every aspect of daily life, and human-computer interaction has become increasingly important. Because the hand is the main tool humans use to operate on the world, its position and orientation in space are essential for many potential applications. Hand pose estimation has therefore grown in importance and become an active research area in computer vision. However, 3D hand pose estimation from a single RGB image still poses many difficulties. The rise of big data, new neural network architectures, and successive generations of high-performance computing hardware have driven the adoption of deep learning in many fields, and its application to hand pose estimation has produced major breakthroughs. This paper briefly introduces deep-learning-based hand pose estimation from a single RGB image and studies it from the perspective of network structure, in two main parts. First, starting from the traditional convolutional neural network and drawing on the recent lightweight general-purpose backbone MobileNetV3 and its variant MoGA, this paper designs a lightweight yet capable feature extraction network for the input RGB image. The backbone of this network is built from inverted residual modules based on a spatial attention mechanism, and a new activation function is introduced to improve feature extraction. The network converts the single input RGB image into a multi-channel, small-sized feature map for the next processing step. Second, building on existing methods, this paper uses the feature map to locate the hand keypoints and perform three-dimensional restoration, which completes the hand pose estimation task. A feature processing network incorporating a multi-spectral frequency-domain channel attention mechanism is designed; it upsamples, downsamples, and linearly processes the feature map to estimate the heatmap distribution of the hand joints. To make full use of high-dimensional frequency-domain information, the multi-spectral frequency-domain attention mechanism is incorporated into the sampling stages of feature processing. This process is carried out in two independent parts, so that the method can be applied to scenes containing multiple hands. In addition, a linear layer estimates one-dimensional quantities such as the type of the target hand and the relative depth of the root joint. Overall, the method is modular and the network structure is simple. Experiments show that, compared with existing two-stage and three-stage reference methods, this method achieves higher hand pose estimation accuracy with lower computational overhead. This paper also briefly introduces the necessary background on neural networks and feature extraction, as well as the datasets used in the experiments. In addition to experiments on the classic STB and RHD datasets, the method is tested on the recent InterHand2.6M dataset for interacting multi-hand pose estimation. |
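
As a rough illustration of the multi-spectral frequency-domain channel attention mentioned above, the following NumPy sketch pools each group of channels against a 2D DCT basis function and turns the pooled vector into per-channel weights. The abstract does not specify the design, so everything here is an assumption made for illustration: the function names, the frequency list `freqs`, the two-layer gating with random weights `w1` and `w2`, and the tensor sizes. It is not the network used in this paper.

```python
import numpy as np


def dct_basis(u, v, height, width):
    """2D DCT-II basis function of frequency (u, v) on a height x width grid."""
    i = np.arange(height)
    j = np.arange(width)
    row = np.cos(np.pi * u * (i + 0.5) / height)
    col = np.cos(np.pi * v * (j + 0.5) / width)
    return np.outer(row, col)  # shape (height, width)


def multispectral_channel_attention(x, freqs, w1, w2):
    """Reweight the channels of x using frequency-domain pooling.

    x:     feature map of shape (C, H, W)
    freqs: list of (u, v) frequency pairs, one per channel group (assumed)
    w1:    gating weights of shape (C // r, C) (hypothetical, normally learned)
    w2:    gating weights of shape (C, C // r) (hypothetical, normally learned)
    """
    channels, height, width = x.shape
    # Split the channel indices into one group per frequency component.
    groups = np.array_split(np.arange(channels), len(freqs))
    pooled = np.empty(channels)
    for (u, v), idx in zip(freqs, groups):
        basis = dct_basis(u, v, height, width)
        # Project every channel of the group onto this DCT component.
        pooled[idx] = (x[idx] * basis).sum(axis=(1, 2))
    hidden = np.maximum(w1 @ pooled, 0.0)          # ReLU
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid, values in (0, 1)
    return x * scale[:, None, None], scale


# Toy usage with random data and random (untrained) gating weights.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 4, 4))
freqs = [(0, 0), (0, 1), (1, 0), (1, 1)]
w1 = rng.standard_normal((2, 8))
w2 = rng.standard_normal((8, 2))
attended, weights = multispectral_channel_attention(features, freqs, w1, w2)
print(attended.shape, weights.shape)
```

With the frequency `(0, 0)` the basis is constant, so that group reduces to global average pooling up to a constant factor; the other frequencies let the gate respond to spatial variation that plain averaging discards.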