| Human pose estimation is widely used in scenarios such as intelligent cloud monitoring, assisted driving, and smart healthcare. To efficiently extract human behavior and action information from the massive volumes of images and videos collected in these scenarios, deep-learning-based human pose estimation has become an active research topic. However, pose estimation networks tend toward complex structural designs, which makes the resulting detection models too large for low-complexity, large-scale deployment. The following research was carried out to address this problem. First, to address the high parameter count and computational complexity of the high-resolution network (HRNet) in 2D pose estimation, a high-resolution fusion detection network based on Atrous Spatial Pyramid Pooling (ASPP) is proposed. An analysis of the overall HRNet pipeline and its internal parallel multi-branch cross-fusion structure, combined with the working principles of convolutional neural networks and residual units in feature extraction, shows that HRNet's parameter count and complexity grow because cross-fusing feature-map branches of different resolutions requires a large number of residual modules and convolution operations to bring the feature maps to a common size. On this basis, the proposed network extracts the feature maps of the multi-branch subnetworks in the final stage of HRNet using an atrous convolution pyramid, whose layers apply convolutions with different sampling intervals, thereby reducing network complexity. A convolutional attention mechanism is also constructed to improve the quality of the feature maps output by the pyramid structure, so that the network can still detect human pose keypoints efficiently. The detection performance of the optimized model is verified on public datasets. Compared with the original HRNet, and at the cost of a small loss in detection accuracy, the optimized model reduces the parameter count by 38.6% and the computational complexity by 35.2% when trained on the COCO dataset, and reduces the parameter count by 31.6% and the computation by 40% when trained on the MPII dataset, which facilitates subsequent deployment. Second, to address the inability of human pose estimation networks to extract features effectively when keypoints are lost in occluded or crowded scenes, a high-resolution detection network based on a simple feature-map generation module is designed, approaching the problem from the way the network's branch feature maps are generated. HRNet applies its original extraction method even to locally occluded input feature maps, which extracts invalid information and wastes resources. In the proposed network, feature maps are generated by combining local convolution with linear transformation, and the network unit is further combined with an attention feature fusion module to improve feature-map quality. Tests on samples with different occlusion ratios from the 3DOH50K dataset show that the optimized model maintains high pose estimation performance. Compared with the original HRNet model, the optimized model trained on the CrowdPose dataset has 18.9% fewer parameters and 16.8% lower computational complexity, which is conducive to large-scale application of the model. |
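
The idea of convolutions with different sampling intervals in an atrous pyramid can be illustrated with a minimal one-dimensional sketch. This is not the network described above: the function names, the 3-tap kernels, and the rates 1, 2, and 4 are assumptions chosen only for illustration. It shows that a kernel with a fixed number of weights covers a wider span of the input as the sampling interval (rate) grows, which is why parallel atrous branches can gather multi-scale context without adding parameters per branch.

```python
def atrous_conv1d(signal, kernel, rate):
    """1D atrous (dilated) cross-correlation with zero padding.

    The kernel taps are spaced `rate` samples apart, and the output
    has the same length as the input. Illustrative helper only.
    """
    half = (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - half) * rate  # sample every `rate`-th input
            if 0 <= idx < len(signal):   # positions outside count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, rate):
    """Input span covered by one output value of an atrous kernel."""
    return (kernel_size - 1) * rate + 1


def atrous_pyramid_1d(signal, kernels, rates):
    """Run one kernel per rate in parallel and stack the branch outputs.

    Returns, for each input position, a list with one value per branch
    (a stand-in for channel concatenation in a 2D pyramid).
    """
    if len(kernels) != len(rates):
        raise ValueError("need exactly one kernel per rate")
    branches = [atrous_conv1d(signal, k, r) for k, r in zip(kernels, rates)]
    return [list(values) for values in zip(*branches)]
```

For example, applying the hypothetical kernel `[1, 2, 3]` to a unit impulse with rates 1, 2, and 4 spreads the three weights over spans of 3, 5, and 9 input samples respectively, while each branch still holds only three weights.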