Human pose estimation aims to locate the key points of the human skeleton, or body parts, in an input image or video in order to recover the pose of the whole body. With the development of convolutional neural networks, research on human pose estimation has achieved great success and has been applied to varying degrees in many real-life fields, but some practical problems remain. First, bottom-up multi-person pose estimation methods suffer from scale sensitivity and quantization error, which limits further improvement of their performance. Second, human pose estimation models based on convolutional neural networks tend to focus on improving accuracy while neglecting efficiency, so they cannot be ported to resource-constrained lightweight devices. To address these two problems, this paper studies two aspects: improving the accuracy of bottom-up multi-person pose estimation methods, and balancing the accuracy and efficiency of human pose estimation. The main research work of this paper is as follows: (1) To address the scale sensitivity and quantization error of bottom-up multi-person pose estimation methods, this paper proposes a multi-person pose estimation method based on context features and a refined network. First, a feature pyramid structure replaces the fast downsampling structure in the high-resolution network (HRNet) to retain the information in low-level features. Then, a multi-scale feature extraction method is proposed, and an attention mechanism is used to fuse the multi-scale features into a context feature that contains multi-scale information and enhances the scale invariance of the network. Finally, an efficient refined network is proposed to solve the quantization error problem, and multi-resolution supervision is used to facilitate network learning. A series of experiments on the MS COCO and MPII datasets demonstrates that the context feature and refined
network-based approach can effectively address the scale sensitivity and quantization error problems. (2) To address the high complexity of existing human pose estimation models, this paper proposes an attention-based lightweight high-resolution network approach that balances accuracy and efficiency. First, a lightweight network, Small HRNet, is designed by removing the redundant structure of HRNet; Small HRNet has only 12% of the parameters of HRNet yet reaches 88% of its performance. In addition, the number of parameters and the computational complexity of Small HRNet are further reduced by using the inverted residual module, and the accuracy of the model is effectively improved by using the attention mechanism. Finally, a training method based on improved knowledge distillation with a KL divergence loss function is used to further improve the performance of the model. A series of experiments on the MS COCO and MPII datasets verifies that the approach based on the attention mechanism and knowledge distillation can effectively balance accuracy and efficiency.
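
As an illustration of the distillation step mentioned above, the following is a minimal sketch of a KL divergence loss between a teacher heatmap and a student heatmap. It is not the improved distillation scheme proposed in this paper, whose details are not given here. The function names `log_softmax` and `kl_distillation_loss`, the `temperature` parameter, and the choice to normalize each flattened heatmap with a softmax are all assumptions made for the example.

```python
import math


def log_softmax(scores, temperature=1.0):
    """Log-probabilities of a flattened heatmap (numerically stable).

    The temperature is an assumed hyperparameter, not taken from the paper.
    """
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    log_total = math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - peak - log_total for s in scaled]


def kl_distillation_loss(student_scores, teacher_scores, temperature=1.0):
    """KL(teacher || student) between two flattened keypoint heatmaps."""
    if len(student_scores) != len(teacher_scores):
        raise ValueError("student and teacher heatmaps must have the same size")
    log_p = log_softmax(teacher_scores, temperature)  # teacher distribution
    log_q = log_softmax(student_scores, temperature)  # student distribution
    # Sum over pixels of p * (log p - log q).
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


# Hypothetical 2x2 heatmaps, flattened row by row.
teacher = [0.1, 2.0, 0.3, 0.2]
student = [2.0, 0.1, 0.3, 0.2]  # peak at the wrong pixel
loss = kl_distillation_loss(student, teacher)
```

In this sketch the loss is zero when the student reproduces the teacher's heatmap and grows as the student's peak moves away from the teacher's, so minimizing it pulls the lightweight student towards the teacher's predicted keypoint distribution.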