Human pose estimation (HPE), defined as the detection of human body keypoints, is a long-standing research topic in computer vision with direct or indirect applications in popular fields such as human-computer interaction, healthcare, and behaviour recognition. In recent years, the mainstream approach to HPE has been to extract image features with convolutional neural networks, which typically rely on deeper architectures and larger numbers of parameters to reach the desired keypoint detection accuracy. As a result, gains in accuracy are often accompanied by a loss of computational efficiency. The focus of this paper is therefore how to design a network with lower computational cost while keeping keypoint detection accuracy largely unchanged. Targeting lightweight network design and taking into account shortcomings of existing networks, this paper constructs efficient network structures by replacing the conventional convolutions in the model with more computationally efficient alternatives, and by introducing visual attention mechanisms and knowledge distillation as a model compression method. The main contributions are as follows:

(1) A lightweight high-resolution network incorporating attention mechanisms is proposed. It significantly reduces the number of parameters and floating-point operations required by the network while retaining the ability of high-resolution networks to capture image information at multiple scales. Both channel attention and spatial attention are introduced: the ECANet channel attention module assigns weights to the channels of the input, and the spatial attention module focuses on regions of the image that are richer in local information.

(2) Knowledge distillation is adopted as a model compression method, and the teacher-student structure is further optimized by introducing the implicit teacher network of the online distillation framework. Because the teacher network is composed from the student network, the original hourglass network in the student network is made lighter by constructing a dilated residual bottleneck from depthwise separable convolution and dilated (atrous) convolution, and additional skip connections are added between hourglass units. This reduces the computation required by the teacher-student structure while keeping detection accuracy essentially unchanged. Combining this with the online distillation framework yields a lightweight online knowledge distillation network.

(3) Ablation and comparison experiments are conducted on the two improved lightweight pose estimation models to verify the effectiveness of each lightweight module. The models are also compared with classical models on the MS COCO and MPII datasets in terms of number of parameters, floating-point operations, and keypoint accuracy. The results show that the lightweight networks in this paper improve computational efficiency while keeping overall accuracy largely unchanged.
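
To illustrate why depthwise separable and dilated convolutions reduce cost, the following minimal sketch counts the weights of a standard convolution against a depthwise separable one and computes the span covered by a dilated kernel. The function names and the channel sizes are illustrative assumptions and are not taken from the paper's networks; biases and normalization layers are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def effective_kernel(k, dilation):
    """Span (in pixels) covered by a k-tap kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


if __name__ == "__main__":
    # Hypothetical layer sizes chosen only for illustration.
    c_in, c_out, k = 256, 256, 3
    standard = conv_params(c_in, c_out, k)
    separable = depthwise_separable_params(c_in, c_out, k)
    print("standard conv weights: ", standard)
    print("separable conv weights:", separable)
    print("reduction factor:       %.2f" % (standard / separable))
    # Dilation enlarges the receptive field without adding weights.
    for d in (1, 2, 4):
        print("dilation", d, "-> effective kernel size", effective_kernel(k, d))
```

Under these assumptions, the separable layer needs roughly k*k times fewer weights when the channel count is large, and dilation widens the receptive field at no extra parameter cost, which is the general motivation for combining the two in a lightweight bottleneck.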