3D human pose estimation is a popular area of computer vision research that aims to locate human joints in 3D space from a given RGB image or video. It has a wide range of applications not only in daily life, such as motion detection, human-computer interaction, video surveillance, and augmented reality, but also in other computer vision tasks, such as object detection and action recognition. Recent works have focused on designing temporal feature extraction networks for human pose sequences; they concentrate only on accurately recovering the position of each joint, while ignoring the intrinsic relationships between local joints and the distinctive structure of the human limbs. Recovering an accurate and plausible 3D human pose from monocular video requires more than constraints on individual joints. This paper therefore proposes a joint angle loss function and a limb length loss function to constrain the estimated 3D human pose. First, a new prior assumption is designed that treats the whole human body structure as a collection of triangles; the main joint angles and limb lengths are constrained separately, encouraging the network to pay more attention to the geometric properties of the human body. In addition, each frame in a sequence is closely related to its neighboring frames, especially when recovering intermediate frames, so the model relies on modeling long sequences. Previous work has not considered the motion amplitude of each joint when modeling spatial properties, so this paper models the motion amplitude of each joint over the time series. Finally, previously used global feature encoders typically construct deep neural networks by stacking layers to extract features, which increases the computational cost and makes the model complex; this paper instead proposes a fully connected layer with residual connections as the global encoder. Extensive qualitative and quantitative experimental results on a widely evaluated public dataset show that the proposed method achieves better performance than state-of-the-art methods.

The main contributions of this paper are as follows:

1. We design a joint angle loss function and a limb length loss function. Drawing on the principle of triangle stability, we constrain both the joint angle and the lengths of the two real limbs that, together with that joint, form a triangle, guiding the network toward plausible inferences about human posture. Combining the MPJPE loss with the angle loss and the limb length loss in a weighted sum improves the plausibility of the estimated pose, and thereby its accuracy.

2. We propose a method for modeling motion amplitude. Per-joint position differences between consecutive frames are computed, and the 2D joint coordinates are grouped into five local groups before entering the network; the position differences are applied to the local-group branches to refine the input information before each branch passes through its encoder to extract features. In addition, a fully connected layer with residual connections is proposed as the global feature encoder, whose output is decoded together with the local features. The resulting features capture the motion trend of each joint across the sequence, and the refined local and global features are passed to the decoder to recover a more accurate human pose.

3. We build and implement a 3D human pose estimation system with a graphical user interface (GUI) using HTML and JavaScript, allowing the model in this paper to be displayed and used in a web browser and to recover 3D human pose from any video.
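The triangle constraint described in contribution 1 can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: the function names, the choice of limb pairs, and the triangle triples are illustrative, and the angle at each triangle's vertex joint is compared against the ground truth with a simple absolute difference.

```python
import numpy as np

def limb_length_loss(pred, target, limb_pairs):
    """Mean absolute difference between predicted and ground-truth limb lengths.

    pred, target: (J, 3) arrays of 3D joint coordinates.
    limb_pairs: list of (parent, child) joint-index pairs defining limbs.
    """
    losses = []
    for a, b in limb_pairs:
        len_pred = np.linalg.norm(pred[a] - pred[b])
        len_gt = np.linalg.norm(target[a] - target[b])
        losses.append(abs(len_pred - len_gt))
    return float(np.mean(losses))

def joint_angle_loss(pred, target, triangles):
    """Mean absolute difference between predicted and ground-truth joint angles.

    triangles: list of (a, v, b) triples; the angle is measured at vertex v
    between the two limbs (v, a) and (v, b), which together with the
    segment a-b form a triangle.
    """
    def angle(p, a, v, b):
        u, w = p[a] - p[v], p[b] - p[v]
        cos = np.dot(u, w) / (np.linalg.norm(u) * np.linalg.norm(w) + 1e-8)
        return np.arccos(np.clip(cos, -1.0, 1.0))

    losses = [abs(angle(pred, a, v, b) - angle(target, a, v, b))
              for a, v, b in triangles]
    return float(np.mean(losses))
```

In the paper these terms are combined with the MPJPE loss in a weighted sum; the weighting coefficients are hyperparameters not reproduced here.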
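The position-difference operation and the residual fully connected encoder of contribution 2 can be sketched roughly as below. This is an illustrative NumPy sketch, not the paper's code: the array shapes, the zero-difference convention for the first frame, the random weight initialization, and the class name `ResidualFC` are all assumptions.

```python
import numpy as np

def position_difference(seq):
    """Per-joint position differences between consecutive frames.

    seq: (T, J, 2) array of 2D joint coordinates over T frames.
    Returns a same-shape array whose t-th entry is seq[t] - seq[t-1]
    (zero for the first frame), capturing each joint's motion amplitude
    through the sequence.
    """
    diff = np.zeros_like(seq)
    diff[1:] = seq[1:] - seq[:-1]
    return diff

class ResidualFC:
    """Fully connected layer with a residual (skip) connection:
    y = x + ReLU(x @ W + b). The weights here are random placeholders;
    in the actual model they would be learned by backpropagation."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((dim, dim))
        self.b = np.zeros(dim)

    def __call__(self, x):
        return x + np.maximum(x @ self.W + self.b, 0.0)
```

The skip connection lets the encoder refine its input rather than recompute it from scratch, which is why a single residual fully connected layer can replace a deeper stack as the global feature encoder.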