| Video-based human detection, tracking, and three-dimensional reconstruction refers to detecting the human body in a video of human motion, extracting the key points of the human skeleton, and using this information to construct a three-dimensional human model whose posture and shape are similar to those of the body in the video. This technology has a wide range of applications in game production, film and television advertising, virtual reality, and other fields. At present, the most widely used methods for obtaining a 3D human body model rely on large-scale three-dimensional scanning instruments or on capture with wearable devices. Although these methods achieve high modeling accuracy, they require the subject either to stand in front of the device for a long time without moving freely or to wear heavy equipment, which leads to unnatural body postures, and their cost is high. To overcome the shortcomings of these traditional human body modeling methods, this paper focuses on human body detection based on convolutional neural networks to extract key information about the skeleton, and then applies that information to three-dimensional human body reconstruction. First, this paper examines the MobileNet network, which replaces the traditional convolutional layer with a depthwise separable convolution. MobileNet greatly improves forward computation speed and reduces the number of parameters at the cost of only a very small loss in prediction accuracy. Building on MobileNet, this paper expands the receptive field by applying dilated convolution, improving network accuracy while keeping the number of parameters unchanged. Second, this paper studies the OpenPose algorithm, which detects and tracks the skeletal joint points of multiple people in real time. This paper rebuilds the OpenPose network architecture on a lightweight convolutional neural network, replacing the original VGG network with the depthwise separable convolutions of MobileNet. The original parallel network layer of
OpenPose is shared, and the single 7×7 convolution kernel is replaced by a consecutive convolution block with one 1×1 and two 3×3 convolution kernels, thereby reducing the amount of computation and improving precision. Finally, this paper studies the SMPL model, a data-driven parametric model based on skinned vertices. A Kalman filter is used to correct the coordinates of the 2D skeleton keypoints obtained by OpenPose. The corrected keypoint coordinates are placed in one-to-one correspondence with the three-dimensional keypoints of the SMPL model. To determine whether the body has rotated, the direction of the body as seen from the side view is calculated. The three-dimensional keypoints are then projected to two-dimensional keypoint coordinates in both the original direction and the lateral direction. The distance between the projected keypoints in each of the two directions and the detected skeleton keypoints is calculated, and the direction with the smaller relative distance is selected as the three-dimensional direction of the target in the image. Unlike approaches that acquire 3D models with dedicated devices, the proposed method can be applied to ordinary video. In addition, this paper compares the optimized MobileNet network with the standard MobileNet network, the optimized OpenPose network with the original OpenPose, and the SMPL model with body-direction adjustment with the original SMPL model. The experimental results show that the proposed method improves human body detection, tracking, and 3D reconstruction. |
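
The parameter savings claimed for depthwise separable convolution, for dilated convolution, and for replacing a 7×7 kernel can be checked with simple weight counts. The sketch below is only illustrative: the channel width of 128, the omission of bias terms, and the assumption that the 1×1 and two 3×3 convolutions are applied in sequence at the same width are not taken from the paper.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


def effective_kernel(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


# Hypothetical layer width, chosen only for illustration.
c_in, c_out = 128, 128

std = standard_conv_params(3, c_in, c_out)
sep = depthwise_separable_params(3, c_in, c_out)
print("standard 3x3:", std, "separable 3x3:", sep)

# Dilation widens the area a kernel covers without adding weights.
print("extent at dilation 1:", effective_kernel(3, 1))
print("extent at dilation 2:", effective_kernel(3, 2))

# One 7x7 convolution versus a 1x1 followed by two 3x3 convolutions
# (sequential arrangement at equal width is an assumption).
single_7x7 = standard_conv_params(7, c_in, c_out)
block = (standard_conv_params(1, c_in, c_out)
         + 2 * standard_conv_params(3, c_in, c_out))
print("single 7x7:", single_7x7, "1x1 + two 3x3:", block)
```

Under these assumed sizes the separable layer needs roughly an eighth of the weights of the standard layer, and the three-layer block needs well under half the weights of the single 7×7 kernel.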
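
The Kalman-filter correction of 2D keypoints can be illustrated with a minimal per-coordinate filter. The abstract does not state the motion model or the noise settings, so the random-walk model, the values of `q` and `r`, and the names `ScalarKalman` and `smooth_track` below are assumptions made for this sketch.

```python
class ScalarKalman:
    """1-D Kalman filter with a random-walk (constant-position) model."""

    def __init__(self, x0, p0=1.0, q=1e-2, r=1.0):
        self.x = x0   # state estimate
        self.p = p0   # variance of the estimate
        self.q = q    # process noise variance (assumed value)
        self.r = r    # measurement noise variance (assumed value)

    def update(self, z):
        # Predict: the state is carried over, its uncertainty grows.
        self.p += self.q
        # Correct: blend prediction and measurement using the Kalman gain.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= (1.0 - k)
        return self.x


def smooth_track(points, q=1e-2, r=1.0):
    """Filter a sequence of (x, y) detections of a single keypoint."""
    if not points:
        return []
    fx = ScalarKalman(points[0][0], q=q, r=r)
    fy = ScalarKalman(points[0][1], q=q, r=r)
    out = [points[0]]  # the first detection initialises the filter
    for x, y in points[1:]:
        out.append((fx.update(x), fy.update(y)))
    return out


# Hypothetical detections of one joint over five frames; the third is an outlier.
track = [(100.0, 200.0), (101.0, 200.0), (140.0, 200.0),
         (102.0, 200.0), (103.0, 200.0)]
smoothed = smooth_track(track)
for raw, filt in zip(track, smoothed):
    print(raw, "->", (round(filt[0], 2), round(filt[1], 2)))
```

A larger `r` relative to `q` trusts the detector less and smooths more strongly; a constant-velocity model would follow fast motion better at the cost of a larger state.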
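
The direction-selection step (project the 3D keypoints in the original and the lateral direction, then keep the direction closer to the detected 2D keypoints) can be sketched as follows. The orthographic projection, the 90-degree lateral rotation about the vertical axis, the toy keypoint coordinates, and all function names are assumptions for illustration; the abstract does not specify the camera model or the rotation used.

```python
import math


def rotate_y(point, angle):
    """Rotate a 3-D point about the vertical (y) axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def project(point):
    """Orthographic projection onto the image plane (depth is dropped)."""
    return (point[0], point[1])


def mean_distance(points_a, points_b):
    """Mean Euclidean distance between paired 2-D points."""
    total = sum(math.dist(a, b) for a, b in zip(points_a, points_b))
    return total / len(points_a)


def choose_direction(model_3d, detected_2d, candidates):
    """Return the best-matching candidate name and all candidate errors."""
    errors = {}
    for name, angle in candidates.items():
        projected = [project(rotate_y(p, angle)) for p in model_3d]
        errors[name] = mean_distance(projected, detected_2d)
    return min(errors, key=errors.get), errors


# Toy model keypoints: two shoulders, two hips, nose (hypothetical values).
model_3d = [(-0.2, 0.5, 0.0), (0.2, 0.5, 0.0),
            (-0.15, 0.0, 0.0), (0.15, 0.0, 0.0), (0.0, 0.7, 0.1)]
# Hypothetical detections of a person seen from the side.
detected_2d = [(0.0, 0.5), (0.0, 0.5), (0.0, 0.0), (0.0, 0.0), (0.1, 0.7)]
candidates = {"original": 0.0, "lateral": math.pi / 2}

best, errors = choose_direction(model_3d, detected_2d, candidates)
print(best, errors)
```

In practice the detected keypoints would first need to be brought to the same scale and origin as the projected model; that normalisation is omitted here.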