Monocular 3D Human Reconstruction

Posted on: 2023-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: T S Zhang
Full Text: PDF
GTID: 2558307061464164
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Obtaining 3D geometry of the human body from a monocular camera is a key problem in the field of digital virtual humans. Compared with multi-view camera systems, monocular cameras are low-cost and widely applicable. However, because depth information is missing, reconstructing an accurate 3D human mesh from a single image is a challenging, underdetermined problem. Furthermore, when the input image contains interference such as occlusions, existing algorithms struggle to reconstruct a plausible 3D human robustly, so robust human mesh recovery under occlusion needs to be explored. Real-world humans are also dynamic, and combining single-frame results produces jittering and unrealistic motion, so recovering realistic, dynamic 3D humans remains a very challenging research topic. This thesis is devoted to reconstructing accurate, robust, and dynamic 3D human bodies from monocular input, and it constructs a real-time monocular motion capture system. The main innovations are as follows:

(1) We construct a 3D human representation based on the UV position map and propose a 3D human reconstruction framework guided by 2D pose and mask, which improves the accuracy of 3D human reconstruction. We represent the 3D human mesh with a UV position map, storing each 3D vertex position at its corresponding coordinates in the UV map. Converting the 3D mesh into a 2D image makes full use of the computational efficiency of neural networks and avoids a highly nonlinear mapping. When reconstructing a human body with this representation, we combine 2D pose and silhouette information with the image features to guide the reconstruction.

(2) A dual-branch neural network for image inpainting and human reconstruction is proposed, and the visible UV position map is used to represent the occluded 3D human. Through latent-space supervision, the occluded human reconstruction problem is transformed into an image inpainting problem, which allows 3D humans to be reconstructed robustly in occluded scenes. Based on the dense correspondence between occluded images and UV position maps, we represent occluded 3D human bodies by preserving the parts of the UV position map that correspond to the visible regions of the image. During reconstruction, we first train an image inpainting network that maps the visible UV position map to the full UV position map, and then supervise the image features in the latent space to be consistent with the features of the inpainting network, so that the trained inpainting decoder can be used directly to reconstruct the human body.

(3) We propose a 3D human motion prior based on a conditional variational autoencoder (cVAE), whose encoder and decoder are built with Transformers. Dynamic 3D humans can be reconstructed by sampling the latent space of real human motion under 2D pose constraints. To make the reconstructed motion more realistic and plausible, we introduce a 3D human motion prior and use a variational autoencoder to learn the distribution of real human motion; large-scale motion capture data can serve as training data. In the reconstruction phase, we use the 2D human pose sequence as a conditional constraint so that the result is consistent with the 2D pose, while also supporting the generation and editing of 3D human motion.

(4) A real-time monocular human motion capture system is constructed. It requires only a single camera for input and is suitable for any background and lighting conditions. We use C++ and the TensorRT framework to deploy the existing monocular human reconstruction model and optimize the pre-processing time; the system runs at 60 FPS, which meets real-time requirements.
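The UV position map idea used in contributions (1) and (2) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the nearest-pixel scatter (real pipelines typically rasterize whole triangles so the map is dense), and the use of `None` for empty or occluded pixels are all assumptions made for illustration. It only shows how 3D vertex positions can be stored at their UV coordinates in a 2D grid, read back, and masked down to the visible part.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
UV = Tuple[float, float]
UVMap = List[List[Optional[Vec3]]]  # size x size grid; None marks an empty pixel


def uv_to_pixel(uv: UV, size: int) -> Tuple[int, int]:
    """Map a UV coordinate in [0, 1]^2 to the nearest (row, col) pixel."""
    u, v = uv
    col = min(size - 1, max(0, round(u * (size - 1))))
    row = min(size - 1, max(0, round(v * (size - 1))))
    return row, col


def encode_uv_position_map(vertices: Sequence[Vec3],
                           uvs: Sequence[UV],
                           size: int) -> UVMap:
    """Store each vertex's 3D position at its UV pixel (hypothetical helper)."""
    uv_map: UVMap = [[None] * size for _ in range(size)]
    for position, uv in zip(vertices, uvs):
        row, col = uv_to_pixel(uv, size)
        uv_map[row][col] = position
    return uv_map


def decode_vertices(uv_map: UVMap, uvs: Sequence[UV]) -> List[Optional[Vec3]]:
    """Read vertex positions back from the map using the same UV lookup."""
    size = len(uv_map)
    result: List[Optional[Vec3]] = []
    for uv in uvs:
        row, col = uv_to_pixel(uv, size)
        result.append(uv_map[row][col])
    return result


def keep_visible(uv_map: UVMap,
                 uvs: Sequence[UV],
                 visible: Sequence[bool]) -> UVMap:
    """Return a copy of the map with occluded vertices blanked out.

    Assumption: visibility is given per vertex; the result stands in for
    the 'visible UV position map' that an inpainting network would complete.
    """
    size = len(uv_map)
    masked: UVMap = [row[:] for row in uv_map]
    for uv, is_visible in zip(uvs, visible):
        if not is_visible:
            row, col = uv_to_pixel(uv, size)
            masked[row][col] = None
    return masked


if __name__ == "__main__":
    # Toy triangle with made-up UV coordinates.
    verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    uvs = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
    full_map = encode_uv_position_map(verts, uvs, size=4)
    partial_map = keep_visible(full_map, uvs, visible=[True, False, True])
    print(decode_vertices(full_map, uvs))
    print(decode_vertices(partial_map, uvs))
```

In this toy form, the inpainting task described in (2) corresponds to predicting the blanked entries of `partial_map` so that it matches `full_map`.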
Keywords/Search Tags: 3D Human Reconstruction, Motion Capture, Deep Learning, Pose and Shape Estimation, 3D Representation