Research on a 3D Human Pose Estimation Method Based on Transformer

Posted on: 2024-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: B W Zhang
GTID: 2568306935965709
Subject: Computer system architecture
Abstract/Summary:
3D human pose estimation is a research hotspot in computer vision and one of the most basic tasks in computer vision and image processing. The output of a 3D human pose estimation model is also a prerequisite for subsequent higher-level computer vision tasks. However, current 3D human pose estimation has two main problems: (1) 3D human pose models mainly adopt a two-stage method, so the second-stage model can only use spatial information from a limited amount of 2D human pose information, and using a two-stage model to predict the 3D pose makes the model harder to deploy. (2) Acquiring 3D human pose datasets usually requires professional equipment and technology, such as depth cameras or RGB cameras, and this requirement limits the number of 3D human pose datasets currently available. To address the first problem, this thesis proposes an end-to-end human pose estimation network based on the Transformer model. To address the second problem, it proposes a loop training network that generates new human pose data while the model is being trained and produces outdoor 3D human pose training data by means of external rendering.

To address the two-stage model's insufficient ability to extract 3D information about joint points and its deployment difficulty, a human pose recognition architecture based on the Transformer model is designed, inspired by Transformer-based detection models. First, the model uses a CNN backbone network to extract image features and feeds these features into a cascaded Transformer decoder; coarse-to-fine intermediate supervision reduces the difficulty of predicting joint point positions. In the decoder stage, the learnable joint point queries and the encoder outputs are input to the cascaded Transformer decoder, and multiple prediction heads regress the joint point positions directly from the decoder output. Finally, a joint point completion network fills in joint points that are missing because of severe occlusion. At run time, no greedy search algorithm is used to post-process the predicted joint points. Experimental results show that the proposed model achieves the best performance in the single-stage 3D human pose estimation task.

To address the difficulty of collecting 3D human pose estimation datasets, this thesis proposes the Trans Cycle 3D human pose training network. Trans Cycle uses Trans Pose GAN to generate 3D action sequences. Trans Pose GAN uses a cascaded pose generation network to go from low-resolution actions to high-resolution actions. On this basis, Trans Pose GAN includes a patch-based Transformer discriminator network that divides the input action into multiple patches along the time dimension. Each patch is evaluated individually so that the network better learns the relationships between generated action frames. Trans Cycle feeds the action sequences generated by Trans Pose GAN into a renderer, uses multiple camera viewpoints to generate training pairs that map images to 3D human poses, and adds these pairs to the training process of the 3D human pose estimation network. Experimental data show that the 3D human pose estimation model trained with Trans Cycle improves significantly over the model trained on traditional human pose datasets.
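The patch-based evaluation described for the discriminator can be illustrated with a small sketch. This is not the thesis's implementation: the abstract gives no patch length, tensor layout, or network details, so `patch_len`, the function names, and the choice to drop a trailing partial patch are all assumptions, and the simple displacement scorer below is only a stand-in for the Transformer discriminator. It shows just the idea of cutting an action sequence along the time axis and scoring each patch on its own.

```python
from typing import Callable, List, Sequence

# One pose frame: J joints, each given as [x, y, z].
Frame = List[List[float]]


def split_into_temporal_patches(sequence: Sequence[Frame], patch_len: int) -> List[List[Frame]]:
    """Split a pose sequence into consecutive, non-overlapping patches along time.

    Assumption (not from the source): a trailing remainder shorter than
    patch_len is dropped.
    """
    if patch_len <= 0:
        raise ValueError("patch_len must be positive")
    n_patches = len(sequence) // patch_len
    return [list(sequence[i * patch_len:(i + 1) * patch_len]) for i in range(n_patches)]


def mean_frame_displacement(patch: List[Frame]) -> float:
    """Hypothetical stand-in scorer, NOT the thesis's Transformer discriminator.

    Returns the average per-joint Euclidean displacement between
    consecutive frames inside one patch.
    """
    if len(patch) < 2:
        return 0.0
    total = 0.0
    count = 0
    for prev, curr in zip(patch, patch[1:]):
        for p, c in zip(prev, curr):
            total += sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5
            count += 1
    return total / count


def score_patches(
    sequence: Sequence[Frame],
    patch_len: int,
    scorer: Callable[[List[Frame]], float] = mean_frame_displacement,
) -> List[float]:
    """Evaluate every temporal patch individually and return one score per patch."""
    return [scorer(patch) for patch in split_into_temporal_patches(sequence, patch_len)]


# Toy sequence: 8 frames, 2 joints, each joint moves 1.0 along x per frame.
toy_sequence = [[[float(t), 0.0, 0.0], [float(t), 1.0, 0.0]] for t in range(8)]
patch_scores = score_patches(toy_sequence, patch_len=4)
```

In a real discriminator, each patch would be embedded and passed through Transformer layers to produce a real-or-generated score; the per-patch scoring loop is the only part this sketch is meant to convey.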
Keywords/Search Tags: human pose estimation, Transformer network, generative adversarial network, data augmentation