
3D Human Pose Estimation Based On Deep Learning

Posted on: 2024-06-26 | Degree: Master | Type: Thesis
Country: China | Candidate: X R Shi | Full Text: PDF
GTID: 2568307067499914 | Subject: Computer vision
Abstract/Summary:
3D human pose estimation predicts the positions of human key points from images and, by predicting the associations between those positions, connects adjacent key points into a complete skeleton of the human pose. Accurate 3D human pose estimation has a wide range of applications, including behavior analysis, surveillance, entertainment, and rehabilitation. In recent years, advances in deep learning for object recognition, object detection, and video analysis have driven progress in human pose estimation, and integrating attention mechanisms into pose estimation networks has become an active research direction. This thesis combines existing 2D/3D human pose estimation architectures with attention mechanisms to improve estimation accuracy. The main contributions are as follows.

·Temporal-spatial attention mechanism: We implement a 3D human pose estimation network based on a temporal-spatial attention mechanism, taking the 2D keypoints detected by a standard CPN as input. Compared with VideoPose3D, which models temporal information with temporal convolutions (TCN), our network adds a temporal-spatial attention mechanism that models the temporal and spatial dimensions simultaneously. Compared with PoseFormer, which applies patch embedding to the 17 keypoint positions of each frame in the 2D sequence and therefore lacks interaction between keypoints along the spatial dimension, we linearly embed the entire 2D sequence and add a spatial position embedding, which promotes information exchange between keypoints (a sketch of such a block is given after this abstract).

·Ablation experiments: Starting from the CPN detection baseline, we trained a more accurate HRNet to extract 2D keypoints from images and designed experiments with different 2D inputs (CPN / HRNet / ground truth) to study the effect of 2D accuracy on the final result, and with different sequence lengths (27, 81, 243, 384 frames) to study the effect of sequence length on the final result.

·Results comparison: Systematic comparisons show that, under the same operating environment and test conditions, the error of the temporal-spatial attention network is lower than that of VideoPose3D and PoseFormer: MPJPE is reduced by about 10.6% and 4.9% relative to the two baselines, and P-MPJPE by about 9.2% and 3.4%, respectively (the two metrics are sketched in the second example below).
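To make the temporal-spatial attention idea concrete, the following is a minimal PyTorch sketch, assuming alternating joint-wise (spatial) and frame-wise (temporal) multi-head self-attention on top of a per-joint linear embedding of the detected 2D keypoints with learnable spatial and temporal position embeddings. Module names, dimensions, and the block layout are illustrative assumptions, not the exact network described in the thesis.

```python
# Hypothetical sketch of a temporal-spatial attention lifter (not the thesis code).
import torch
import torch.nn as nn


class TemporalSpatialBlock(nn.Module):
    """Self-attention first across the joints of each frame (spatial),
    then across frames for each joint (temporal)."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (batch, frames, joints, dim)
        b, t, j, d = x.shape
        # Spatial attention: joints attend to each other within one frame.
        s = x.reshape(b * t, j, d)
        q = self.norm1(s)
        s = s + self.spatial_attn(q, q, q)[0]
        x = s.reshape(b, t, j, d)
        # Temporal attention: each joint attends across the frame sequence.
        seq = x.permute(0, 2, 1, 3).reshape(b * j, t, d)
        q = self.norm2(seq)
        seq = seq + self.temporal_attn(q, q, q)[0]
        return seq.reshape(b, j, t, d).permute(0, 2, 1, 3)


class Lifter2Dto3D(nn.Module):
    """Linearly embeds each 2D joint, adds spatial/temporal position
    embeddings, stacks attention blocks, and regresses 3D coordinates."""

    def __init__(self, joints=17, dim=256, depth=4, frames=243):
        super().__init__()
        self.embed = nn.Linear(2, dim)                       # per-joint 2D -> dim
        self.spatial_pos = nn.Parameter(torch.zeros(1, 1, joints, dim))
        self.temporal_pos = nn.Parameter(torch.zeros(1, frames, 1, dim))
        self.blocks = nn.ModuleList(TemporalSpatialBlock(dim) for _ in range(depth))
        self.head = nn.Linear(dim, 3)                        # dim -> 3D joint

    def forward(self, x2d):
        # x2d: (batch, frames, joints, 2) detected 2D keypoints (e.g. CPN/HRNet)
        x = self.embed(x2d) + self.spatial_pos + self.temporal_pos
        for blk in self.blocks:
            x = blk(x)
        return self.head(x)                                  # (batch, frames, joints, 3)
```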
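For reference, MPJPE and P-MPJPE (the Procrustes-aligned variant used in the comparison above) and the relative-error-reduction figure can be computed as in the sketch below. Array shapes and the usage example are illustrative assumptions, not the thesis's evaluation code.

```python
# Hypothetical sketch of the evaluation metrics (not the thesis code).
import numpy as np


def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: mean Euclidean distance between
    predicted and ground-truth 3D joints. Shapes: (frames, joints, 3)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def p_mpjpe(pred, gt):
    """Procrustes-aligned MPJPE: rigidly align each predicted pose to the
    ground truth (scale, rotation, translation) before measuring the error."""
    errors = []
    for p, g in zip(pred, gt):
        p0, g0 = p - p.mean(0), g - g.mean(0)
        # Orthogonal Procrustes via SVD of the cross-covariance matrix.
        u, s, vt = np.linalg.svd(g0.T @ p0)
        r = u @ vt
        if np.linalg.det(r) < 0:          # fix a possible reflection
            u[:, -1] *= -1
            s[-1] *= -1
            r = u @ vt
        scale = s.sum() / (p0 ** 2).sum()
        aligned = scale * p0 @ r.T + g.mean(0)
        errors.append(np.linalg.norm(aligned - g, axis=-1).mean())
    return float(np.mean(errors))


def relative_reduction(ours, baseline):
    """Relative error reduction in percent, e.g.
    relative_reduction(45.0, 50.0) -> 10.0 (illustrative numbers)."""
    return (baseline - ours) / baseline * 100.0
```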
Keywords/Search Tags:3D Pose Estimation, Deep Learning, Attention Mechanism, Spatial Attention, Temporal Attention