3D human pose estimation recovers the pose of a human body from images, videos, and other media that record a person's movements, producing the 3D coordinates of the human joints. Estimating 3D pose from monocular images or videos remains challenging because of depth ambiguity and occlusion of human joints in realistic scenes. Although previous work has made great progress through deep learning, recovering 3D human pose from a monocular view is inherently an ill-posed problem: multiple 3D joints at different depths can project to the same 2D joint position along a ray of light, and much previous work has ignored this. Many recent studies have proposed multi-hypothesis methods to address the one-to-many nature of the ill-posed problem. Although these improve on conventional deep learning methods, most rely on parameter sharing or identically structured feature extractors to generate the multiple pose hypotheses, so the valid pose information they capture is limited, the generated hypothesis features are highly similar, and the effect of the ill-posed problem is not effectively mitigated.

To address these problems, this study proposes a 3D human pose estimation method based on multiple hybrid extractors and multi-hypothesis feature sharing and fusion. It obtains distinct pose hypothesis feature representations by designing differently structured hybrid extractors for different kinds of pose information, and it strengthens the model's 3D pose estimation capability through multi-hypothesis interaction, sharing, and feature fusion, reconstructing a more accurate 3D human pose. The main work of this paper is as follows:

1. Current multi-hypothesis methods in video-based 3D
human pose estimation typically encode a single 2D human pose and generate multiple 3D pose hypothesis features through a one-to-many mapping. However, hypothesis features produced by feature sharing or by identically structured extractors carry no hypothesis-specific information, the 3D pose information they acquire is limited, and the reconstructed 3D pose is insufficiently accurate. To solve this, this study proposes a multi-hybrid-extractor network that combines the multi-head self-attention mechanism and convolution modules in different configurations to build several structurally distinct feature extractors. These capture the structural, detail, and motion information of the human pose from the perspectives of basic, diverse, and condensed features, and a multi-hypothesis attention module enhances each hypothesis feature to generate a more accurate 3D pose.

2. To establish information associations among the multiple pose hypothesis features and generate better 3D poses, this study constructs a multi-hypothesis feature sharing and fusion model that builds relationships across hypotheses. It comprises two parts: hypothesis feature interaction and hypothesis feature sharing and fusion. The hypothesis feature interaction module uses cross-attention for information exchange and feature complementarity between pose hypotheses. The sharing-fusion module fuses the multiple hypothesis features with a parameter-shared hypothesis fuser and finally maps the fused features to a 3D human pose sequence through a regression head. This model strengthens the network's many-to-one mapping of pose information, improving the quality of the reconstructed 3D human pose.

3. To demonstrate the application of the proposed algorithm, this study designs and implements a 3D human pose estimation visualisation
system, into which users can log in to use the online 3D human pose estimation function.
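The cross-attention used by the hypothesis feature interaction module can be sketched as follows. This is a minimal illustrative example, not the thesis's actual implementation: the frame count, feature dimension, random features, and the omission of learned query/key/value projections are all simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats, d_k):
    """One pose hypothesis attends to another: queries come from the first
    hypothesis, keys/values from the second, so information flows across
    hypotheses. Learned projections are omitted here for brevity."""
    scores = query_feats @ context_feats.T / np.sqrt(d_k)  # (T, T)
    attn = softmax(scores, axis=-1)
    return attn @ context_feats                            # (T, d_k)

# Two hypothetical hypothesis feature sequences: T frames, d features each.
T, d = 8, 16
rng = np.random.default_rng(0)
h1, h2 = rng.normal(size=(T, d)), rng.normal(size=(T, d))

# Residual update: hypothesis 1 is complemented by information from hypothesis 2.
h1_refined = h1 + cross_attention(h1, h2, d)
print(h1_refined.shape)  # (8, 16)
```

The residual form keeps each hypothesis's own features intact while adding complementary information from the other hypothesis, which matches the "information exchange and feature complementarity" role described above.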
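The parameter-shared fusion and regression-head stage can likewise be sketched. Again this is a hedged toy version: the number of hypotheses, the mean-based fusion, the single shared weight matrix, and the 17-joint output layout are assumptions for illustration, not the model's actual design.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d, J = 8, 16, 17  # frames, feature dim, joints (17-joint skeleton assumed)

# Three hypothesis feature sequences, stand-ins for the extractor outputs.
hyps = [rng.normal(size=(T, d)) for _ in range(3)]

# Parameter sharing: the SAME weight matrix is applied to every hypothesis
# before the features are fused (here by a simple mean).
W_fuse = rng.normal(size=(d, d)) * 0.1
fused = np.mean([h @ W_fuse for h in hyps], axis=0)  # (T, d)

# Regression head: map fused features to per-frame 3D joint coordinates.
W_head = rng.normal(size=(d, J * 3)) * 0.1
pose_3d = (fused @ W_head).reshape(T, J, 3)
print(pose_3d.shape)  # (8, 17, 3)
```

Sharing one fuser across all hypotheses realises the many-to-one mapping described above: however many hypotheses are generated, a single set of fusion parameters collapses them into one 3D pose sequence.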