In today's information-rich society, images and videos have become the main carriers of information, and how to analyze human body information efficiently and intelligently has become a research hotspot. The 3D human pose represents the spatial and structural information of the human body and is the basis for tasks such as human tracking and behavior recognition. Unlike 2D pose estimation, which locates planar coordinates in a single-view image, 3D pose estimation must further recover the depth of the human joints, and factors such as illumination, occlusion, and shooting angle make it difficult to reach high accuracy. The key to this problem is introducing more spatial information to strengthen the feature representation and spatial localization of the human body, which is precisely the advantage of multi-view data. Multi-view data describes an object from different spatial perspectives; the views share an implicit consistency in characterizing the same target and offer complementary differences arising from their distinct viewpoints. This thesis is supported by the National Natural Science Foundation of China project "Analysis of Human Behavior Based on Deep Learning" (No. 61976022). It captures spatial details and high-level semantics by fusing multi-view and multi-scale features, and studies 3D pose estimation from both the single-person and multi-person perspectives. The main research contents are as follows:

For multi-view single-person 3D pose estimation, to address the current under-utilization of low-scale features and to capture human features and constraints more deeply in the information fusion stage, this thesis proposes a single-person 3D human pose estimation method based on multi-view feature fusion. In the feature extraction stage, low-scale feature encodings from adjacent views supplement the spatial details missing from the current view and enhance the information extracted from its features. In the pose estimation stage, the multi-view epipolar relationship and human body structure constraints are used to aggregate long-range cross-view semantic information of the joints, and geometric dependencies are used to strengthen the fusion of spatial features. Compared with existing mainstream methods, the proposed method improves the evaluation metrics on the public Human3.6M dataset by 4.1% on average.
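To make the cross-view fusion idea concrete, the following is a minimal illustrative sketch, not the thesis implementation. It assumes a known fundamental matrix between a reference view and a source view, samples the source feature map along each epipolar line, and fuses the sampled points into the reference features with dot-product attention; all names, shapes, and the attention choice are assumptions made here for illustration.

```python
import torch
import torch.nn.functional as F


def epipolar_fuse(feat_ref, feat_src, fund_mat, num_samples=32):
    """Fuse a source view's features into the reference view along epipolar lines.

    feat_ref, feat_src: (C, H, W) feature maps of two calibrated views
                        (assumed to share the same resolution).
    fund_mat: (3, 3) fundamental matrix mapping reference pixels to
              epipolar lines in the source view.
    """
    C, H, W = feat_ref.shape
    # Homogeneous pixel grid of the reference view: (H*W, 3).
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32),
                            indexing="ij")
    pix = torch.stack([xs.reshape(-1), ys.reshape(-1), torch.ones(H * W)], dim=-1)
    # Epipolar lines a*x + b*y + c = 0 in the source view: (H*W, 3).
    lines = pix @ fund_mat.T
    a, b, c = lines[:, 0:1], lines[:, 1:2], lines[:, 2:3]
    # Sample points along each line at fixed x positions (assumes non-vertical lines).
    xs_src = torch.linspace(0.0, W - 1.0, num_samples)             # (S,)
    ys_src = -(a * xs_src + c) / (b + 1e-8)                        # (H*W, S)
    grid = torch.stack([xs_src.expand_as(ys_src) / (W - 1) * 2 - 1,
                        ys_src / (H - 1) * 2 - 1], dim=-1)         # (H*W, S, 2)
    sampled = F.grid_sample(feat_src[None], grid[None],
                            align_corners=True)[0]                 # (C, H*W, S)
    # Dot-product attention: weight each sampled point by its similarity
    # to the reference feature at that pixel, then aggregate.
    ref = feat_ref.reshape(C, -1)                                  # (C, H*W)
    attn = torch.softmax((ref.unsqueeze(-1) * sampled).sum(0) / C ** 0.5, dim=-1)
    fused = (sampled * attn.unsqueeze(0)).sum(-1)                  # (C, H*W)
    return (ref + fused).reshape(C, H, W)                          # residual fusion
```

In such a sketch, calling `epipolar_fuse(feat_a, feat_b, F_ab)` for each adjacent view pair would enrich the current view with spatial details observed from the other view before heatmap regression; the hypothetical names `feat_a`, `feat_b`, and `F_ab` are not from the thesis.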
For multi-view multi-person 3D pose estimation, to exploit the physical mapping relationship among views to simplify multi-person matching and to incorporate projective geometric information in the pose regression stage to constrain the predictions, this thesis proposes a multi-person 3D human pose estimation method based on multi-view feature fusion. In the human localization stage, the human bodies and views are encoded according to the localization mapping between multiple people and multiple views, and serialized multi-scale features are combined to predict the positions of multiple people in 3D space. In the pose estimation stage, the projective geometric relationship is used to aggregate the global spatial information of the multi-view heatmaps, a parallel, residual 3D convolution structure refines the joint feature responses and discards useless noise, and a projection-angle constraint is designed to limit the spatial regression range of the joints. On the CMU Panoptic Studio dataset, the accuracy metric improves by 4% compared with existing mainstream methods.
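The aggregation of multi-view heatmaps through projective geometry in the multi-person pipeline can likewise be sketched informally. The outline below is an assumption-laden illustration rather than the thesis implementation: it assumes pinhole projection matrices, simple averaging across views, and a small residual 3D convolution block in place of the parallel structure described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def unproject_heatmaps(heatmaps, proj_mats, grid_points):
    """Aggregate per-view 2D joint heatmaps into a shared 3D volume.

    heatmaps: (V, J, H, W)    per-view heatmaps for J joints.
    proj_mats: (V, 3, 4)      camera projection matrices (world -> pixel).
    grid_points: (D, D, D, 3) world coordinates of the voxel centres.
    """
    V, J, H, W = heatmaps.shape
    D = grid_points.shape[0]
    # Homogeneous voxel centres: (D^3, 4).
    pts = torch.cat([grid_points.reshape(-1, 3), torch.ones(D ** 3, 1)], dim=-1)
    volume = torch.zeros(J, D, D, D)
    for v in range(V):
        uvw = pts @ proj_mats[v].T                               # (D^3, 3)
        uv = uvw[:, :2] / uvw[:, 2:].clamp(min=1e-6)             # pixel coordinates
        grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,          # normalise to [-1, 1]
                            uv[:, 1] / (H - 1) * 2 - 1], dim=-1)
        sampled = F.grid_sample(heatmaps[v:v + 1], grid.view(1, 1, -1, 2),
                                align_corners=True)              # (1, J, 1, D^3)
        volume += sampled.reshape(J, D, D, D)
    return volume / V                                            # simple view average


class ResidualBlock3D(nn.Module):
    """Two 3D convolutions with a skip connection to refine the joint volume
    and suppress noisy responses before joint regression."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        y = F.relu(self.conv1(x))
        return F.relu(x + self.conv2(y))
```

In this sketch, the averaged volume would be passed through stacked `ResidualBlock3D` modules before soft-argmax-style joint regression; the projection-angle constraint and the parallel branch design of the thesis are omitted.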