The purpose of the 3D human pose estimation and reconstruction task is to detect the position of each part of the human body in 3D space from images or videos and to estimate its orientation and scale information in order to reconstruct the 3D human body. It is widely used in human-computer interaction, video surveillance, medical and health care, and other fields, and has important research value and development prospects. Existing research on 3D human pose estimation and reconstruction often struggles to achieve satisfactory results when reconstructing natural scenes with occlusion. In addition, the scale inconsistency caused by varying distances between humans and the camera also affects the estimation and reconstruction of human bodies in natural scenes. Therefore, how to estimate and reconstruct the 3D human body more accurately from a single image remains an active research direction.

To address the above problems, this paper studies single-view 3D human pose estimation and reconstruction and proposes a 3D human pose estimation and reconstruction algorithm based on scale-adaptive feature fusion and human orientation constraints, as well as a multi-representation 3D human pose estimation and reconstruction algorithm. The proposed methods better fuse global information with multi-scale features, learn the structural relationships of the human body, and more effectively constrain the orientation of the human body, thereby achieving more accurate 3D human pose estimation and reconstruction. The main work of this paper is as follows:

1. To effectively fuse the feature information extracted by the network at different scales and enhance the network's reconstruction ability, this paper proposes a scale-adaptive feature fusion network. The network is divided into two parts: a global feature extraction module and a scale-adaptive feature fusion module. The global feature extraction module retains the backbone network's learning of global information, while the scale-adaptive feature fusion module adaptively learns scale information and complements the global features with the learned scale features, improving the network's perception of scale (a fusion sketch is given after this list).

2. To address the inconsistency between the predicted human body orientation and the actual human body orientation in natural scenes, this paper proposes a loss function related to human body orientation. By computing the cosine similarity between the normal vectors of the plane formed by three joints of the human body, it helps constrain the 3D pose estimation and reconstruction of multiple people, and it shows good results on the test datasets (see the formulation sketched after this list).

3. To address the occlusion problem in natural scenes, a multi-representation network structure is proposed. The network models the human body as a sequence to learn its sequential relationships, and models it as a graph structure to learn and aggregate the feature information of neighboring joints, which helps the network infer and learn occluded parts (a graph-aggregation sketch is given after this list).
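The abstract does not give implementation details of the scale-adaptive feature fusion module, so the following is only a minimal sketch of one way such a module could be realized in PyTorch: multi-scale features are resized to a common resolution, weighted by learned per-scale attention, and the fused result complements the global feature. The class name `ScaleAdaptiveFusion` and all parameters are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAdaptiveFusion(nn.Module):
    """Hypothetical sketch: adaptively weight multi-scale features and
    complement a global feature with the fused scale feature."""
    def __init__(self, channels, num_scales):
        super().__init__()
        # one 1x1 conv per scale produces a scalar attention map for that scale
        self.score = nn.ModuleList(
            [nn.Conv2d(channels, 1, kernel_size=1) for _ in range(num_scales)]
        )

    def forward(self, global_feat, scale_feats):
        # resize every scale-specific feature to the global feature's resolution
        h, w = global_feat.shape[-2:]
        resized = [F.interpolate(f, size=(h, w), mode="bilinear", align_corners=False)
                   for f in scale_feats]
        # per-scale attention logits, normalized across scales (adaptive weighting)
        logits = torch.stack([s(f) for s, f in zip(self.score, resized)], dim=0)
        weights = torch.softmax(logits, dim=0)
        fused = sum(w * f for w, f in zip(weights, resized))
        # complement the global feature with the learned scale feature
        return global_feat + fused

# usage sketch with dummy tensors
g = torch.randn(2, 256, 56, 56)
feats = [torch.randn(2, 256, 56, 56),
         torch.randn(2, 256, 28, 28),
         torch.randn(2, 256, 14, 14)]
out = ScaleAdaptiveFusion(channels=256, num_scales=3)(g, feats)
print(out.shape)  # torch.Size([2, 256, 56, 56])
```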
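The orientation loss is described only as the cosine similarity of plane normals spanned by three joints. A plausible formulation, sketched below under that assumption, builds each normal from a cross product of two edge vectors and penalizes the deviation of the cosine similarity between the predicted and ground-truth normals from 1. The function name and the joint indices in `triplet` are placeholders, not the paper's actual choices.

```python
import torch
import torch.nn.functional as F

def orientation_loss(pred_joints, gt_joints, triplet=(11, 12, 8)):
    """Hypothetical sketch of a human-orientation loss.

    pred_joints, gt_joints: (B, J, 3) 3D joint positions.
    triplet: indices of the three joints spanning the reference plane
             (e.g. the two hips and the neck); placeholder values.
    """
    a, b, c = triplet

    def plane_normal(joints):
        v1 = joints[:, b] - joints[:, a]
        v2 = joints[:, c] - joints[:, a]
        return torch.cross(v1, v2, dim=-1)  # (B, 3) normal of the joint plane

    n_pred = plane_normal(pred_joints)
    n_gt = plane_normal(gt_joints)
    # cosine similarity is 1 when the predicted and actual orientations agree,
    # so the loss penalizes the deviation from 1
    cos = F.cosine_similarity(n_pred, n_gt, dim=-1)
    return (1.0 - cos).mean()
```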
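For the graph-structured branch of the multi-representation network, the abstract only states that features of neighboring joints are aggregated to help infer occluded parts. The sketch below shows one generic way to do this with a plain graph-convolution step over the skeleton adjacency matrix; the class name and normalization scheme are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class SkeletonGraphLayer(nn.Module):
    """Hypothetical sketch: aggregate features of neighboring joints over the
    skeleton graph so occluded joints can borrow information from visible
    neighbors (a standard graph-convolution step)."""
    def __init__(self, in_dim, out_dim, adjacency):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        # row-normalized adjacency with self-loops: A_hat = D^-1 (A + I)
        adj = adjacency + torch.eye(adjacency.size(0))
        self.register_buffer("adj_norm", adj / adj.sum(dim=1, keepdim=True))

    def forward(self, x):
        # x: (B, J, in_dim) per-joint features
        x = torch.matmul(self.adj_norm, x)   # aggregate neighboring joint features
        return torch.relu(self.linear(x))    # transform the aggregated features
```

Stacking a few such layers lets information propagate across the skeleton, which is what allows an occluded joint to be inferred from its connected, visible neighbors; the sequence branch of the multi-representation network is independent of this sketch.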