Research On Human Pose Estimation Algorithm In Complex Scenes

Posted on: 2024-01-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Dai
GTID: 1528307079452234
Subject: Computer Science and Technology
Abstract/Summary:
Multi-person pose estimation aims to detect the anatomical keypoints (e.g., wrist and shoulder) of all human instances in still images or video sequences, where keypoint locations concisely describe the geometric and motion information of the human body. It is a popular and fundamental research topic in computer vision that supports many downstream applications, such as human-computer interaction and security monitoring. With the development of deep learning and the release of large-scale datasets, recently proposed pose estimation methods have made impressive progress. Despite this success in simple scenes, “complex scenes” place higher demands on current methods. Here, “complex scenes” refers to RGB images that capture truncated persons, confusing backgrounds, highly overlapped people, severe occlusions, extreme poses, and similar conditions. The complexity and variability of real-world scenes may seriously degrade the recognition accuracy of estimation models, yet pose estimation in complex scenes is the core challenge that must be addressed before practical deployment. This dissertation therefore focuses on pose estimation algorithms in complex scenes and conducts in-depth research at three levels: model design, pose data, and learning strategy. In addition, it studies key scientific issues in downstream skeleton-based action recognition to advance human-centered vision tasks toward practical applications. The main work includes:

1. From the perspective of model design, this dissertation focuses on two main problems: 1) how to design an effective pipeline for crowded-scene pose estimation; and 2) how to equip this pipeline with relation modeling to resolve interference. First, it analyzes the model discrimination confusion caused by the “multi-keypoints in one bounding box” problem and proposes to encourage all keypoints, whether they belong to the target person or to interfering persons, to be predicted. Then, a target-aware relation parser is designed to model the relations among all predicted keypoints, which largely relieves the confusion a model faces when identical keypoints carry totally distinct labels (e.g., the same knee appears in two bounding boxes). Furthermore, a skeleton graph machine is introduced to model skeleton-based commonsense knowledge, aiming to estimate occluded poses under the constraint of human body structure. Finally, experiments on mainstream benchmarks demonstrate that the proposed model significantly improves recognition accuracy and outperforms existing state-of-the-art models.

2. From the perspective of pose data, this dissertation addresses two inherent data defects and accordingly studies: 1) how to adopt a data generation method to overcome the inherent deficiencies of existing pose datasets; and 2) how to optimize learning objectives to alleviate the pixel-level imbalance problem in training images. First, it defines a new metric, instance complexity, and reveals that the deficiencies of existing datasets include imbalanced instance complexity and insufficient realistic scenes. Then, it proposes a full-view data generation method to enrich the training data in terms of both poses and scenes. By hallucinating images with more balanced pose complexity and richer real-world scenes, the proposed method helps improve the robustness and generalizability of pose estimators. In addition, to alleviate the severe pixel-level imbalance in ground-truth keypoint heatmaps, an adaptive category-aware loss is designed to gradually shift the pose estimator's focus toward foreground and hard pixels during training. Finally, extensive experiments on mainstream datasets show that the two proposed strategies significantly improve the accuracy of various pose estimators.

3. From the perspective of learning strategy, this dissertation addresses the recognition degradation of estimation models in complex scenes caused by traditional learning strategies, and studies how to achieve efficient training by optimizing the learning strategy. First, it finds that prior methods organize all examples randomly and treat them equally, ignoring differences in example difficulty. As a result, hard examples are underutilized during training because released datasets are dominated mostly by easy examples, which leads to poor robustness on difficult poses. Next, drawing on the human learning paradigm of organizing learning materials from easy to difficult, it designs a model-aware curriculum learning strategy, comprising difficulty-aware course organization and model-aware course scheduling, to fully tap the recognition potential of existing models. Finally, experiments on two challenging benchmarks demonstrate the effectiveness and generalization of the proposed learning strategy, especially for difficult pose recognition.

4. From the perspective of the downstream task, i.e., skeleton-based action recognition, this dissertation studies how to construct complementary action representations from only partial skeleton forms, relaxing the requirement that all skeleton forms co-exist at the inference stage. First, it finds that existing methods tend to improve recognition models by leveraging multi-form skeletons for their complementary cues, whereas in a typical situation only partial forms are available for inference. In view of this, it proposes a novel adaptive cross-form learning paradigm, which forces a model to adaptively mimic useful representations from various single-form models in order to strengthen what it has already learned, thereby exploiting the model's potential and facilitating action recognition. Finally, extensive experiments on three challenging benchmarks show that the proposed learning strategy greatly improves the accuracy of action recognition and sets a new record for single-form action recognition.
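
The abstract names two parts of the curriculum learning strategy (difficulty-aware course organization and model-aware course scheduling) but does not give their details. The following is only a minimal sketch of the general easy-to-hard idea, not the dissertation's method. It assumes each training example already has a scalar difficulty score, and that a new course is unlocked once the model's mean loss on the current pool drops below a fixed threshold. The function names `organize_courses`, `training_pool`, and `next_unlocked`, the threshold rule, and the sample numbers are all hypothetical.

```python
from typing import List, Sequence


def organize_courses(difficulties: Sequence[float], num_courses: int) -> List[List[int]]:
    """Sort example indices from easy to hard and split them into
    `num_courses` contiguous groups of near-equal size (hypothetical helper)."""
    if num_courses < 1:
        raise ValueError("num_courses must be at least 1")
    # Indices ordered by ascending difficulty score.
    order = sorted(range(len(difficulties)), key=lambda i: difficulties[i])
    size, remainder = divmod(len(order), num_courses)
    courses: List[List[int]] = []
    start = 0
    for c in range(num_courses):
        # The first `remainder` courses take one extra example.
        end = start + size + (1 if c < remainder else 0)
        courses.append(order[start:end])
        start = end
    return courses


def training_pool(courses: List[List[int]], unlocked: int) -> List[int]:
    """Return the indices of all examples in the first `unlocked` courses."""
    return [i for course in courses[:unlocked] for i in course]


def next_unlocked(unlocked: int, mean_loss: float, threshold: float, num_courses: int) -> int:
    """Assumed pacing rule: unlock one more course when the model's mean loss
    on the current pool is below `threshold`; otherwise stay on the same pool."""
    if mean_loss < threshold and unlocked < num_courses:
        return unlocked + 1
    return unlocked


if __name__ == "__main__":
    # Hypothetical difficulty scores for six training examples.
    scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
    courses = organize_courses(scores, num_courses=3)
    unlocked = 1
    # Hypothetical per-epoch mean losses reported by some training loop.
    for epoch, loss in enumerate([0.8, 0.25, 0.4, 0.2]):
        print(epoch, training_pool(courses, unlocked))
        unlocked = next_unlocked(unlocked, loss, threshold=0.3, num_courses=3)
```

In this sketch the pool only grows, so easy examples stay in training while harder ones are added. A real scheduler could instead reweight or resample examples, which the abstract leaves unspecified.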
Keywords/Search Tags:Multi-person pose estimation, Complex scenes, Relation modeling, Data generation, Curriculum learning