Research On Deep Learning-Based Facial Landmark Localization And Pose Estimation Model Design

Posted on:2023-09-29

Degree:Master

Type:Thesis

Country:China

Candidate:S P Jin

Full Text:PDF

GTID:2558307061953389

Subject:Pattern Recognition and Intelligent Systems

Abstract/Summary:

Facial landmark localization and head pose estimation are important parts of face analysis and processing, and have long been active research topics. Algorithms for both tasks are used in many fields, including behavior analysis, action recognition, human-computer interaction, social interaction analysis, virtual/augmented reality, and gaze perception. Traditional algorithms in this field struggle to achieve satisfactory performance in natural scenes with complex backgrounds, large poses, exaggerated expressions, heavy occlusion, or extreme lighting. In comparison, deep learning algorithms based on convolutional neural networks usually perform very well in complex environments. However, existing algorithms still leave considerable room for improvement in the form of supervision, dataset utilization, and the overall efficiency of the algorithm pipeline. This thesis studies facial landmark detection and head pose estimation algorithms in depth and addresses the problems of existing methods through a series of model design improvements. Specifically:

1. The two existing types of facial landmark detection algorithms, based on heatmap regression or coordinate vector regression, fall short in constructing face priors and in regression accuracy. Few algorithms use heatmap supervision and coordinate vector supervision together during network training, and there has been little innovation in the form of supervision constraints. In response to this problem, this thesis designs an explicit attention mechanism and uses it to construct a facial landmark detection network. The network uses the heatmap representation and the coordinate vector representation jointly by applying heatmap supervision to the explicit attention map and merging the attention map with the shallow features of the backbone regression network. The network can effectively suppress background and texture responses in the input image that are unrelated to face structure, and focus on image features strongly related to facial landmarks. This thesis also proposes a dynamic loss balancing strategy to further improve the performance of the model.

2. Existing facial landmark datasets are nearly an order of magnitude smaller than datasets for other basic vision tasks, which magnifies the impact of uneven data distribution on model training. At the same time, the labeling protocols of many existing datasets are inconsistent, which makes it difficult for researchers to train a model on multiple datasets at once, so dataset utilization is very low. In response to this problem, this thesis proposes a new batch normalization module, called the Separable Batch Normalization layer, which dynamically generates adaptive mapping parameters according to the input features. The module embeds well into existing network architectures and can improve the performance of neural networks, especially lightweight ones, at a very small extra computational cost. This thesis also proposes a cross-protocol training strategy that combines different datasets into a mixed dataset for network training, which further verifies the applicability of the proposed module and improves the utilization of existing datasets.

3. Existing head pose estimation algorithms often rely on a preceding face detection step. This step-by-step pipeline design not only lengthens the inference process and increases inference time, but also implicitly imposes the prior bias of the face detection box on the pose estimation algorithm, which makes subsequent algorithm upgrades more difficult and reduces the overall operation and maintenance efficiency of the algorithm. In response to this problem, this thesis constructs a face 6DoF pose estimation model that performs face detection and head pose estimation simultaneously by regressing the 6DoF vector of the face. Compared with the existing pipeline design, the efficiency of the proposed algorithm is greatly improved. This thesis further enhances the practicality of the model through a lightweight design.

In terms of theory, the research results of this thesis enrich the forms of supervision available for landmark regression, removing the restriction to a single supervision setting. The study of applicability in deep learning training scenarios also deepens the understanding of the mechanism of the batch normalization layer. In terms of application, the single-model head pose regression method constructed in this thesis greatly improves workflow efficiency and offers guidance for head pose estimation methods in various practical application scenarios.
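As an illustration of what the 6DoF vector in item 3 can encode, the following is a minimal sketch that converts such a vector into a 4x4 rigid transform. The abstract does not state the parameterization, so the layout assumed here (an axis-angle rotation vector followed by a translation) and the function names `pose_vector_to_matrix` and `transform_point` are hypothetical and not taken from the thesis.

```python
import math


def pose_vector_to_matrix(pose):
    """Convert a 6DoF vector (rx, ry, rz, tx, ty, tz) into a 4x4 rigid transform.

    Assumption (not from the thesis): the first three values form an
    axis-angle rotation vector (direction = axis, norm = angle in radians)
    and the last three values are a translation.
    """
    rx, ry, rz, tx, ty, tz = pose
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        # Zero rotation vector: the rotation is the identity.
        rot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    else:
        kx, ky, kz = rx / theta, ry / theta, rz / theta  # unit rotation axis
        c, s = math.cos(theta), math.sin(theta)
        v = 1.0 - c
        # Rodrigues' rotation formula, written out element by element.
        rot = [
            [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
            [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
            [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
        ]
    # Append the translation column and the homogeneous bottom row.
    return [
        rot[0] + [tx],
        rot[1] + [ty],
        rot[2] + [tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_point(matrix, point):
    """Apply the rigid transform to a 3D point (x, y, z)."""
    x, y, z = point
    return tuple(
        row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix[:3]
    )
```

Under this assumed layout, a network that regresses one such vector per face fully determines where the head is and how it is oriented, which is why a separate face detection stage is not strictly required.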

Keywords/Search Tags:

facial landmark localization, face alignment, head pose estimation, attention mechanism in computer vision, batch normalization, 6DoF pose estimation

Related items

1	Research And Application Of Head Pose Tracking Technology Based On Computer Vision
2	The Design And Implementation Of Appearance-based Head Pose-free Gaze Estimation System
3	Research On Head Pose Estimation Method Based On Computer Vision
4	Research On The Optimization Of Head Pose Estimation Based On Interframe Information
5	Real-time Head Pose Estimation Based On Adaptive 3D Face Model
6	Research On Facial Pose Estimation And Landmarks Localization Based On Deep Learning
7	Head Pose Estimation Based On Attention Supervision
8	Face Alignment And Face Pose Estimation Based On Boundary Map And Classification
9	Multi-Pose Facial Landmark Detection Technology Based On CNN
10	Vision Based Human Detection, 3D Pose Estimation And Motion Analysis With A Single Camera