Facial expressions are produced by movements of the facial muscles and can reflect a person's true emotional state. In today's era of rapid development in deep learning and artificial intelligence, expression recognition, as a branch of affective computing and human-computer interaction, has potential applications in service robots, healthcare, driver fatigue detection, and other fields, and has received increasing attention in recent years. Because facial expressions change over time, optical flow can effectively capture the motion in dynamic facial expressions, so dynamic expression recognition combined with optical flow is of great significance. However, optical flow computation is expensive and consumes substantial computational resources. This paper studies dynamic expression recognition combined with knowledge distillation, with the goals of reducing the cost of optical flow computation and improving recognition accuracy. The main work is as follows:

(1) In expression recognition, optical flow effectively reflects the motion in dynamic expressions, but it is computationally intensive. To address this problem, this paper introduces knowledge distillation: an image sequence carrying optical flow information serves as the input of the teacher model, and the raw face image sequence serves as the input of the student model. A distillation loss is established between the outputs of the two models, yielding a dynamic expression recognition model based on output-space knowledge distillation. The optical flow information in the teacher model is transferred to the student model, so the student model can not only extract dynamic expression features from the face image sequence without costly optical flow computation, but also learn the transferred optical flow information, which improves its recognition performance. In addition, to better transfer information from the teacher model to the student model, two forms of image sequences are designed as teacher inputs. In the first, a third optical flow component is obtained by taking the square root of the sum of squares of the vertical and horizontal optical flow components, and the three components are combined into a three-channel optical flow image. In the second, the vertical optical flow component, the horizontal optical flow component, and the grayscale face image are combined into a motion-enhanced image. With motion-enhanced image sequences as the teacher input, the output-space knowledge distillation model achieves classification accuracies of 63.00% on the RAVDESS test set and 94.19% on the CK+ test set, which are 2.00 and 1.53 percentage points higher than the baseline model, respectively.

(2) Constructing a distillation loss only between the output spaces of the teacher and student models is not sufficient: the student model cannot fully learn the information in the teacher model. Given that the intermediate feature layers of the teacher model also contain rich knowledge, this paper builds a feature-space distillation loss between intermediate feature layers of the teacher and student models on top of the output-space model, yielding a dynamic expression recognition model based on knowledge distillation in both the feature space and the output space, so that the student model can fully learn the teacher's knowledge from low-level through high-level features. With motion-enhanced image sequences as the teacher input, this model reaches classification accuracies of 65.33% on the RAVDESS test set and 96.94% on the CK+ test set, which are 2.33 and 2.75 percentage points higher than the output-space-only distillation model, respectively.

(3) The proposed models were applied to neonatal facial pain expression recognition. With motion-enhanced image sequences as the teacher input, the model based on feature-space and output-space knowledge distillation achieved 67.00% recognition accuracy on this database's test set.
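The two teacher-input constructions described in (1) can be sketched as follows. This is a minimal NumPy illustration, assuming the horizontal and vertical flow components `u`, `v` and the grayscale face image `gray` are already available as equally sized 2-D arrays; the function names are illustrative, not from the thesis.

```python
import numpy as np

def three_channel_flow_image(u, v):
    """Stack the horizontal flow, the vertical flow, and the flow
    magnitude sqrt(u^2 + v^2) into a three-channel optical flow image."""
    magnitude = np.sqrt(u ** 2 + v ** 2)   # third optical flow component
    return np.stack([u, v, magnitude], axis=-1)

def motion_enhanced_image(u, v, gray):
    """Stack the horizontal flow, the vertical flow, and the grayscale
    face image into a three-channel motion-enhanced image."""
    return np.stack([u, v, gray], axis=-1)
```

Applying either function frame by frame over a clip yields the teacher-input sequence; the flow components themselves would come from any standard optical flow estimator.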
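The output-space distillation loss in (1) can be illustrated with the standard temperature-scaled formulation (KL divergence between the softened teacher and student class distributions). The abstract does not specify the exact loss form or temperature, so the Hinton-style form and the temperature value below are assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def output_space_distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """Temperature-scaled KL divergence between the teacher's soft targets
    and the student's predictions (assumed Hinton-style KD loss)."""
    p = softmax(teacher_logits, temperature)   # teacher distribution
    q = softmax(student_logits, temperature)   # student distribution
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return (temperature ** 2) * kl.mean()      # T^2 restores gradient scale
```

In training, this term would typically be added to the ordinary cross-entropy loss on the ground-truth expression labels.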
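The combined feature-space and output-space distillation in (2) can be sketched as a weighted sum of losses. The abstract does not give the feature-matching criterion or the weights, so the mean-squared-error form and the coefficients `alpha`/`beta` below are assumptions; in practice an adapter layer would align mismatched teacher/student feature shapes.

```python
import numpy as np

def feature_space_distillation_loss(teacher_feat, student_feat):
    """Mean-squared error between an intermediate teacher feature map and
    the matching student feature map (assumes equal shapes)."""
    return np.mean((teacher_feat - student_feat) ** 2)

def total_distillation_loss(task_loss, output_kd_loss, feature_kd_loss,
                            alpha=0.5, beta=0.5):
    """Weighted sum of the task loss, the output-space KD loss, and the
    feature-space KD loss; alpha and beta are illustrative weights."""
    return task_loss + alpha * output_kd_loss + beta * feature_kd_loss
```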