| With the rapid development of computer vision and advances in face recognition, designing machine learning models to estimate abnormal behaviors from faces has attracted considerable attention in the field of automatic face analysis. Such models have great potential value for medical diagnosis and treatment. This dissertation focuses on automatic pain estimation from facial videos, where pain is expressed by a set of deformations caused by facial muscle movements. Such movements can be easily captured by cameras, which makes the approach convenient and contactless. A contactless automatic pain assessment system has great potential for diagnosing diseases, guiding accurate treatment, monitoring medical progress, alleviating discomfort, and improving quality of life. It would be especially beneficial for patient groups who are unable to communicate their pain, such as newborns, unconscious patients, and patients with verbal or mental impairments. With its widespread adoption, deep learning has gradually shown its power in pain estimation. However, video sequences contain rich information that has not yet been fully utilized. This dissertation digs deeper to uncover pain-related cues from the following aspects. This dissertation proposes two fusion approaches and introduces the spatiotemporal counterparts of local binary pattern, local phase quantization, and local binarized image features. By analyzing and combining different spatiotemporal local descriptors, the performance of pain recognition is improved. The two fusion approaches are based on second-order average pooling and weighted fusion. To obtain a more comprehensive facial representation for pain, a second-order pooling method is first proposed and applied to different local descriptors. Furthermore, an effective fusion approach is designed to unite low-level local descriptors and high-level deep features. Second-order pooling is used because it can capture correlations among different descriptors, which reveal the most pain-related information. By combining the second-order pooled local descriptors with the powerful deep representations, the performance of pain estimation is further improved. This dissertation also proposes an end-to-end pain estimation network that utilizes physiological cues in a non-contact manner. Introducing objective physiological cues in this way alleviates the subjectivity issue. A 3D convolutional neural network is first proposed to recover a physiological signal from video sequences in a contactless manner. A visual feature enrichment module is then proposed to fuse the physiological cues with the facial representation, so that the physiological cues can be utilized more effectively to guide facial representation learning. In addition, this dissertation designs a spatiotemporal attention network to capture both local and long-range dependencies. In this dissertation, the pain assessment problem is tackled from the perspective of combining different information, including different types of visual information and different sources of information (visual and physiological). By designing three feature fusion approaches, considerable improvements are obtained in pain recognition, pain classification, and continuous pain assessment. |