| Facial micro-expressions are transient, low-intensity muscle movements that appear involuntarily when people conceal, disguise, or hide their true emotional states. They are independent of human will yet reflect a person's authentic feelings and motivations, which makes them valuable in fields such as public security, clinical diagnosis, and criminal investigation. When a micro-expression occurs, the facial image couples individual identity attributes with emotional change features. Because the muscle movement is low in intensity, the interference of individual identity attributes becomes prominent, and the expression causes subtle changes in only some areas of the face image. Micro-expression recognition therefore faces a challenging combined task: reducing the interference of individual identity attributes while simultaneously locating the local regions of micro-expressions and learning detailed embeddings of those regions. To address these problems in previous research, this dissertation proposes a local emotion perception model for facial micro-expression recognition that explores the potential connections between identity and emotion, globality and locality, and entirety and detail in face images. The main research contents of this dissertation are as follows:
(1) A two-stream difference network is proposed to resolve the coupling of identity and emotion. To address the interference of individual identity attributes, which is prominent given the low-intensity embedding of facial micro-expression images, this research proposes a decoupling model of identity and emotion based on the two-stream difference network (TSDN). An autoencoder extracts high-dimensional embedding representations of all face images in the micro-expression video samples. The identity stream network extracts the individual identity attributes from the onset frame of the micro-expression video. The same network is used to construct an emotional stream that extracts the fused features of identity attributes and emotional changes from the apex frame. The multi-scale difference fusion network uses the differences between the middle-layer features of the identity and emotional streams to decouple the emotion embedding hidden in the apex frame for micro-expression recognition. Experiments on the public dataset show that the TSDN model decouples the identity attributes from the emotion embedding and improves recognition performance.
(2) A recurrent generative attention network is constructed to locate regions of interest. Because micro-expressions involve only local regions of the face image, this dissertation proposes a recurrent generative attention network (RGAN), which recursively learns discriminative regions of interest (ROIs) and local features in a mutually reinforcing manner. A generative adversarial network maps the apex frame to the onset frame image and extracts the global emotional feature representation. The attention localization network obtains accurate facial ROIs from the facial image and enlarges them to extract local discriminative features. Finally, the multi-scale local and global loss functions are combined to jointly optimize the model. The experimental results show that the RGAN-based ROI localization model obtains a robust local feature representation.
(3) A dynamic vision transformer is proposed for local detail feature enhancement. To address the subtle emotional changes caused by low-intensity movement, a local detail embedding enhancement model based on a dynamic vision transformer (DViT) architecture is proposed for micro-expression recognition by fusing facial action units and local image features. All frames in the micro-expression video sequences are used to pre-train a learnable mask autoencoder network, which extracts local latent emotional features with high attention weights in facial micro-expression images. The dynamic vision transformer encoder uses action unit features, which are dynamically mapped to local image features for enhancement, and resolves the information bias and redundancy introduced by the local region localization model. The experimental results show that the dynamic vision transformer model enhances the local detail feature representation of facial images through additional information and thereby improves micro-expression recognition performance. |
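The two-stream difference idea in contribution (1) can be illustrated with a minimal sketch. It is not the TSDN implementation: non-overlapping average pooling stands in for the learned middle layers, and the names `avg_pool`, `shared_encoder`, and `difference_fusion` are hypothetical, as are the pooling scales. What it shows is that when the same encoder processes the onset frame (identity) and the apex frame (identity plus emotion), subtracting the features scale by scale cancels the shared identity content and leaves only the local change.

```python
from typing import List, Sequence

Frame = List[List[float]]  # a grayscale image as rows of pixel values


def avg_pool(frame: Frame, k: int) -> Frame:
    """Non-overlapping k x k average pooling (assumed stand-in for one encoder stage)."""
    h, w = len(frame), len(frame[0])
    return [
        [
            sum(frame[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
            for j in range(0, w - k + 1, k)
        ]
        for i in range(0, h - k + 1, k)
    ]


def shared_encoder(frame: Frame, scales: Sequence[int] = (1, 2, 4)) -> List[Frame]:
    """Return one feature map per scale; the same function serves both streams."""
    return [avg_pool(frame, k) for k in scales]


def difference_fusion(onset: Frame, apex: Frame,
                      scales: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Subtract identity-stream features from emotion-stream features at each
    scale and concatenate the flattened differences into one vector."""
    identity_feats = shared_encoder(onset, scales)  # identity stream (onset frame)
    emotion_feats = shared_encoder(apex, scales)    # emotion stream (apex frame)
    fused: List[float] = []
    for id_map, emo_map in zip(identity_feats, emotion_feats):
        for id_row, emo_row in zip(id_map, emo_map):
            fused.extend(e - i for e, i in zip(emo_row, id_row))
    return fused


# Toy data: a 4 x 4 "identity" pattern, with one pixel changed in the apex frame.
onset = [[float(4 * i + j) for j in range(4)] for i in range(4)]
apex = [row[:] for row in onset]
apex[1][2] += 0.5  # subtle local change standing in for a micro-expression

fused = difference_fusion(onset, apex)
nonzero = [(idx, val) for idx, val in enumerate(fused) if val != 0.0]
print(nonzero)  # [(6, 0.5), (17, 0.125), (20, 0.03125)]
```

The fused vector has 21 entries (16 + 4 + 1 across the three scales), and only the three positions covering the changed pixel are non-zero. In a learned model the subtraction would act on convolutional feature maps, and a classifier would consume the fused vector.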