| Micro-expressions are a kind of spontaneous facial expression. Compared with macro-expressions, micro-expressions, which can reveal people’s real feelings, are shorter in duration, lower in intensity, and confined to a smaller facial area. They have a wide range of applications in criminal investigation, prison management, emotional behavior analysis, and public security. Progress in micro-expression research is inseparable from the support of micro-expression datasets. However, the number of publicly available micro-expression datasets is small and their quality is uneven. Prisoners, as a special group, are among the beneficiaries of micro-expression research, which can aid their correction and reform. At present, most education and reform of prisoners is carried out through traditional questionnaires and interviews, but it is difficult to uncover prisoners’ potential behavioral intentions in this way. As spontaneous information that can be captured without contact, facial micro-expressions can reflect the real psychological activities of prisoners and are of great significance to their correction and reform. However, there are currently no public micro-expression datasets with prisoners as subjects, nor are there large-sample micro-expression datasets collected in a non-laboratory environment. Convolutional neural network models have been dominant in micro-expression recognition tasks. They focus on extracting the texture features of facial action units but ignore the long-distance dependencies between different facial action units, so the spatial relationships among action units in micro-expressions are insufficiently exploited and recognition performance is poor. In response to these problems, this paper first establishes the SDU micro-expression dataset, which contains 1430 samples from 123 subjects and covers six emotional categories: anger, disgust, fear, happiness, sadness, and surprise. The SDU-A subset with 
university teachers and students as subjects contains 855 samples from 73 subjects, and the SDU-B subset with prisoners as subjects contains 575 samples from 50 subjects. The SDU dataset provides a high-quality dataset for micro-expression research and supports the reform of inmates. Second, four classic feature extraction algorithms, LBP-TOP, MDMO, CNN, and Vision Transformer, are used to extract micro-expression features from the SDU dataset. Performance evaluations with the related models are conducted on the two subsets of the SDU dataset, providing basic theoretical support for subsequent applications of the SDU micro-expression dataset. Finally, to address the problem that traditional convolutional neural networks ignore long-distance dependencies between different facial action units, a Multi-Path Vision Transformer with Divided Space-Time Self-Attention is proposed. By establishing a multi-scale, multi-path model, local features of facial micro-expression action units are extracted through a convolutional network, and the long-distance relationships between different facial action units are learned using the Divided Space-Time Self-Attention of the Vision Transformer. Coarse-grained and fine-grained features are extracted at the same feature level, which improves the recognition results of the network. Experiments are carried out on four datasets. The main work and contributions of this paper are as follows: (1) The SDU micro-expression dataset is established. It consists of 1430 samples from 123 subjects and covers anger, disgust, fear, happiness, sadness, and surprise. The SDU dataset contains two subsets, SDU-A and SDU-B. SDU-A consists of 855 samples from 73 subjects, who are college teachers and students. SDU-B consists of 575 samples from 50 subjects, who are prisoners from Ningxia Province. First, under the guidance of psychological experts, we use a Point Grey industrial high-speed camera to record college teachers and 
students watching the induction video (SDU-A) and the routine business conversations between police officers and prisoners in the prison conversation room (SDU-B). Second, the original videos are screened and processed to obtain video clips containing emotional changes, from which the micro-expression clips are selected. The samples are labeled jointly by the author and three co-workers to reduce the influence of individual subjectivity and thus ensure the quality of the micro-expression samples. Finally, the standard micro-expression samples are collected, and a micro-expression dataset containing 1430 samples from 123 subjects is established. This dataset covers six emotional categories: anger, disgust, fear, happiness, sadness, and surprise. Compared with existing publicly released micro-expression datasets, the SDU dataset is currently the only micro-expression dataset worldwide that uses prisoners as subjects. It has a large sample size, high resolution, and a balanced number of samples for each micro-expression class. The establishment of this dataset provides not only a high-quality experimental dataset for micro-expression research but also significant assistance for the emotional analysis and correction of prisoners. (2) The performance on the SDU micro-expression dataset is evaluated. Four classic feature extraction algorithms, LBP-TOP, MDMO, CNN, and Vision Transformer, are introduced, and the related models are used to test recognition accuracy on the two subsets of the SDU dataset at different frame rates and resolutions. The impact of frame rate and resolution on the SDU micro-expression dataset is studied in terms of recognition accuracy. This provides preliminary theoretical support for subsequent applications of the SDU micro-expression dataset. (3) A Multi-Path Vision Transformer with Divided Space-Time Self-Attention is proposed. First, using a deep-learning-based motion amplification algorithm, a magnified frame 
is obtained by amplifying the features of the apex frame of the micro-expression. The optical flow features between the onset frame and the magnified frame, and between the magnified frame and the offset frame, are calculated to amplify the fine-grained features. Second, the image blocks are encoded through a multi-scale, multi-path network structure, and the local features and long-distance dependencies of facial action units are learned by combining a convolutional network with a Transformer. A Transformer encoder based on Divided Space-Time Self-Attention is used to judge the importance of each space-time image block. This improves the network’s ability to interpret the semantics of each space-time image block and thereby enables the extraction of coarse-grained and fine-grained features at the same feature level. Finally, experiments and analysis on the SDU-A, SDU-B, MMEW, and SAMM micro-expression datasets demonstrate the effectiveness of the algorithm. |
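
To make the idea of Divided Space-Time Self-Attention more concrete, the following is a minimal NumPy sketch of the general technique, not the network proposed in this paper. It assumes the input has already been split into image blocks and embedded as a tensor of shape (T, N, D), with T frames, N blocks per frame, and D feature dimensions. The function names, the single attention head, the random projection weights, and the toy sizes are all illustrative assumptions; layer normalization, positional embeddings, and the multi-path convolutional branches are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention.

    x has shape (..., L, D); tokens attend to each other along the L axis.
    Returns the attended features and the (..., L, L) attention weights.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def divided_space_time_attention(tokens, params):
    """Apply temporal attention, then spatial attention, to (T, N, D) tokens."""
    # Temporal step: each block position attends to the same position
    # in the other frames, so the sequence axis is T.
    per_block = np.swapaxes(tokens, 0, 1)              # (N, T, D)
    t_out, t_weights = attention(per_block, *params["time"])
    tokens = tokens + np.swapaxes(t_out, 0, 1)         # residual connection

    # Spatial step: within each frame, every block attends to all other
    # blocks, which is where long-distance relations between facial
    # regions can be captured. The sequence axis is N.
    s_out, s_weights = attention(tokens, *params["space"])
    return tokens + s_out, t_weights, s_weights


# Toy example with hypothetical sizes: 4 frames, 16 blocks, 8 features.
rng = np.random.default_rng(0)
T, N, D = 4, 16, 8
tokens = rng.normal(size=(T, N, D))
params = {
    name: [rng.normal(scale=D ** -0.5, size=(D, D)) for _ in range(3)]
    for name in ("time", "space")
}
out, t_weights, s_weights = divided_space_time_attention(tokens, params)
print(out.shape, t_weights.shape, s_weights.shape)
```

Splitting attention this way means each token is compared with T + N others rather than with all T × N tokens at once, which is the usual motivation for the divided form. The returned attention weights can also be inspected as a rough indication of how much importance the model assigns to each space-time block.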