| As work pressure continues to increase, the demand for reliable mental health recognition keeps growing. Psychological manifestations are diverse, and the traditional evaluation model that relies on expert judgment is subject to subjective bias. This paper therefore proposes a recognition model based on facial, voice, and gait behavior characteristics to recognize mental health status automatically and reduce the influence of subjective factors on the results. Multi-modal fusion is used to obtain comprehensive features, which enriches the input to the recognition model, and an attention mechanism is introduced during feature fusion to weight each modality's features and improve the fusion effect. First, low-pass filtering is used to denoise the raw facial and gait video signals and Wiener filtering is used to denoise the voice signals; preprocessing yields normalized facial, voice, and gait datasets. Next, the OpenPose algorithm is used to capture, in each frame of the facial and gait data, facial key points such as the eyes, nose, and mouth and gait key points such as the elbows and knees, giving the structural features of the face and gait modalities as a facial key point set and a gait key point set. At the same time, the openSMILE toolkit is used to extract low-level descriptors from the voice signal, such as short-term energy, pitch frequency, formants, and Mel-frequency cepstral coefficients. Because general pattern recognition methods cannot work directly on the raw face and gait key points or the voice low-level descriptors, and these data cannot be used directly to build classification models, time-domain statistical parameters of these raw features are calculated. This yields a facial feature matrix, a voice feature matrix, and a gait feature matrix whose entries serve as feature values usable by the classifier. Dimensionality reduction is then applied to remove weakly correlated features, and the multi-modal features are fused with the attention mechanism: the face, voice, and gait features are first concatenated (cascade fusion) to obtain a fused feature; the attention mechanism then calculates an attention coefficient for each modality; finally, each modality's features are multiplied by the corresponding coefficient and concatenated a second time to obtain the final multi-modal fusion feature. The fusion feature is fed to a support vector machine to train the classification model, and 10 support vector machine classifiers are obtained for recognizing 10 different mental health conditions such as depression, paranoia, and anxiety. In summary, this paper builds a multi-modal fusion mental health recognition model based on the three modalities of face, voice, and gait. To address the shortcomings of single-modal recognition and of direct cascade fusion, a multi-modal fusion method that introduces the attention mechanism is proposed. The dataset was collected by the Occupational Health Institute of Guangdong Electric Power Research Institute from 680 Guangdong Power Grid employees and includes facial, voice, gait, and mental health information. Comparative experiments on the dataset verify that the proposed model achieves higher recognition accuracy and more stable performance. |
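
The time-domain statistics and the two-stage attention fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not specify which statistics are computed or how the attention coefficients are scored, so everything below is an assumption: the function names `time_domain_stats` and `attention_fuse` are hypothetical, the statistics are limited to mean, standard deviation, minimum, and maximum, and the attention coefficients come from a softmax over linear scores of the concatenated feature. Dimensionality reduction and SVM training are omitted.

```python
import math
from statistics import fmean, pstdev


def time_domain_stats(frames):
    """Summarize a (frames x channels) sequence per channel.

    Assumed statistics: mean, population std, min, max for each channel.
    """
    stats = []
    for series in zip(*frames):  # transpose: one series per channel
        stats.extend([fmean(series), pstdev(series), min(series), max(series)])
    return stats


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(modalities, score_weights):
    """Fuse per-modality feature vectors with one attention coefficient each.

    modalities: list of feature vectors, e.g. [face, voice, gait].
    score_weights: hypothetical learned parameters, one weight vector per
        modality, each as long as the concatenated feature vector.
    """
    # First cascade fusion: plain concatenation of all modalities.
    concat = [x for vec in modalities for x in vec]
    # One linear score per modality, normalized into attention coefficients.
    scores = [sum(w * x for w, x in zip(weights, concat))
              for weights in score_weights]
    alphas = softmax(scores)
    # Second cascade fusion: scale each modality, then concatenate again.
    fused = [a * x for a, vec in zip(alphas, modalities) for x in vec]
    return fused, alphas


if __name__ == "__main__":
    # Toy key-point tracks and voice descriptors (made-up numbers).
    face = time_domain_stats([[0.0, 1.0], [2.0, 3.0]])
    voice = time_domain_stats([[0.5], [1.5], [2.5]])
    gait = time_domain_stats([[1.0], [1.0]])
    dim = len(face) + len(voice) + len(gait)
    # Zero weights stand in for trained parameters in this sketch.
    weights = [[0.0] * dim for _ in range(3)]
    fused, alphas = attention_fuse([face, voice, gait], weights)
    print(alphas, len(fused))
```

In a full pipeline, the fused vector of each subject would be passed to the support vector machine classifiers, and the scoring weights would be learned rather than fixed.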