Depression is one of the most common mental disorders, characterized by low mood, feelings of guilt or low self-worth, sleep or appetite disturbances, fatigue, and poor concentration. These symptoms are long-lasting and recurrent and can seriously impair an individual's ability to work, live, and learn; in the most severe cases they may lead to suicidal tendencies. In recent years the number of people suffering from depression has risen year by year, adding to the global burden of mental illness, so improving the early detection and accurate identification of depression worldwide is crucial. Depression recognition has accordingly become a research hotspot in affective computing, where research on psychology, physiology, cognition, and artificial intelligence has laid a solid theoretical foundation and provided technical support for the task.

This thesis takes the clinically observed specificity of depressed patients in facial, speech, physiological, and text data as its premise, and improves the accuracy of depression recognition by effectively fusing the multimodal data features of depressed patients and building deep learning models. The main work and contributions of this thesis are as follows:

(1) To address the problem of unreasonable feature selection in existing depression recognition techniques, a multimodal depression recognition method based on an attention mechanism and feature-layer fusion is proposed. First, low-level descriptors (LLDs) of the multimodal data are extracted, and the resulting shallow features are fused at the coarse-grained feature level to verify the effectiveness of the feature selection for depression recognition. Second, a purpose-built deep learning feature encoder extracts the deep features implicit in the shallow features and fuses them effectively at the fine-grained feature level. Finally, a multi-level inter-modal attention mechanism fuses the features at both the coarse and fine granularity levels to predict depression severity from visual, audio, and transcribed text information. Experiments on the multimodal depression datasets published in the Audio/Visual Emotion Challenge and Workshop (AVEC 2017 and AVEC 2019) show that the proposed attention-based multimodal feature-layer fusion model (MFF-Att) achieves strong depression recognition results and confirms the rationality of the feature selection. On the validation set, the model attains a root mean square error (RMSE) of 4.03 and a mean absolute error (MAE) of 3.05, outperforming the baseline and most current multimodal fusion-based depression recognition methods.

(2) To address the insufficient use of data feature information and the difficulty of effectively extracting features strongly correlated with depression recognition in existing multimodal fusion-based methods, a multitask-learning depression recognition method based on multiscale fusion features is proposed. First, CNN-LSTM feature encoders extract multiscale features: the higher-resolution shallow features strengthen the representation of the data, while the lower-resolution deep features strengthen the representation of semantic information, improving the utilization of multi-source information about depressed patients. A multitask learning model is then built on the multiscale multimodal fusion space, in which the regression task of assessing patients' PHQ scores shares parameters with the classification task, yielding a more generalizable representation for both tasks. By combining multiscale feature fusion with multitask learning, the thesis extracts deep features relevant to depression recognition, improves the utilization of the data, and thereby improves the performance of the depression recognition model. On the validation set of the DAIC-WOZ dataset, the multitask regression achieves an RMSE of 5.28, and the classification task reaches an accuracy of 72.72% and an F1 score of 0.66; on the validation set of the E-DAIC dataset, the multitask regression achieves an MAE of 3.34, with an accuracy of 85.71% and an F1 score of 0.73 for classification. Both results are the best in the comparison experiments and outperform the baseline on the DAIC-WOZ dataset, demonstrating the superior performance of the proposed depression recognition model.
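As a simplified illustration of the attention-weighted fusion idea underlying contribution (1), the sketch below combines per-modality feature vectors with softmax attention weights. All names, shapes, and scores here are hypothetical toy values, not the thesis implementation; in MFF-Att the weights would be learned and the features would come from the deep encoders.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_fuse(features, scores):
    """Fuse per-modality feature vectors via softmax attention weights.

    features: list of 1-D arrays (e.g. visual, audio, text), all already
              projected to a common dimension.
    scores:   one raw relevance score per modality (hypothetical here;
              in practice these would be produced by a learned module).
    Returns the attention-weighted sum of the modality features.
    """
    weights = softmax(np.asarray(scores, dtype=float))  # sums to 1
    stacked = np.stack(features)                        # (n_modalities, dim)
    return weights @ stacked                            # -> (dim,)

# Toy example: three modalities in a 4-dimensional common feature space.
visual = np.array([1.0, 0.0, 0.0, 0.0])
audio  = np.array([0.0, 1.0, 0.0, 0.0])
text   = np.array([0.0, 0.0, 1.0, 0.0])
fused = attention_fuse([visual, audio, text], scores=[2.0, 1.0, 0.5])
```

Because each toy feature vector is one-hot, the fused vector simply exposes the attention weights themselves, which makes the weighting behavior easy to inspect; a real model would feed `fused` into the downstream severity-prediction head.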