| Depression is a common mental disorder, and severe depression can even lead to self-harm or suicide. Suicide is also increasingly occurring at younger ages, which seriously affects individuals, families and society. Clinical evidence shows that early detection of depression, combined with effective psychological intervention and drug treatment, can alleviate or even cure the condition. However, the etiology of depression is complex and diverse, so clinical diagnosis is difficult and rates of misdiagnosis and missed diagnosis are high. Clinical findings show that patients with depression exhibit characteristic patterns in voice, text, facial expression, physiological signals and other modalities. A computer-aided diagnosis model built on such multimodal data can therefore improve the efficiency and accuracy of depression diagnosis. Existing depression diagnosis methods suffer from unreasonable feature selection, insufficient multimodal data fusion, scarce depression datasets and demanding collection conditions, all of which limit the accuracy of diagnosis models. Therefore, based on machine learning methods, this thesis studies multimodal data from adolescents with depression and constructs a depression diagnosis model that helps improve the accuracy and efficiency of clinical diagnosis. The main research work and innovations are as follows. 1. Extracting local and global audio and text features. For audio, an improved AlexNet model (A-AlexNet) is proposed to extract spectrogram features that contain local information, and time-frequency domain analysis is used to extract HSFs, audio features that contain global information. For text, the pre-trained Bert-wwm-ext model is introduced to extract text features that contain both local and global information. 2. A diagnosis model for adolescent depression based on coarse-grained and fine-grained fusion. At the coarse-grained level, the model fuses features at the sentence level, focusing on the overall depression tendency. At the fine-grained level, a new attention mechanism is proposed that automatically aligns speech and text at the word level; it preserves the correlations in the original information as far as possible and exploits the fine-grained complementarity between modalities. Finally, to combine the complementary strengths and weaknesses of the two models, a decision-level fusion model based on coarse-grained and fine-grained fusion is proposed to further improve diagnostic accuracy. 3. Reducing the negative impact of constrained data acquisition in two ways. First, a state impact elimination algorithm based on audio similarity is proposed to handle the outliers that appear in part of the data when the subjects' state changes during the interview. Second, MulMixup, a multimodal data augmentation method, is proposed to reduce the impact of the dataset scarcity caused by collection difficulties; it enriches the feature space of the original data and improves the accuracy and generalization ability of the model. |
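HSF-style global audio features are commonly obtained by computing descriptors frame by frame and then summarising each frame-level track with statistical functionals. The Python sketch below illustrates that idea only; the frame length, hop size, the three frame-level descriptors (short-time energy and zero-crossing rate in the time domain, a naive-DFT spectral centroid in the frequency domain) and the five functionals are illustrative assumptions, not the feature set used in the thesis.

```python
import cmath
import math
import statistics
from typing import Dict, List


def split_frames(signal: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Cut the signal into overlapping frames (trailing samples are dropped)."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame: List[float]) -> float:
    """Time-domain descriptor: mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: List[float]) -> float:
    """Time-domain descriptor: fraction of adjacent sample pairs that change sign."""
    changes = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return changes / (len(frame) - 1)


def spectral_centroid(frame: List[float], sample_rate: float) -> float:
    """Frequency-domain descriptor: magnitude-weighted mean frequency (naive DFT)."""
    n = len(frame)
    bins = range(n // 2 + 1)
    mags = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in bins]
    total = sum(mags)
    if total == 0.0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in zip(bins, mags)) / total


def functionals(track: List[float]) -> Dict[str, float]:
    """Statistical functionals that summarise a frame-level track globally."""
    return {"mean": statistics.fmean(track), "std": statistics.pstdev(track),
            "min": min(track), "max": max(track), "range": max(track) - min(track)}


def hsf_features(signal: List[float], sample_rate: float,
                 frame_len: int = 64, hop: int = 32) -> Dict[str, float]:
    """Utterance-level feature dict: every functional of every frame-level descriptor."""
    frames = split_frames(signal, frame_len, hop)
    if not frames:
        raise ValueError("signal is shorter than one frame")
    tracks = {
        "energy": [short_time_energy(f) for f in frames],
        "zcr": [zero_crossing_rate(f) for f in frames],
        "centroid": [spectral_centroid(f, sample_rate) for f in frames],
    }
    return {f"{name}_{stat}": value
            for name, track in tracks.items()
            for stat, value in functionals(track).items()}
```

For a 1 kHz sine sampled at 8 kHz (512 samples), `hsf_features` returns 15 values, with `energy_mean` close to 0.5 and `centroid_mean` close to 1000 Hz. Whatever the length of the recording, the output has a fixed size, which is what makes functionals a convenient carrier of global information.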
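Word-level fusion needs each word to be paired with the stretch of speech it corresponds to. The abstract does not specify the new attention mechanism proposed in the thesis, so the sketch below substitutes ordinary scaled dot-product cross-attention as a stand-in. It assumes that word embeddings and audio-frame features have already been projected to the same dimension; each word vector queries all audio frames, and the attention-weighted audio vector is concatenated to that word.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(words: Matrix, frames: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product attention: words are queries, audio frames are keys and values.

    Returns one attention-weighted audio vector per word and the weight matrix
    (rows: words, columns: frames).
    """
    d = len(words[0])
    if any(len(f) != d for f in frames):
        raise ValueError("words and frames must share one feature dimension")
    aligned, weights = [], []
    for query in words:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in frames]
        w = softmax(scores)
        weights.append(w)
        aligned.append([sum(wi * frame[j] for wi, frame in zip(w, frames))
                        for j in range(d)])
    return aligned, weights


def word_level_fusion(words: Matrix, frames: Matrix) -> Tuple[Matrix, Matrix]:
    """Concatenate each word vector with the audio vector that word attends to."""
    aligned, weights = cross_attention(words, frames)
    return [word + audio for word, audio in zip(words, aligned)], weights
```

Each row of the returned weight matrix sums to 1 and shows which frames a word is aligned with; in a trained model the query, key and value projections would be learned rather than fixed as in this toy version.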
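Decision-level fusion combines the outputs of the coarse-grained and fine-grained models rather than their features. A common and simple realisation, shown here purely as an illustration, is a weighted average of the two models' class probabilities followed by an argmax. The `weight` parameter and the two-class setting (index 0: not depressed, index 1: depressed) are assumptions; the thesis may use a different combination rule.

```python
from typing import List, Tuple


def decision_level_fusion(p_coarse: List[float], p_fine: List[float],
                          weight: float = 0.5) -> Tuple[int, List[float]]:
    """Weighted average of two models' class probabilities, then argmax.

    weight is the share given to the coarse-grained model; 1 - weight goes to
    the fine-grained model.
    """
    if len(p_coarse) != len(p_fine):
        raise ValueError("both models must score the same classes")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    fused = [weight * a + (1.0 - weight) * b for a, b in zip(p_coarse, p_fine)]
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused
```

For example, `decision_level_fusion([0.6, 0.4], [0.2, 0.8])` predicts class 1 with fused probabilities of about 0.4 and 0.6: the fine-grained model's confident vote outweighs the coarse-grained model's weak one. In practice `weight` would be tuned on validation data.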
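The abstract does not describe how the audio-similarity-based state impact elimination works. One plausible reading, sketched here as a hypothetical illustration only, is to compare each interview segment's audio feature vector with the subject's average profile using cosine similarity and to discard segments that fall below a threshold. The function names, the mean-based profile and the 0.8 threshold are all assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_state_outliers(segments: List[Vector],
                          threshold: float = 0.8) -> Tuple[List[int], List[int]]:
    """Split one subject's segments into kept and dropped indices.

    A segment is kept when its audio features are similar enough to the
    subject's mean feature vector (a hypothetical stand-in for the subject's
    typical state).
    """
    if not segments:
        return [], []
    dim = len(segments[0])
    profile = [sum(s[j] for s in segments) / len(segments) for j in range(dim)]
    kept, dropped = [], []
    for idx, seg in enumerate(segments):
        (kept if cosine(seg, profile) >= threshold else dropped).append(idx)
    return kept, dropped
```

Because the outlier itself contributes to the mean profile, a median or leave-one-out profile would be a more robust variant of the same idea.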
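Mixup-style augmentation creates virtual training samples as convex combinations of two real samples and of their labels, with the mixing coefficient drawn from a Beta(α, α) distribution. The abstract does not give the details of MulMixup, so the sketch below shows only one natural multimodal extension, assumed here: a single coefficient is shared by the audio features, the text features and the label, so the mixed modalities stay paired. The names and `alpha=0.2` are illustrative.

```python
import random
from typing import List, Optional, Tuple

Vector = List[float]
Sample = Tuple[Vector, Vector, float]  # (audio features, text features, label)


def mix(a: Vector, b: Vector, lam: float) -> Vector:
    """Element-wise convex combination lam * a + (1 - lam) * b."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have equal length")
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def multimodal_mixup(sample_i: Sample, sample_j: Sample, alpha: float = 0.2,
                     rng: Optional[random.Random] = None):
    """Mix two multimodal samples with one shared coefficient lam ~ Beta(alpha, alpha).

    Returns (mixed audio, mixed text, mixed label, lam).
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    audio_i, text_i, y_i = sample_i
    audio_j, text_j, y_j = sample_j
    return (mix(audio_i, audio_j, lam), mix(text_i, text_j, lam),
            lam * y_i + (1.0 - lam) * y_j, lam)
```

Sharing one coefficient is the design choice that matters here: mixing audio and text with different coefficients would pair speech from one blend with text from another and make the soft label ambiguous.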