Research On Intelligent Diagnosis Method Of Depression Based On Audio Signal

Posted on: 2023-09-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Xin
GTID: 2544307088966889
Subject: Biomedical engineering
Abstract/Summary:
Depression is a highly prevalent mental illness in modern society, yet most patients with depression are never effectively diagnosed or treated. Intelligent clinical diagnosis of depression would help these patients obtain consultation and treatment. Audio signals are non-invasive, easy to acquire, and rich in emotional information. Most patients with depression speak in a low voice and at a slow rate, so their language characteristics and vocalization differ significantly from those of healthy people. This study therefore uses audio signals as recognition features to explore the intelligent diagnosis and recognition of depression. Working with a small public dataset, it studies automatic depression recognition based on the acoustic features of audio signals and mainly addresses the following problems. Prosodic acoustic features (duration, fundamental frequency, energy, etc.) usually fail to reveal the characteristics of the signal, so their representational capacity is insufficient and machine learning classifiers trained on them tend to underfit. Spectral features (such as MFCCs) combined with deep learning, on the other hand, tend to overfit. Neither approach significantly improves the accuracy of depression recognition. In this study, we construct a new audio representation whose acoustic features carry richer information, in order to raise the recognition rate for the early diagnosis of depression. The main work is as follows.

(1) Prosodic features are usually classified using either long-term or short-term acoustic features. Long-term features reflect the average values of acoustic features and cannot capture how the audio changes over time; short-term features carry less information and cannot represent the audio signal well. We address this problem by combining short-term and long-term features. First, we use a decision tree algorithm to screen numerous low-level short-term audio features and select the four most important ones for feature combination. We then count how often the low, medium, and high values of these combined short-term features occur jointly within a time interval (Δt) to generate long-term acoustic features, which serve as the discriminative features. Finally, we classify them with the Random Forest and eXtreme Gradient Boosting (XGBoost) algorithms. The experimental results show that, compared with low-level short-term features, long-term features, and a deep learning approach, the fused long-term features improve the F1 score and the sensitivity for the non-depressed class, and they can also classify the degree of depression from audio segments.

(2) Deep learning models generally achieve better classification results, but they usually rely on a large amount of data from which the model learns the data distribution. Audio signals are typically fed to deep learning models as spectrograms (linear spectrograms, Mel spectrograms, Mel-frequency cepstral coefficients, etc.). Because the training samples are few, this study uses a self-supervised learning model pre-trained on a large unlabeled corpus, so that the pre-trained model learns richer audio representations and yields better classification results from a small sample. We use Mel linear spectrograms as input to obtain new audio representations from the pre-trained model, test these representations on the downstream task of depression recognition, and analyze how the number of model parameters and fine-tuning affect classification performance. The results show that the learned audio representations achieve good classification results even when the model is pre-trained on a very small corpus. As the number of parameters and the amount of fine-tuning increased, depression recognition improved in all cases.
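The long-term feature construction described in (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The abstract does not say how the low/medium/high cut points are chosen, so the sketch assumes per-feature tertiles fitted on training frames. It also assumes non-overlapping windows of a fixed number of frames standing in for Δt, and treats the four decision-tree-selected features as an already extracted per-frame vector. The helper names `fit_cuts`, `to_level`, and `long_term_features` are hypothetical.

```python
from collections import Counter
from itertools import product
from statistics import quantiles


def fit_cuts(train_frames):
    """Assumed binning rule: per-feature tertile cut points (33.3rd and 66.7th percentiles)."""
    k = len(train_frames[0])
    return [tuple(quantiles([f[j] for f in train_frames], n=3)) for j in range(k)]


def to_level(x, cut):
    """Map one feature value to 0 (low), 1 (medium) or 2 (high)."""
    lo, hi = cut
    if x <= lo:
        return 0
    if x <= hi:
        return 1
    return 2


def long_term_features(frames, cuts, window):
    """Turn frame-level short-term features into long-term features.

    frames: list of per-frame vectors holding the k selected short-term features.
    cuts:   per-feature (lo, hi) thresholds, e.g. from fit_cuts.
    window: frames per interval (hypothetical stand-in for Δt); windows do not overlap
            and a trailing partial window is dropped (both assumptions).
    Returns, for each full window, the relative frequency of every joint
    low/medium/high pattern across the k features (3**k values per window).
    """
    k = len(cuts)
    patterns = list(product(range(3), repeat=k))  # all joint level combinations
    features = []
    for start in range(0, len(frames) - window + 1, window):
        counts = Counter(
            tuple(to_level(frame[j], cuts[j]) for j in range(k))
            for frame in frames[start:start + window]
        )
        # Counter returns 0 for patterns that never occurred in this window.
        features.append([counts[p] / window for p in patterns])
    return features
```

With four selected features, each window yields a 3**4 = 81-dimensional frequency vector. Under this sketch those vectors would be the inputs handed to a Random Forest or XGBoost classifier; how the thesis aggregates windows per recording is not stated in the abstract.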
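The abstract does not name the self-supervised model or the pre-training objective used in (2). As one widely used option, offered only as an assumed illustration, the sketch below shows masked-frame reconstruction on a spectrogram: random frames are hidden and an encoder is trained to reconstruct them, with the loss computed only at the masked positions. The encoder itself is omitted, since it would be a neural network trained in a deep learning framework. The names `mask_frames` and `masked_reconstruction_loss` and the 15% default masking rate are hypothetical choices.

```python
import random


def mask_frames(spec, mask_prob=0.15, rng=None):
    """Hide random frames of a spectrogram (hypothetical pretext-task helper).

    spec: list of frames, each a list of spectrogram bin values.
    Returns a masked copy (hidden frames set to zeros) and the masked frame indices.
    The input spectrogram is left unchanged.
    """
    rng = rng or random.Random()
    masked = [list(frame) for frame in spec]
    idx = [t for t in range(len(spec)) if rng.random() < mask_prob]
    for t in idx:
        masked[t] = [0.0] * len(spec[t])
    return masked, idx


def masked_reconstruction_loss(pred, target, idx):
    """Mean squared error over the masked frames only; unmasked frames are ignored."""
    if not idx:
        return 0.0
    n_bins = len(target[0])
    total = sum((p - q) ** 2 for t in idx for p, q in zip(pred[t], target[t]))
    return total / (len(idx) * n_bins)
```

During pre-training, an encoder would receive `masked` and be scored against the original spectrogram. Afterwards its learned representations, rather than the raw spectrogram, would feed the downstream depression classifier, optionally with fine-tuning.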
Keywords/Search Tags: Depression recognition, Audio signal, Feature combination, Random Forest algorithm, Self-supervised learning