
The Research Of Intelligent Identification And Verification With Depression Based On Vocal Acoustic Features

Posted on: 2024-02-12  Degree: Doctor  Type: Dissertation
Country: China  Candidate: L J Liang  Full Text: PDF
GTID: 1524307295981749  Subject: Mental Illness and Mental Health
Abstract/Summary:
Objective: The prevalence of depression is about 15%–18%, and the World Health Organization (WHO) has projected that by 2030 depression will account for the largest share of the global burden of disease. In China, however, the surge in depression is met with insufficient psychiatric resources. Assessment of depression still relies mainly on psycho-behavioral rating scales and on clinicians' subjective, symptom-based judgment, and lacks objective biomarkers. Depression is dominated by symptoms such as low mood, slowed thinking and loss of interest, and the speech of depressed patients is typically monotonous, dull and sparse; its characteristic acoustic features include lower pitch, slower speech rate, lower volume and reduced articulation. Because speech is convenient to collect, non-invasive and emotionally rich, it has become a new avenue for the objective identification of depression and offers a new perspective for efficient, convenient early diagnosis and evaluation. However, owing to limitations of experimental design and methodology, previous classification studies of depression have shown low accuracy and poor generalization, prediction models and model validation have been lacking, and the relationship between speech acoustic features and depressive symptoms remains controversial. This research therefore explored the key vocal acoustic features related to depressive symptoms, constructed classification and prediction models of depression based on those key features, and validated the generalization of the models and the effectiveness of the key features in an independent depressive sample. Previous studies have classified major depression versus healthy controls from vocal acoustic features, but the reported AUCs are unsatisfactory, classification accuracy needs improvement, and no prediction model of depressive symptom severity based on vocal acoustic features has been reported. The first part of this study explored the relationship between key abnormal vocal acoustic features and the severity of depressive symptoms, and used deep learning to construct classification and prediction models for the major depression (MDD) group and the healthy control (HC) group. Because previous models were not validated, their generalization ability and scope of application are limited; the second part therefore recruited a depressed-mood group to determine the effectiveness of the key vocal acoustic features in an independent sample and to test the generalization of the classification and prediction models. Previous studies suggest that vocal acoustic features are objective biological markers of response to drug therapy; given the significant reduction in depressive symptom severity after computerized cognitive behavioral therapy (CCBT), the third part explored the changes in the vocal acoustic features associated with symptom severity before and after CCBT, providing further evidence for the generalization of the models.

Methods: Part Ⅰ: 122 participants aged 16–25 took part in this study, including 68 in the MDD group and 70 in the HC group. Psychiatrists made diagnoses according to the criteria of the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) and the Structured Clinical Interview for DSM Axis I Disorders (SCID-I); diagnostic consistency was checked by two psychiatrists at the associate-professor level or above. The SCID was used to determine whether patients met the criteria for a depressive episode, and its non-patient version was used to evaluate the healthy participants. The HAMD-17 was used to rate the severity of depressive symptoms, and the voice data set was collected by having participants read neutral passages. Data processing and analysis: all voice data were recorded with the same digital recorder and stored as WAV files. The open-source COVAREP toolkit was used to extract 120 frame-level vocal acoustic features per sample, and 10 statistics were computed for each feature (maximum, minimum, median, mean, variance, kurtosis, skewness, regression slope, regression intercept and regression coefficient), yielding 1,200 high-level statistical functionals per sample.
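A minimal sketch of how such per-feature statistical functionals might be computed from frame-level acoustic features. The dissertation does not give its implementation, so the array shapes, the use of NumPy/SciPy, and the reading of the "regression coefficient" as the correlation coefficient of a linear fit over frame index are assumptions:

```python
import numpy as np
from scipy import stats

def functionals(frames: np.ndarray) -> np.ndarray:
    """Collapse a (n_frames, 120) matrix of frame-level acoustic features
    into 120 x 10 = 1,200 high-level statistics for one recording."""
    t = np.arange(frames.shape[0])           # frame index used as the regressor
    rows = []
    for x in frames.T:                       # one column per acoustic feature
        slope, intercept, r, _, _ = stats.linregress(t, x)
        rows.append([
            x.max(), x.min(), np.median(x), x.mean(), x.var(),
            stats.kurtosis(x), stats.skew(x),
            slope, intercept, r,             # regression slope, intercept, coefficient
        ])
    return np.asarray(rows).ravel()          # 1,200-dimensional feature vector

# Hypothetical usage: frame-level features for one recording, e.g. COVAREP
# output exported to CSV with shape (n_frames, 120).
# sample = np.loadtxt("subject01_covarep.csv", delimiter=",")
# vec = functionals(sample)                  # shape (1200,)
```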
Python was then used for correlation analysis, and a neural network (a four-layer fully connected model; a sketch follows the Methods) was used to build models that classify depressed versus healthy participants and predict the total depression score, and the performance of the classification and prediction models was evaluated.

Part Ⅱ: A total of 70 participants aged 16–25 were recruited, including 34 in the depressed-mood group and 36 in the healthy control group; all met the inclusion and exclusion criteria. The HAMD-17 was used to assess the severity of depressive symptoms, and participants read neutral passages to provide the speech data set. Data processing and analysis: feature extraction and feature selection were performed as in Part Ⅰ, followed by data analysis in Python. Across the two parts, 63 speech acoustic features that were significantly and reproducibly related to depressive symptoms were screened out and used to verify the generalization of the depression classification and prediction models.

Part Ⅲ: This part followed 18 participants with depressed mood from Part Ⅱ through a CCBT intervention; they were aged 18–23, with a mean age of 20.38. The HAMD-17 was used to assess the severity of depressive symptoms. Data processing and analysis: feature extraction was performed as in the earlier parts and the data were analyzed in Python; the difference in symptoms before and after the intervention was tested first, and the vocal acoustic features identified in the earlier parts were then compared with the Wilcoxon test.
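The classifier and the severity predictor are described only as four-layer fully connected neural networks built in Python; the framework, layer widths and preprocessing are not specified. A minimal PyTorch sketch under those assumptions (hypothetical hidden sizes, standardized 1,200-dimensional functional inputs) might look like:

```python
import torch
import torch.nn as nn

class FourLayerFCN(nn.Module):
    """Four fully connected layers; out_dim=2 for MDD/HC classification,
    out_dim=1 for HAMD-17 total-score regression."""
    def __init__(self, in_dim: int = 1200, out_dim: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 16), nn.ReLU(),
            nn.Linear(16, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Hypothetical training step for the classifier:
# model = FourLayerFCN(out_dim=2)
# loss_fn = nn.CrossEntropyLoss()
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# logits = model(features)        # features: (batch, 1200) standardized functionals
# loss = loss_fn(logits, labels)  # labels: 0 = HC, 1 = MDD
# loss.backward(); opt.step(); opt.zero_grad()
```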
Results: Part Ⅰ: There was no significant difference in age or sex between the MDD group and the HC group, while clinical symptoms differed significantly (P<0.01). Mel-frequency cepstral coefficients and their first- and second-order differences (P<0.01), vocal-tract filtering features associated with the formants (P<0.01), volume-related prosodic features and vocal-cord source features were significantly associated with the severity of depressive symptoms (P<0.01). The classification model distinguishing the MDD group from the HC group using the significantly correlated speech acoustic features reached an AUC of 0.90; ROC analysis showed a classification accuracy of 84.16%, a sensitivity of 95.38% and a specificity of 70.90%. For the prediction model based on the speech features, the predicted score was closely related to the HAMD-17 total score (r=0.687, P<0.01), with a mean absolute error (MAE) of 4.51 between the predicted and true HAMD-17 totals. Using only the 30 key speech acoustic features with the largest weights, ROC analysis of the MDD versus HC classification gave a mean AUC of 0.87; the confusion matrix showed an accuracy of 81.66%, a sensitivity of 93.84% and a specificity of 62.27%. The corresponding prediction model again showed that the predicted score was closely related to the HAMD-17 total score (r=0.597, P<0.01), with an MAE of 4.69 against the true HAMD-17 total.

Part Ⅱ: Model verification on the independent sample showed that the speech acoustic features replicated across the two parts could also effectively separate the depressed-mood group from the healthy control group; they mainly comprised Mel-cepstral spectral features and their first- and second-order differences, vocal-tract filtering features associated with the formants, volume-related prosodic features, and vocal-cord source features. Among the features replicated across the two samples, 42 showed consistent trends: the mean, median and minimum of the Mel cepstral coefficients were significantly positively correlated with the total depression score; the standard deviation and mean of the first- and second-order Mel-cepstral differences were significantly negatively correlated with the total depression score and significantly positively correlated with the peak; and the mean, median and regression intercept of the formant features were significantly negatively correlated with the total depression score. The classification model based on the replicated speech acoustic features achieved a mean AUC of 0.88, with a confusion-matrix accuracy (ACC) of 89.23%, sensitivity (SEN) of 90.32% and specificity (SPE) of 88.23%. Verification of the prediction model showed that the predicted score was related to the HAMD-17 total score (r=0.748, P<0.01), with an MAE of 3.03 between the predicted and true HAMD-17 totals.

Part Ⅲ: The change in HAMD-17 before and after the CCBT intervention indicated that the depressive symptoms of the depressed participants were significantly relieved after CCBT. Eighteen of the replicated speech acoustic features differed significantly before and after the intervention (P<0.05). These 18 key features mainly showed a significant increase and greater variation in the Mel-cepstral phase features and the Mel-frequency cepstral coefficient deltas (P<0.05), a significant decrease in the formants together with a larger regression intercept (P<0.01), and an increase in the vocal-cord source features after the intervention (P<0.05).
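The performance measures reported above (AUC, accuracy, sensitivity, specificity, Pearson r, MAE) can all be obtained from model outputs with standard scikit-learn and SciPy calls. The following is an illustrative sketch of that evaluation, not the dissertation's own code; the 0.5 decision threshold and the variable names are assumptions:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import (roc_auc_score, confusion_matrix,
                             accuracy_score, mean_absolute_error)

def classification_report(y_true, y_score, threshold=0.5):
    """AUC plus confusion-matrix-derived accuracy, sensitivity and specificity."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "AUC": roc_auc_score(y_true, y_score),
        "ACC": accuracy_score(y_true, y_pred),
        "SEN": tp / (tp + fn),   # sensitivity (recall for the depressed class)
        "SPE": tn / (tn + fp),   # specificity
    }

def prediction_report(hamd_true, hamd_pred):
    """Pearson correlation and MAE between predicted and true HAMD-17 totals."""
    r, p = pearsonr(hamd_true, hamd_pred)
    return {"r": r, "p": p, "MAE": mean_absolute_error(hamd_true, hamd_pred)}
```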
Conclusion: The speech acoustic features significantly related to the severity of depressive symptoms include vocal-source, spectral, prosodic and vocal-tract filtering features. The classification results show that these features can effectively separate the major depression group from the healthy control group and can also effectively predict the severity of depressive symptoms. The MDD versus HC classification model based on the key abnormal speech acoustic features achieved good accuracy, sensitivity and specificity, and the symptom-severity prediction model outperformed previous studies, with a smaller prediction error. Building on the first part, the classification and prediction models based on the key vocal acoustic features replicated across the two parts were verified on an independent sample: accuracy, sensitivity and specificity remained satisfactory and the prediction error remained small. Vocal acoustic features can therefore effectively classify and predict not only the major depression group versus the healthy control group but also the depressed-mood group versus the healthy control group; in other words, the classification and prediction models built as four-layer fully connected neural networks on speech acoustic features generalize. As depressive symptoms improved significantly over the CCBT intervention, the key depression-related vocal acoustic features, namely the Mel cepstral coefficients, the formants, and the vocal-cord source features, also changed significantly from before to after the intervention. These findings suggest that speech acoustic signatures may also serve as objective biomarkers for evaluating the therapeutic efficacy of CCBT. In summary, the key vocal acoustic features can effectively classify and predict both major depression versus healthy controls and depressed mood versus healthy controls, and they have important potential value for evaluating the efficacy of CCBT.
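The pre/post CCBT comparison in Part Ⅲ is a paired design on 18 participants. A minimal sketch of a Wilcoxon signed-rank test on one such acoustic-feature functional is shown below; the data here are placeholders, not study values, and the assumption that the paired (signed-rank) form of the test was used is ours:

```python
import numpy as np
from scipy.stats import wilcoxon

# Placeholder data standing in for one key acoustic-feature functional
# measured in the same 18 participants before and after CCBT (not real values).
rng = np.random.default_rng(0)
pre_ccbt = rng.normal(loc=0.0, scale=1.0, size=18)
post_ccbt = pre_ccbt + rng.normal(loc=0.3, scale=0.5, size=18)

# Paired, non-parametric test of the pre/post difference.
stat, p = wilcoxon(pre_ccbt, post_ccbt)
print(f"Wilcoxon statistic = {stat:.2f}, p = {p:.4f}")  # p < 0.05 indicates a significant change
```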
Keywords/Search Tags:depression, vocal acoustic features, deep learning, classification model, prediction model, model validation