
Research On Depression Recognition Based On Spontaneous Speech

Posted on: 2024-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: D Wang
GTID: 2544307058482004
Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
Depression is a common and serious mental illness that affects patients both physically and mentally, and as the condition worsens patients become more likely to self-harm or even attempt suicide. According to a World Health Organization survey from 2021, about 3.8% of the world's population was affected by depression. The situation in China is particularly serious: according to the National Depression Blue Book 2022, more than 100 million people in China suffer from depression, and the incidence exceeds that of most countries in the world. In addition, the global spread of COVID-19 has led to a rapid increase in the number of people with depression, with a trend toward younger patients. It is therefore of great importance to society as a whole to identify depression at an early stage and to intervene before the disease worsens.

The early detection and diagnosis of depression has attracted wide attention, and a series of research results have been achieved. In particular, because speech analysis is non-invasive, simple, and fast, and speech data is easy to collect, automatic diagnosis of depression based on speech analysis has become an active research topic. Robust and accurate automatic recognition methods built on objective, quantitative speech analysis can help doctors better understand a patient's condition, develop corresponding treatment plans, and ultimately reduce the harm that depression does to society. However, current research on speech-based depression recognition commonly adopts more complex models to achieve better classification performance while ignoring model interpretability. Inadequate interpretability reduces the credibility and reliability of the models and limits the further development and application of machine learning in real-world tasks. In addition, the traditional approach to depression recognition requires a large amount of knowledge and experience for feature engineering to select and transform speech features, cannot automatically characterize depression-related cues in speech, and remains challenging in terms of accuracy and robustness.

This thesis works with spontaneous speech recorded in real scenes and addresses the above problems in automatic depression diagnosis. Its main contributions and innovations are as follows:

(1) To address the insufficient interpretability of models in current depression research, this thesis proposes an interpretable multimodal depression recognition (IMDD) method for speech and text, which can explain the reasons for model decisions while achieving high recognition accuracy. On the one hand, IMDD improves the ability to discriminate depressed speech through modality fusion and model integration; on the other hand, it explores the reasons for model decisions with a model interpretation approach based on game theory. A multimodal machine learning model is constructed on a clinical speech dataset, and the complementary information between different models and modalities is fused to achieve higher classification performance, reaching an F1 value of 0.897. Based on the multimodal integrated classification model, the SHAP model interpretation method is applied to explore how the input speech features affect the model's predictions at both the global and the individual level. Lexical features in the spontaneous speech of depressed patients are found to play a greater role in model decisions than acoustic features, and the decision process is visualized for two subjects to explain the model's decisions at the individual level.

(2) To address the low performance of traditional machine learning methods in speech-only classification, this thesis proposes SDD-TDNN, a speech depression detection method based on time delay neural networks, which improves the detection of depressed speech in the unimodal setting. SDD-TDNN is an efficient depression detection method built on time delay neural networks that emphasize channel attention, propagation, and aggregation. The speech signal is cut into segments with a sliding window and converted to Mel-frequency cepstral coefficients (MFCCs), which serve as the model input, and a data augmentation strategy based on spectral masking is adopted to improve classification performance. The method achieves an accuracy of 90.4% and an F1 value of 90.8%, improving on both the baseline system and traditional machine learning methods.

(3) A prototype system for speech depression detection is developed to put the theoretical results into engineering practice. Based on the research in this thesis, an interpretable speech depression recognition prototype is built. The system separates the front end from the back end, and the functions of each part are flexibly modularized to facilitate validation and testing in real scenarios, so that the theoretical research can be better applied to clinical practice.
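The sliding-window segmentation and spectral-masking augmentation described for SDD-TDNN can be illustrated with a minimal sketch. This is not the thesis implementation. The function names `sliding_segments` and `spec_mask`, the segment length, the hop size, and the mask widths are all hypothetical, and the MFCC matrix is random placeholder data rather than real speech features. For simplicity the sketch segments a frame-level feature matrix, whereas the abstract describes cutting the speech into segments before computing MFCCs. Masking one time band and one coefficient band per segment is likewise an assumption in the style of common spectrogram augmentation.

```python
import numpy as np


def sliding_segments(features, seg_len, hop):
    """Cut a (frames, coeffs) matrix into overlapping fixed-length segments."""
    n_frames = features.shape[0]
    if n_frames < seg_len:
        raise ValueError("utterance is shorter than one segment")
    starts = range(0, n_frames - seg_len + 1, hop)
    return np.stack([features[s:s + seg_len] for s in starts])


def spec_mask(segment, max_time, max_coeff, rng):
    """Return a copy with one random time band and one random coefficient band zeroed."""
    out = segment.copy()
    n_frames, n_coeffs = out.shape

    # Time mask: zero a run of consecutive frames (width may be 0).
    t_width = int(rng.integers(0, max_time + 1))
    t_start = int(rng.integers(0, n_frames - t_width + 1))
    out[t_start:t_start + t_width, :] = 0.0

    # Coefficient mask: zero a run of consecutive coefficient channels.
    c_width = int(rng.integers(0, max_coeff + 1))
    c_start = int(rng.integers(0, n_coeffs - c_width + 1))
    out[:, c_start:c_start + c_width] = 0.0
    return out


rng = np.random.default_rng(0)

# Placeholder for an MFCC matrix: 1000 frames x 40 coefficients (assumed sizes).
mfcc = rng.standard_normal((1000, 40))

# Hypothetical settings: 300-frame segments with 50% overlap.
segments = sliding_segments(mfcc, seg_len=300, hop=150)

# One augmented copy per segment; the originals are left untouched.
augmented = np.stack(
    [spec_mask(seg, max_time=30, max_coeff=8, rng=rng) for seg in segments]
)
```

In a training loop, masking would typically be applied on the fly with fresh random bands each epoch, so the model never sees exactly the same masked segment twice. The augmented segments keep the shape of the originals and can be fed to the same network input.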
Keywords/Search Tags: depression, speech analysis, model interpretability, time delay neural networks