Depression is a prevalent mental illness that not only inflicts physical and psychological harm on patients but also imposes a significant burden on their families. Without timely intervention, it can lead to more severe outcomes such as self-harm or suicide. However, diagnosing depression requires medical professionals and consumes considerable medical resources, so many individuals do not receive timely treatment. Introducing artificial intelligence (AI) to screen for behavioral markers of depression could alleviate this shortage of medical resources. Compared with video, voice and text recordings are more flexible and secure, which makes reliable data easier to collect.

This paper therefore studies the intelligent recognition of depression from voice and text. Current research in this area has several limitations: a lack of Chinese datasets and related studies, insufficient fusion of multimodal data, and inadequate practical evaluation of experimental paradigms. To address these problems, the paper pursues the following research objectives:

(1) The paper uses the MODMA dataset, the only public dataset built on Chinese-language interviews. The speech data are transcribed into text, and text features are extracted with the pre-trained Chinese BERT-wwm model, which applies whole word masking (wwm) during pre-training and is therefore better aligned with Chinese expression habits.

(2) The paper explores attention-based fusion methods that separately fuse the global and local features of speech and text. Text features fused with global speech information yield the best recognition performance: an overall accuracy of 0.80 and, for the depression group, a precision of 0.71, a recall of 0.72, and an F1 score of 0.71. Because this fusion method takes into account the characteristics and practical significance of each modality, it makes more effective use of the multiple modalities.

(3) The paper analyzes the feature similarity of the responses to each question in the MODMA dataset. Both the healthy and depressed groups show high similarity on questions about physical and mental health and on picture descriptions, while the healthy group also shows high similarity on questions about handling interpersonal conflicts and negative emotions. The depressed group lacks this pattern, suggesting that interviews should focus on questions about interpersonal conflict and negative emotions and streamline questions about physical and mental health and picture descriptions. Additionally, the questions in the dataset are tested under two classification schemes, by interview style and by emotional valence, and the results suggest that positive and neutral questions should be emphasized.
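To illustrate what whole word masking means for Chinese, the sketch below masks every character of a selected word together, instead of masking single characters independently. It is a simplified illustration, not the BERT-wwm pre-training code: it assumes the sentence is already segmented into words, uses a hypothetical helper `whole_word_mask`, and omits the random-replacement details of real BERT masking.

```python
import random

MASK = "[MASK]"

def whole_word_mask(words, mask_ratio=0.15, seed=0):
    """Mask whole words: every character of a selected word becomes [MASK].

    `words` is a pre-segmented sentence (a list of Chinese words); the result
    is the character-level token list that a character-based BERT would see.
    """
    rng = random.Random(seed)
    # Mask at least one word, roughly mask_ratio of the words overall.
    n_to_mask = max(1, round(len(words) * mask_ratio))
    chosen = set(rng.sample(range(len(words)), n_to_mask))
    tokens = []
    for i, word in enumerate(words):
        for ch in word:
            # Characters of a chosen word are masked together, never individually.
            tokens.append(MASK if i in chosen else ch)
    return tokens

print(whole_word_mask(["我", "最近", "心情", "不好"], mask_ratio=0.25, seed=1))
```

With character-level masking, a model could see "心[MASK]" and recover the word from its other half; masking the full word forces it to use sentence context instead.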
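The abstract does not specify the exact attention design, so the following is only one plausible way to inject global speech information into local text features: an utterance-level speech vector acts as the query in scaled dot-product attention over the text token vectors, and the attended text vector is concatenated with the speech vector. The function name `fuse_global_speech_into_text` and the assumption that both modalities are already projected to the same dimension d are hypothetical.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def fuse_global_speech_into_text(text_tokens, speech_global):
    """Attention-pool local text features using a global speech vector as the query.

    text_tokens: list of n token vectors (each of length d), e.g. BERT-wwm outputs.
    speech_global: one length-d utterance-level speech vector (assumed already
    projected to the same dimension d as the text features).
    Returns ([attended text ; speech_global], attention weights).
    """
    d = len(speech_global)
    scale = 1.0 / math.sqrt(d)
    # Scaled dot-product score between the speech query and each text token.
    scores = [scale * sum(t_k * g_k for t_k, g_k in zip(tok, speech_global))
              for tok in text_tokens]
    weights = softmax(scores)
    # Weighted sum of token vectors gives one speech-conditioned text vector.
    attended = [sum(w * tok[k] for w, tok in zip(weights, text_tokens))
                for k in range(d)]
    return attended + list(speech_global), weights

fused, w = fuse_global_speech_into_text([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
print(fused, w)
```

In a trained model the scores would use learned query/key projections; they are left out here to keep the mechanism visible.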
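Reading the depression-group figures of 0.71 and 0.72 as precision and recall, the reported F1 of 0.71 is consistent with them, since 2 · 0.71 · 0.72 / (0.71 + 0.72) ≈ 0.715. The helper below (hypothetical names) computes these metrics from confusion-matrix counts and reproduces that check; the counts used in any example are illustrative, not the paper's data.

```python
def binary_metrics(tp, fp, fn, tn):
    """Overall accuracy plus precision/recall/F1 for the positive (depressed) class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f1_from(0.71, 0.72), 3))  # 0.715
```

Note that overall accuracy counts both classes, which is why it can exceed the depression-group precision and recall.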
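The per-question similarity analysis can be sketched as the mean pairwise cosine similarity between subjects' answer features within each group. The abstract does not state which similarity measure or features were used, so cosine similarity and the nested-dictionary input format (`{question: {group: [feature vectors]}}`) are assumptions, as are the helper names.

```python
import math
from itertools import combinations

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_pairwise_similarity(features):
    """Average cosine similarity over all pairs of subjects' answer features."""
    pairs = list(combinations(features, 2))
    if not pairs:
        return float("nan")  # fewer than two subjects: undefined
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

def question_similarity(responses):
    """Map {question: {group: [feature vectors]}} to {question: {group: mean similarity}}."""
    return {q: {g: mean_pairwise_similarity(feats) for g, feats in groups.items()}
            for q, groups in responses.items()}

print(question_similarity({"q1": {"healthy": [[1.0, 0.0], [2.0, 0.0]],
                                  "depressed": [[1.0, 0.0], [0.0, 1.0]]}}))
```

Under this reading, a question where both groups score high adds little discriminative signal and is a candidate for streamlining, while a question where only the healthy group scores high is one the interview should keep.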