
Construction And Evaluation Of Antenatal Depression Risk Prediction Model Based On Random Forest Algorithm

Posted on: 2024-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: G Y Deng
GTID: 2544307178951229
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Objectives: To determine the detection rate of antenatal depression in rural minority areas of western China, to construct and evaluate a random forest risk prediction model for antenatal depression, to identify the important predictors of antenatal depression, and to provide a more scientific and convenient tool for identifying pregnant women at high risk of antenatal depression.

Methods: In May 2022, pregnant women in an agricultural county inhabited by ethnic minorities in Yunnan Province were surveyed and screened for antenatal depression. Data were collected mainly through electronic questionnaires, supplemented by paper questionnaires; a basic information questionnaire gathered demographic, psychosocial, health-behavior and maternity data. Women scoring ≥9 on the Edinburgh Postnatal Depression Scale (EPDS) were classified as having antenatal depression. Statistical analysis and model construction were performed in R 4.1.2. Normally distributed continuous variables were described as mean±standard deviation (x̄±s), skewed continuous variables as median (interquartile range) [M(P25, P75)], and categorical variables as frequency (percentage) [n(%)]. The χ² test or Fisher's exact test was used to compare the occurrence of antenatal depression across population subgroups, with P<0.05 considered statistically significant.

After the raw dataset was cleaned (handling outliers, erroneous values, missing values and duplicates) and transformed, the createDataPartition function in the caret package was used to randomly split the preprocessed data into a training set and a validation set at a 7:3 ratio. On the training set, 50 candidate variables covering demographics, psychosocial factors, health behavior and maternity were screened with three methods: two based on random forest importance scores, Sliding Windows Sequential Forward Selection (SWSFS) and the Boruta algorithm, and one traditional method, univariate analysis based on the χ² test. These yielded variable sets 1, 2 and 3, respectively, on which three random forest risk prediction models, RF1, RF2 and RF3, were built with the randomForest function in the randomForest package. The predictive performance of the three models was evaluated on the validation set, using the area under the ROC curve (AUC) and its 95% confidence interval (CI) as the primary metric and six confusion-matrix metrics (accuracy, precision, recall, specificity, F-measure and G-mean) as auxiliary metrics. Finally, the best model and its optimal variable set were selected, and the important predictors of antenatal depression were analyzed.

Results:
1. Basic characteristics: Of the 747 pregnant women surveyed, 732 provided valid responses (valid response rate 97.99%). Their mean age was 26.35±5.48 years (range 14 to 44 years), and gestational age ranged from 4 to 41 weeks: 287 (39.2%) were in the first trimester, 222 (30.3%) in the second and 223 (30.5%) in the third. Most were from ethnic minorities (58.2%) and lived in rural areas (61.2%); the largest groups had junior high school education (39.6%), were unemployed (34.4%) and had a monthly household income of 3000 to 6999 (51.9%).
2. Detection rate of antenatal depression: The overall detection rate in rural minority areas of western China was 13.8%. It was higher among women living in rural areas (16.7%) than among those living in towns (9.2%) (χ²=8.41, P<0.05). The detection rate was 13.7% among Han Chinese women and 13.8% among ethnic minority women, with no significant difference between the two. By ethnic group, the highest detection rate among minority women was 18.4%, followed by the Zhuang (14.6%) and other ethnic minorities (8.9%). In addition, the detection rate differed significantly (P<0.05) by maternal age, spouse's age, maternal self-rated health status, financial worries about having a child, the family's expectations about the child's gender, history of adverse emotions, history of psychiatric disorders, pregnancy anxiety and level of social support.
3. Variable selection on the training set and model construction:
(1) Variable set 1: SWSFS selected 7 variables: pregnancy anxiety, history of adverse emotions, the family's expectation about the child's gender, age, spouse's age, number of pregnancies (gravidity) and level of social support.
(2) Variable set 2: the Boruta algorithm selected 12 variables: the 7 variables in set 1 plus financial worries about having a child, number of births, previous pregnancy history, previous delivery history and number of children born.
(3) Variable set 3: univariate χ²-based analysis selected 19 variables: the 7 variables in set 1 plus 12 others, namely financial worries about having a child, previous pregnancy history, previous delivery history, place of residence, negative life events within the past year, the couple's relationship, maternal self-rated health status, the mother's expectation about the child's gender, the mother's assessment of fetal health, sleep status, time per exercise session and unhealthy lifestyle habits.
4. Model performance on the validation set: RF1 had an AUC of 0.873 (95% CI: 0.806-0.941), accuracy 0.873, precision 0.727, recall 0.258, specificity 0.983, F-measure 0.381 and G-mean 0.504. RF2 had an AUC of 0.868 (95% CI: 0.794-0.933), accuracy 0.873, precision 0.800, recall 0.129, specificity 0.994, F-measure 0.222 and G-mean 0.358. RF3 had an AUC of 0.869 (95% CI: 0.805-0.933), accuracy 0.852, precision 0.571, recall 0.129, specificity 0.982, F-measure 0.211 and G-mean 0.356. RF1 had the highest AUC of the three models, and its accuracy (tied with RF2), recall, F-measure and G-mean were also the highest. Overall, RF1 performed best.

Conclusions: The detection rate of antenatal depression in rural minority areas of western China is high, and routine antenatal depression screening should be strengthened so that women at high risk are identified early. The SWSFS method retained the most parsimonious variable set (7 variables, versus 12 for Boruta and 19 for χ²-based univariate analysis) and produced a better-performing model than either alternative. Pregnancy anxiety, history of adverse emotions, the family's gender expectations for the child, age, spouse's age, number of pregnancies and level of social support are important predictors of antenatal depression.
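The screening rule in the abstract reduces to a threshold on the EPDS total. The sketch below assumes the standard 10-item EPDS with each item already scored 0-3 according to the scale's scoring key (so totals range from 0 to 30); the function names and the example respondent are hypothetical, not taken from the thesis.

```python
from collections.abc import Sequence

EPDS_ITEMS = 10   # the EPDS has 10 items
EPDS_CUTOFF = 9   # cut-off used in this study: total >= 9 => screen positive


def epds_total(item_scores: Sequence[int]) -> int:
    """Sum the 10 EPDS item scores after checking each is already coded 0-3."""
    if len(item_scores) != EPDS_ITEMS:
        raise ValueError(f"expected {EPDS_ITEMS} item scores, got {len(item_scores)}")
    for score in item_scores:
        if score not in (0, 1, 2, 3):
            raise ValueError(f"item score out of range 0-3: {score}")
    return sum(item_scores)


def screen_positive(item_scores: Sequence[int], cutoff: int = EPDS_CUTOFF) -> bool:
    """True when the EPDS total reaches the cut-off (classified as antenatal depression)."""
    return epds_total(item_scores) >= cutoff


# Hypothetical respondent: 1+1+2+1+0+1+1+1+1+0 = 9, exactly at the cut-off.
example = [1, 1, 2, 1, 0, 1, 1, 1, 1, 0]
print(epds_total(example), screen_positive(example))  # 9 True
```

Using `>=` rather than `>` matters here: a woman scoring exactly 9 counts as positive under the study's rule.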
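The rural-versus-town comparison (χ²=8.41) is a Pearson χ² test on a 2×2 table. The abstract reports only percentages, so the cell counts below are back-calculated assumptions: 61.2% of 732 gives about 448 rural and 284 town respondents, and 16.7% and 9.2% of those give about 75 and 26 positive screens (101 in total, matching the overall 13.8%). The helper `pearson_chi2_2x2` is a hypothetical, standard-library-only sketch, not the thesis's R code.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int,
                     yates: bool = False) -> tuple[float, float]:
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]] (1 df).

    Returns (statistic, upper-tail p-value). With yates=True the
    continuity correction |ad - bc| - n/2 is applied (floored at 0).
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Back-calculated (assumed) counts; rows = rural, town; columns = EPDS >= 9, EPDS < 9.
rural_pos, rural_neg = 75, 373
town_pos, town_neg = 26, 258

chi2, p = pearson_chi2_2x2(rural_pos, rural_neg, town_pos, town_neg)
chi2_yates, _ = pearson_chi2_2x2(rural_pos, rural_neg, town_pos, town_neg, yates=True)
print(round(chi2, 2), p < 0.05)  # 8.41 True
print(round(chi2_yates, 2))      # 7.78
```

On these assumed counts the uncorrected statistic reproduces the reported 8.41, whereas Yates' continuity correction (which R's `chisq.test` applies to 2×2 tables by default) gives about 7.78. That suggests, but does not prove, that the reported value is uncorrected.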
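`createDataPartition` in caret samples within each class of a factor outcome, so a 7:3 split keeps roughly the same proportion of EPDS-positive women in the training and validation sets. A standard-library sketch of the same idea follows; the function name, the seed and the rounding rule are assumptions, and caret's own rounding may differ.

```python
import random
from collections import defaultdict
from collections.abc import Hashable, Sequence


def stratified_split(labels: Sequence[Hashable], train_fraction: float = 0.7,
                     seed: int = 2022) -> tuple[list[int], list[int]]:
    """Split row indices so each outcome class keeps ~train_fraction in training.

    Returns (train_indices, validation_indices), both sorted.
    """
    rng = random.Random(seed)  # reproducible, in the spirit of set.seed() in R
    rows_by_class: dict[Hashable, list[int]] = defaultdict(list)
    for row, label in enumerate(labels):
        rows_by_class[label].append(row)

    train: list[int] = []
    for rows in rows_by_class.values():
        n_train = round(len(rows) * train_fraction)  # rounding rule is an assumption
        train.extend(rng.sample(rows, n_train))

    train_set = set(train)
    validation = [row for row in range(len(labels)) if row not in train_set]
    return sorted(train), validation


# Invented outcome vector: 1 = EPDS >= 9, 0 = EPDS < 9.
labels = [1] * 100 + [0] * 600
train_idx, valid_idx = stratified_split(labels, train_fraction=0.7, seed=2022)
print(len(train_idx), len(valid_idx))     # 490 210
print(sum(labels[i] for i in train_idx))  # 70
```

Stratifying matters with an imbalanced outcome like this one: a plain random split could leave the validation set with noticeably more or fewer positive cases, which would make recall and AUC on that set less stable.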
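The abstract names SWSFS but does not spell out its steps. One plausible reading, sketched below as an assumption: rank variables by random forest importance, score the nested subsets made of the top 1, 2, 3, ... variables, and stop once a window of consecutive additions brings no improvement, returning the best subset seen. The scoring callback (for example an out-of-bag or cross-validated AUC from a fitted forest), the window size and all names are hypothetical; a toy score stands in for the forest so the example runs without extra packages.

```python
from collections.abc import Callable, Sequence


def swsfs(ranked: Sequence[str],
          score: Callable[[list[str]], float],
          window: int = 3,
          min_gain: float = 0.0) -> list[str]:
    """Sliding-window sequential forward selection over an importance ranking.

    ranked   -- variable names, most important first (e.g. by RF importance)
    score    -- evaluates a candidate subset; higher is better
    window   -- stop after this many consecutive additions without improvement
    min_gain -- improvement required to count as better (hypothetical knob)
    """
    best_subset: list[str] = []
    best_score = float("-inf")
    since_improvement = 0
    for k in range(1, len(ranked) + 1):
        candidate = list(ranked[:k])  # top-k variables by importance
        current = score(candidate)
        if current > best_score + min_gain:
            best_subset, best_score = candidate, current
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= window:  # window exhausted: stop searching
                break
    return best_subset


def toy_score(subset: list[str]) -> float:
    """Stand-in for a model-based score; peaks when 4 variables are used."""
    return -(len(subset) - 4) ** 2


ranked = [f"v{i}" for i in range(1, 11)]  # hypothetical importance ranking
selected = swsfs(ranked, toy_score, window=3)
print(selected)  # ['v1', 'v2', 'v3', 'v4']
```

The window is the design choice that distinguishes this from plain forward selection: one noisy dip in the score does not end the search, yet the search also does not run through every remaining variable once performance has clearly peaked.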
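Boruta compares each real variable with "shadow" copies whose values have been randomly permuted: in each round a variable scores a hit if its random forest importance exceeds the largest shadow importance, and after many rounds a binomial test against a 50% hit rate decides whether it is confirmed, rejected or left tentative. The sketch below covers only the hit counting and a final decision with a Bonferroni adjustment across variables; the importance values and variable names are invented, fitting the forests is out of scope, and the R Boruta package applies its tests during the iterations rather than once at the end, so its results can differ.

```python
import math
from collections.abc import Mapping


def record_hits(importance: Mapping[str, float],
                shadow_importance: Mapping[str, float],
                hits: dict[str, int]) -> None:
    """One round: a real variable scores a hit when its importance
    exceeds the largest importance among the shadow (permuted) copies."""
    threshold = max(shadow_importance.values())
    for name, value in importance.items():
        hits[name] = hits.get(name, 0) + (1 if value > threshold else 0)


def binom_upper_tail(k: int, n: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n


def boruta_decide(hits: Mapping[str, int], n_runs: int,
                  alpha: float = 0.01) -> dict[str, str]:
    """Classify variables with Bonferroni-adjusted one-sided binomial tests."""
    level = alpha / len(hits)
    decisions = {}
    for name, k in hits.items():
        if binom_upper_tail(k, n_runs) < level:             # too many hits for chance
            decisions[name] = "confirmed"
        elif binom_upper_tail(n_runs - k, n_runs) < level:  # P(X <= k), by symmetry
            decisions[name] = "rejected"
        else:
            decisions[name] = "tentative"
    return decisions


hits: dict[str, int] = {}
n_runs = 20
for run in range(n_runs):
    importance = {"strong": 5.0, "noise": 0.1,
                  "borderline": 2.0 if run % 2 == 0 else 0.5}
    shadow = {"shadow_strong": 0.8, "shadow_noise": 1.0, "shadow_borderline": 0.9}
    record_hits(importance, shadow, hits)

decisions = boruta_decide(hits, n_runs)
print(decisions)  # {'strong': 'confirmed', 'noise': 'rejected', 'borderline': 'tentative'}
```

Because Boruta keeps every variable that beats the shadows, not just the smallest sufficient set, it tends to retain more variables than a forward search, which is consistent with the 12 variables it kept here versus 7 for SWSFS.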
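The AUC used to rank the three models equals the probability that a randomly chosen EPDS-positive woman receives a higher predicted risk than a randomly chosen negative one (ties count one half). The abstract does not say how the 95% CIs were computed; the sketch below assumes DeLong's method, which is, for example, the default of `ci.auc` in R's pROC package. The scores are invented, and clipping the interval to [0, 1] is a choice made here.

```python
import math
from collections.abc import Sequence
from statistics import variance

Z_975 = 1.959963984540054  # standard normal 97.5th percentile


def _placement(x: float, others: Sequence[float]) -> float:
    """Fraction of `others` below x, counting ties as one half."""
    return sum(1.0 if x > o else 0.5 if x == o else 0.0 for o in others) / len(others)


def auc_delong_ci(pos_scores: Sequence[float], neg_scores: Sequence[float],
                  z: float = Z_975) -> tuple[float, float, float]:
    """AUC (Mann-Whitney form) with a DeLong normal-approximation CI."""
    v10 = [_placement(x, neg_scores) for x in pos_scores]        # one per positive case
    v01 = [1.0 - _placement(y, pos_scores) for y in neg_scores]  # one per negative case
    auc = sum(v10) / len(v10)
    # DeLong variance: sample variances (n - 1) of the placement values.
    var_auc = variance(v10) / len(v10) + variance(v01) / len(v01)
    half_width = z * math.sqrt(var_auc)
    return auc, max(0.0, auc - half_width), min(1.0, auc + half_width)


pos = [0.9, 0.8, 0.7, 0.4]  # invented predicted risks, EPDS-positive cases
neg = [0.6, 0.3, 0.2, 0.1]  # invented predicted risks, EPDS-negative cases
auc, lo, hi = auc_delong_ci(pos, neg)
print(round(auc, 4), round(lo, 3), round(hi, 3))  # 0.9375 0.764 1.0
```

The variance term is driven by the smaller class, so with few positive cases in a validation set the interval stays wide even when the AUC itself is high.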
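The six auxiliary metrics all derive from the validation-set confusion matrix. The sketch below computes them from counts (the counts passed to `confusion_metrics` are invented) and recomputes the F-measure (harmonic mean of precision and recall) and G-mean (geometric mean of recall and specificity) from the precision, recall and specificity reported for RF1 to RF3. Function names are illustrative.

```python
import math


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


def g_mean(recall: float, specificity: float) -> float:
    """Geometric mean of recall (sensitivity) and specificity."""
    return math.sqrt(recall * specificity)


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Six metrics from a binary confusion matrix (positive = EPDS >= 9).

    Assumes no zero denominators (at least one predicted and one actual positive).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure(precision, recall),
        "g_mean": g_mean(recall, specificity),
    }


# Invented counts, only to exercise the function.
print(confusion_metrics(tp=40, fp=10, fn=20, tn=130))

# (precision, recall, specificity) as reported for the validation set.
reported = {
    "RF1": (0.727, 0.258, 0.983),
    "RF2": (0.800, 0.129, 0.994),
    "RF3": (0.571, 0.129, 0.982),
}
for model, (precision, recall, specificity) in reported.items():
    print(model, round(f_measure(precision, recall), 3),
          round(g_mean(recall, specificity), 3))
```

The recomputed values match the reported F-measures and G-means to three decimals, except RF3's F-measure (0.210 versus 0.211), a gap consistent with rounding of the inputs. They also show why G-mean is a useful auxiliary metric here: with specificity between 0.982 and 0.994, the low recall of every model (0.129 to 0.258) pulls the G-mean down to 0.356 to 0.504, a weakness that accuracy (0.852 to 0.873) hides.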
Keywords/Search Tags: Machine Learning, Random Forest Algorithm, Detection Rate, Antenatal Depression, Risk Prediction Model