
Effect Of Random Forest-Lasso Logistic Regression Model On Screening Health Risk Factors Of Fatty Liver

Posted on: 2021-05-21  Degree: Master  Type: Thesis
Country: China  Candidate: Y Huang  Full Text: PDF
GTID: 2480306473977809  Subject: Statistics
Abstract/Summary:
In recent years, the screening of health risk factors has become a hot topic in biomedical and biostatistical research. The risk factors for chronic diseases such as fatty liver are numerous and complex, so traditional factor screening methods must handle a large number of variables, which leads to heavy computation. The Lasso method shrinks some regression coefficients by adding a penalty function, which yields a more parsimonious model and achieves variable selection and parameter estimation at the same time. Random forest is an ensemble supervised learning method that classifies samples by constructing decision trees; one of its characteristics is that it can measure the importance of variables. In this thesis, random forest was combined with the Lasso Logistic regression model to form the Random Forest-Lasso Logistic regression model. To evaluate how well this model works in application, it was used to screen and analyze the risk factors of fatty liver. This study mainly did the following work:

1. Reviewed the research status and progress of the Lasso method in China and abroad; introduced the basic theory and algorithms of the Lasso method and the Lasso Logistic regression model; introduced the method of combining random forest with the Lasso Logistic regression model; and reviewed the methods for selecting the tuning parameter λ of the Lasso method and the Lasso Logistic regression model.

2. In December 2019, data on 3,724 physical examinees were collected from the health management center of the General Hospital of the Western Theater Command. After data processing, 3,500 valid samples were obtained, an effective rate of 93.98%. The 3,500 health examinees were aged 20 to 88 years, with an average age of 48 ± 10.83 years, and included 1,982 males (56.63%) and 1,518 females (43.37%). Among them, 1,049 (29.97%, 95% CI (28.45%, 31.49%)) were diagnosed with fatty liver.

3. A simulation study compared how well the Random Forest-Lasso Logistic regression model, the Lasso Logistic regression model, the optimal subset regression model and the stepwise Logistic regression model screen risk factors for fatty liver. Simulated data were generated by sampling from the actual research data. Under different sample sizes and positive rates, the average number of influencing factors correctly selected and the average number correctly eliminated were examined for the four regression models, with 100 simulations per scenario. The results showed that the Random Forest-Lasso Logistic regression model and the Lasso Logistic regression model correctly selected more influencing factors on average than the optimal subset regression model and the stepwise Logistic regression model. When the positive rate was 50%, the four regression models performed best in screening the influencing factors. When the sample size was more than 10 times the number of independent variables, changes in sample size had little effect on the four regression models.

4. A case study applied the Random Forest-Lasso Logistic regression model to the screening of health risk factors for fatty liver in order to assess its effect. The 3,500 valid samples were divided into training sets and test sets by cross validation. The Random Forest-Lasso Logistic regression model was fitted on the training sets, followed by the Lasso Logistic regression model, the optimal subset regression model and the stepwise Logistic regression model, and the advantages of the Random Forest-Lasso Logistic regression model were discussed by comparing the four. Goodness of fit and prediction performance were evaluated for each model. For the Random Forest-Lasso Logistic regression model, the mean coefficient of determination R² and mean adjusted R² were 0.627 and 0.621, respectively; the corresponding means for the Lasso Logistic regression model, the optimal subset regression model and the stepwise Logistic regression model were all lower. The mean values of TPR, F-measure and AUC (area under the receiver operating characteristic curve) for the Random Forest-Lasso Logistic regression model were 0.675, 0.702 and 0.785, respectively; the corresponding means for the other three models were all lower. The fit and prediction performance of the Random Forest-Lasso Logistic regression model were therefore better than those of the Lasso Logistic regression model, the optimal subset regression model and the stepwise Logistic regression model.

5. The simulation experiment and the case analysis of fatty liver risk factor screening together indicate that the Random Forest-Lasso Logistic regression model is a better multi-factor analysis method for screening health risk factors of chronic diseases: it explains and analyzes the dependent variable under study more effectively and has greater practical value.
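For reference, the penalized objective usually meant by "Lasso Logistic regression" can be written as below. This is the standard textbook form, not a formula quoted from the thesis; the symbols (n samples, p predictors, binary outcome y_i, predictor vector x_i, intercept β_0, coefficients β, tuning parameter λ) are notation assumed here for illustration.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0,\, \beta}
    \left\{
      -\frac{1}{n} \sum_{i=1}^{n}
        \left[ y_i \left( \beta_0 + x_i^{\top} \beta \right)
               - \log\!\left( 1 + e^{\beta_0 + x_i^{\top} \beta} \right) \right]
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
    \right\}
```

The first term is the average negative log-likelihood of the logistic model and the second is the L1 penalty. Larger values of λ shrink more coefficients exactly to zero, which is what lets the method select variables and estimate parameters in one step.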
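The prevalence interval and the evaluation metrics reported above can be reproduced with a few lines of arithmetic. The sketch below assumes the 95% CI is a normal-approximation (Wald) interval, which the abstract does not state, and the function names are hypothetical. The confusion-matrix counts in the second call are made up for illustration and are not from the thesis.

```python
import math


def wald_ci(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


def tpr_and_f_measure(tp, fp, fn):
    """True positive rate (recall) and F-measure from confusion-matrix counts."""
    tpr = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tpr / (precision + tpr)
    return tpr, f_measure


# Counts reported in the abstract: 1,049 fatty liver cases among 3,500 samples.
p, lower, upper = wald_ci(1049, 3500)
print(f"prevalence = {p:.2%}, 95% CI = ({lower:.2%}, {upper:.2%})")

# Hypothetical confusion-matrix counts, for illustration only (not thesis data).
tpr, f1 = tpr_and_f_measure(tp=60, fp=20, fn=40)
print(f"TPR = {tpr:.3f}, F-measure = {f1:.3f}")
```

Under the Wald assumption, the first call gives bounds that round to the reported (28.45%, 31.49%), which suggests this is the interval the abstract used.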
Keywords/Search Tags:Random Forest-Lasso Logistic regression model, Lasso Logistic regression model, Optimal subset regression model, Stepwise Logistic regression model, Screening of influencing factors, Fatty liver disease