Research On Diabetes Prediction Model Based On Ensemble Learning

Posted on: 2019-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
GTID: 2404330575950428
Subject: Applied statistics
Abstract/Summary:
As living standards continue to improve, people are paying more attention to health. Diabetes has become the third major chronic non-communicable disease threatening human health, after cancer and cardiovascular disease. The prevention and treatment of diabetes in China is characterized by "three highs" and "three lows". The "three highs" are a high incidence, a high rate of complications, and a high cost of treatment. The "three lows" are a low awareness rate, a low treatment rate, and a low compliance rate. An effective diabetes prediction model is therefore of great significance for controlling disease risk and protecting people's health.

The purpose of this paper is to establish a diabetes prediction model based on microflora data using ensemble learning methods. On the one hand, this makes non-invasive prediction of diabetes possible and reduces the time and cost of medical care. On the other hand, it offers a new approach to screening for diabetes across a broad population and helps avoid the serious consequences of late detection.

After summarizing the relevant domestic and international research, the paper first introduces the idea of ensemble learning and three commonly used ensemble methods: Bagging, Boosting and Stacking. It then uses text mining and regular expressions to extract basic sample information from electronic medical records, and labels each sample as diabetic or non-diabetic according to the current diagnosis. At the same time, filtering and integration methods are used to reduce the dimensionality of the intestinal flora data. Next, RF, GBDT and XGBoost are used to build predictive models that distinguish diabetic from non-diabetic patients. Finally, the Stacking method is used to integrate the three models to improve predictive performance, and Recall, FPR and AUC are used to compare the four models.

The results show that intestinal microflora differ between diabetic and non-diabetic patients; that the RF, GBDT and XGBoost models differ in predictive performance, with RF and XGBoost performing better and reaching a Recall of 85%; and that the Stacking method improves Recall and reduces FPR. The results can therefore help distinguish diabetic from non-diabetic patients, so that people with undiagnosed diabetes can be treated as early as possible, reducing the risk of late detection.
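The three comparison metrics named in the abstract (Recall, FPR and AUC) can be illustrated with a minimal sketch. This is not the thesis's code: the function names, the 0.5 decision threshold and the toy labels and scores are all assumptions made for illustration, with label 1 standing for "diabetic" and 0 for "non-diabetic".

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fn, fp, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn


def recall(y_true, y_pred):
    """Share of true positives that the model identifies: TP / (TP + FN)."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def false_positive_rate(y_true, y_pred):
    """Share of true negatives that the model wrongly flags: FP / (FP + TN)."""
    _, _, fp, tn = confusion_counts(y_true, y_pred)
    return fp / (fp + tn) if (fp + tn) else 0.0


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), which is
    fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the thesis).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print("Recall:", recall(y_true, y_pred))
print("FPR:   ", false_positive_rate(y_true, y_pred))
print("AUC:   ", auc(y_true, scores))
```

Recall and FPR depend on the chosen threshold, while AUC is computed from the raw scores, so the three values together describe both one operating point and the overall ranking quality of a model.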
Keywords/Search Tags:Ensemble Learning, Diabetes, Microbiota, Feature Selection, Stacking Method