Research On Predictive Model Of Type 2 Diabetes Risk Based On Data Mining

Posted on: 2020-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: W Ji
GTID: 2404330623956160
Subject: Software engineering
Abstract/Summary:
Diabetes is a group of endocrine and metabolic diseases caused by an absolute or relative deficiency of insulin in the human body. Its main feature is elevated blood glucose, and it is one of the most important chronic non-communicable diseases in the world. About 425 million people worldwide have diabetes, and more than 90% of them have type 2 diabetes. China has the largest number of diabetic patients in the world, and the incidence of diabetes and its related complications is growing explosively, which greatly affects residents' quality of life and threatens the health care system of the whole society. At present there is no cure for diabetes. It is therefore urgent to construct a scientific and effective diabetes risk prediction model that assesses diabetes risk in the general population, detects potential high-risk groups, and provides early warning of onset. In recent years, researchers in China and abroad have used increasingly mature data mining technology to extract valuable information from vast medical data, assisting the diagnosis and treatment of related cases and supporting research in the medical field. Building on data mining technology and a summary of previous research results, this thesis analyzes residents' health examination data and constructs a predictive model of type 2 diabetes risk.

First, the data are preprocessed and a sample set for predicting type 2 diabetes risk is constructed from real, original health checkup reports. A total of 4,650 de-identified original health examination reports were collected from 2,325 examinees who attended a hospital health examination center in two consecutive years between 2010 and 2015. The data were evaluated, and flexibly configurable methods were applied to a series of preprocessing tasks, including data integration, data standardization, and the handling of qualitative variables. This yields 2,064 usable samples with 49 dimensions each, which serve as the data basis of the research.

Second, a feature selection method that combines random forest with filter-based feature selection is studied and proposed, and it is used to screen out the optimal feature subset for predicting type 2 diabetes risk. The random forest evaluates the feature importance of the usable samples after preprocessing. Fitting analysis over repeated cross-validation finds that 28 variables have a visible effect on the results. These variables are then divided incrementally into 28 feature subsets, the classifier's area under the receiver operating characteristic curve (AUC) is compared across the subsets, and the optimal subset, containing 9 feature variables, is retained as the input to the data mining-based risk prediction model.

Third, a predictive model of type 2 diabetes risk based on the fusion of logistic regression and extreme gradient boosting (XGBoost) is proposed. Prediction models based on logistic regression and on XGBoost are constructed separately, and a fusion prediction model combining the two is studied and implemented. The parameters are selected and tuned, and an independent test set is used for verification. The results show that all three models perform well and that the fusion model of logistic regression and XGBoost proposed here performs best.

Finally, a prototype system for predicting the risk of type 2 diabetes is designed and implemented. Based on the fusion prediction model of logistic regression and XGBoost and on the actual application scenario, the prototype is realized through requirements analysis and system design, providing a basis for further promotion and application.
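
The feature selection and fusion steps described above can be illustrated with a minimal sketch. The abstract does not say how the 28 subsets are formed or how the two models are fused, so this sketch assumes nested subsets taken in importance order (top 1, top 2, and so on) and a simple weighted average of the two models' predicted probabilities. The function names and the `evaluate_auc` callback are hypothetical, and the random forest ranking and the fitted logistic regression and XGBoost models are assumed to exist elsewhere.

```python
from typing import Callable, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve by pairwise comparison.

    Equals the probability that a randomly chosen positive sample scores
    higher than a randomly chosen negative one; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def select_best_prefix(
    ranked_features: Sequence[str],
    evaluate_auc: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Pick the best nested subset of features ranked by importance.

    ranked_features: names sorted from most to least important
        (assumed to come from a random forest, not computed here).
    evaluate_auc: hypothetical callback that trains a classifier on the
        given subset and returns its validation AUC.
    Ties keep the smaller subset, because only a strictly higher AUC
    replaces the current best.
    """
    if not ranked_features:
        raise ValueError("ranked_features must not be empty")
    best_subset: List[str] = []
    best_auc = float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        auc = evaluate_auc(subset)
        if auc > best_auc:
            best_subset, best_auc = subset, auc
    return best_subset, best_auc


def fuse_probabilities(
    p_lr: Sequence[float],
    p_xgb: Sequence[float],
    weight_lr: float = 0.5,
) -> List[float]:
    """Weighted average of two models' predicted risk probabilities.

    This fusion rule is an assumption made for illustration; the thesis
    may combine logistic regression and XGBoost differently.
    """
    if len(p_lr) != len(p_xgb):
        raise ValueError("both models must score the same samples")
    if not 0.0 <= weight_lr <= 1.0:
        raise ValueError("weight_lr must lie in [0, 1]")
    return [weight_lr * a + (1.0 - weight_lr) * b for a, b in zip(p_lr, p_xgb)]
```

Under these assumptions, `select_best_prefix` would be called with the 28 ranked variables and a callback that cross-validates a classifier on each subset, and `roc_auc` could compare the fused scores with each single model on the independent test set.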
Keywords/Search Tags:Type 2 diabetes, Feature selection, Risk prediction, Data mining