
Non-invasive Risk Score Of Type 2 Diabetes In People Over 40 Years Old In Changchun Based On Data Mining Method

Posted on: 2019-03-25 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Y J Liu | Full Text: PDF
GTID: 1364330572452991 | Subject: Internal Medicine
Abstract/Summary:
Background: Diabetes is a group of metabolic diseases characterized by chronic hyperglycemia caused by an absolute and/or relative deficiency of insulin secretion and/or impaired insulin action. It is associated with genetic, autoimmune and environmental factors and is accompanied by chronic complications. The disease can lead to serious damage to or failure of the kidneys, nerves, heart, blood vessels and other organs. According to the International Diabetes Federation (IDF), the number of people with diabetes worldwide reached 366 million in 2011. At the current incidence rate, the number of diabetic patients worldwide is estimated to reach 552 million by 2030. A variety of clinical studies have shown that the occurrence of diabetes is closely related to lifestyle, and that active lifestyle interventions in groups at high risk of diabetes can reduce its incidence. Using non-invasive risk assessment tools to assess diabetes risk, detect high-risk groups early and intervene actively is therefore of great significance for preventing the occurrence and development of this chronic disease. This approach saves human and financial resources, achieves good compliance, and helps improve public health education and health awareness.

Objectives: To establish a non-invasive type 2 diabetes risk scoring system for the study population and compare it with classic non-invasive type 2 diabetes risk scoring systems from China and abroad in order to evaluate its performance; to apply ensemble (integration) algorithms from data mining to combine the new scoring system with the existing classic scoring systems and determine the performance of the combined model; and, through model integration and validation, to identify the integration methods that give the best-performing model, providing a methodological basis for this kind of model research.

Methods: Stratified by diabetes status, the raw data were randomly divided into 70% (n=3837) training data and 30% (n=1644) test data. The training data were used to determine the cut-off point for each scoring system by maximizing the sum of sensitivity and specificity, and the test data were used to assess classification performance. The first part used three penalized-likelihood variable-selection methods for high-dimensional models, the least absolute shrinkage and selection operator (LASSO), smoothly clipped absolute deviation (SCAD) and the minimax concave penalty (MCP), to automatically select the important non-invasive risk factors for type 2 diabetes. Two logistic regressions were fitted to the training dataset using two sets of selected variables, and the regression coefficients and reference values were used to form simple scoring systems. In the second part, building on the scoring system established for the local population, two families of ensemble algorithms were applied to integrate the new scoring system with the classic scoring systems from China and abroad: voting (majority voting, weighted voting, majority voting with model selection, weighted voting with model selection) and stacking (Stacking: logistic regression, Stacking: LASSO, Stacking: SCAD, Stacking: MCP, Stacking: stepwise regression). Accuracy was assessed by the area under the receiver operating characteristic (ROC) curve (AUC) for each risk score. Sensitivity, specificity, positive predictive value (+PV), negative predictive value (-PV), positive likelihood ratio (+LR), negative likelihood ratio (-LR) and the Youden index (sensitivity + specificity - 1) were also calculated. Goodness of fit was assessed with the Hosmer-Lemeshow test, where a P value above 0.05 indicates no evidence of poor fit, that is, the corresponding model fits adequately.

Results: Of the 5481 participants, 66.9% were women, 22.7% had diabetes, 16% were current smokers, 4% were cancer patients, 13% had a family history of diabetes, and 12% had hypertension. Compared with men, women had higher BMI (body mass index), HDL (high-density lipoprotein), LDL (low-density lipoprotein) and cholesterol levels, but lower mean values for the other variables. Diabetic patients had higher means (or percentages) than non-diabetic patients for most baseline characteristics. In the first part, the three penalized-likelihood selectors (LASSO, SCAD, MCP) selected similar variables. Age, waist circumference, hypertension, family history of diabetes, myocardial infarction, chronic gastroenteritis and high cholesterol were the variables they had in common. Stepwise logistic regression selected more variables than the penalized-likelihood selectors. The first new scoring system uses six risk factors common to the first four models. The more conservative model selection algorithm ISIS selected four variables (age, waist circumference, family history of diabetes, and high cholesterol), and these four risk factors alone were used to construct a second scoring system. The first non-invasive scoring system includes age (3 points), waist circumference (5 points), hypertension (2 points), family history of diabetes (3 points), high cholesterol (2 points) and myocardial infarction (3 points), with scores ranging from 0 to 18 points. The second non-invasive scoring system assigns the same points to its four variables as the first system, with scores ranging from 0 to 13 points. The best cut-off values for the two new scoring systems are 8 and 4, respectively. Their AUCs and Youden indices are slightly worse than those of the Chinese Diabetes Risk Score, but better than those of all other scoring systems. In the second part, comparison of the AUCs and Youden indices shows that voting after model selection performs best and outperforms all the original scoring systems. Weighted voting with model selection has an AUC of 0.850 and a Youden index of 0.450, the best performance observed. These results indicate that voting after model selection is the preferred method for integrating risk scoring systems. In the stacking approach, model selection has very little impact on the performance of the meta-learner.

Conclusion: The new scoring system derived from people over 40 in Changchun performs better than, or similarly to, other scoring systems in assessing the risk of type 2 diabetes in this population. Applying ensemble methods from data mining yields a new model that performs better than all the original scoring systems. Among the methods for integrating risk scoring systems, voting after model selection performs best.
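
As an illustration of the cut-off rule described in the Methods (choose the threshold that maximizes sensitivity + specificity, which is equivalent to maximizing the Youden index), the following minimal Python sketch may help. It is not the dissertation's code: the function names, the toy scores and labels, the convention that a score at or above the cut-off means "high risk", and the tie handling in the vote are all assumptions made here for illustration.

```python
from typing import List, Sequence, Tuple


def sens_spec(scores: Sequence[float], labels: Sequence[int],
              cutoff: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when score >= cutoff means high risk.

    Assumes labels are 0/1 and that both classes are present.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores: Sequence[float],
                labels: Sequence[int]) -> Tuple[float, float]:
    """Scan every observed score as a candidate cut-off and return the
    (cutoff, youden_index) pair with the largest Youden index.

    Ties are resolved in favour of the lowest cut-off (an assumption).
    """
    best_c, best_j = None, float("-inf")
    for c in sorted(set(scores)):
        se, sp = sens_spec(scores, labels, c)
        j = se + sp - 1  # Youden index
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j


def majority_vote(votes: List[int]) -> int:
    """Combine 0/1 classifications from several scoring systems.

    A strict majority of 1s gives 1; ties count as 0 (an assumption).
    """
    return 1 if 2 * sum(votes) > len(votes) else 0


# Hypothetical toy training data: risk scores and diabetes status (1 = diabetes).
toy_scores = [2, 3, 5, 6, 8, 9, 10, 12]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]

cutoff, youden = best_cutoff(toy_scores, toy_labels)
print(cutoff, youden)
print(majority_vote([1, 1, 0]))
```

In the study design, the cut-off would be chosen on the training split only and then applied unchanged to the test split. The voting function shows only plain majority voting; the weighted and model-selection variants named in the abstract are not specified there in enough detail to reproduce.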
Keywords/Search Tags: Type 2 diabetes, risk score system, data integration, Youden index, ROC curves