With changes in society, the economy, living standards, daily life, and working patterns, chronic diseases are causing great harm to human health. Among them, the prevalence of type II diabetes has increased rapidly, and its mortality and complications are rising year by year. The number of patients with type II diabetes worldwide is huge, and China has the largest number of any country, yet effective treatment is still lacking. Therefore, establishing an effective classification and prediction model for type II diabetes has great practical significance for identifying high-risk groups, controlling complications, protecting population health, and guiding clinical work. A review of research at home and abroad shows that scholars generally use conventional statistical analysis and traditional machine learning algorithms to study diabetes prediction models. Here, we use a stacking algorithm to build the diabetes prediction model. The data came from the physical examination center of a community hospital in Hefei, Anhui Province, in 2019. Through analysis of diabetes risk factors, 31 indicators were chosen as the primary variables, including age, gender, body mass index, and family history of diabetes. After the data were evaluated, a series of preprocessing steps, such as data integration, variable transformation, and missing value filling, was carried out through flexible configuration, and filter, correlation, and embedded methods were then used for feature selection. This yielded 13,068 usable samples with 23 dimensions as the data basis of the paper. The data were then divided into test and training sets at a ratio of 3:7, and Random Forest, LightGBM, and XGBoost prediction models were constructed. To address the class imbalance in the data, the default loss function of XGBoost was changed to focal loss to increase the weight of minority-class samples, and a Stacking model was fused on the basis of focal loss-XGBoost. The predictive performance of the five models was evaluated by AUC, Recall, and F1, and the features were ranked to identify the important factors affecting type II diabetes. The results show that among Random Forest, LightGBM, and XGBoost, XGBoost performed best, with an AUC of 0.86212 and an F1 of 0.45186, but a Recall of only 0.31072. When the loss function of XGBoost was changed to focal loss, the AUC was 0.86347, the F1 was 0.51570, and the Recall was 0.53613. Finally, the Stacking fusion model based on focal loss-XGBoost improved the results further, with an AUC of 0.86866, an F1 of 0.52407, and a Recall of 0.54545. Our study can identify people with diabetes mellitus, which allows early treatment of patients, helps avoid serious health hazards, and assists medical workers in making reasonable decisions during the diagnosis of diabetes.
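
To illustrate how focal loss can shift weight toward minority-class samples, the following is a minimal sketch of a binary focal loss and its gradient with respect to the raw score. It is not the implementation used in this study. The function names and the values `gamma=2.0` and `alpha=0.75` are illustrative assumptions, as is treating the diabetic class (label 1) as the minority; the abstract does not report the hyperparameters actually used.

```python
import math


def sigmoid(z: float) -> float:
    """Map a raw score (logit) to a probability, avoiding overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def focal_loss(z: float, y: int, gamma: float = 2.0, alpha: float = 0.75) -> float:
    """Binary focal loss for one sample with raw score z and label y in {0, 1}.

    gamma down-weights samples the model already classifies well;
    alpha (assumed value) weights the positive class, 1 - alpha the negative class.
    No clipping is applied, so extreme scores are outside the scope of this sketch.
    """
    p = sigmoid(z)
    if y == 1:
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)


def focal_loss_grad(z: float, y: int, gamma: float = 2.0, alpha: float = 0.75) -> float:
    """Derivative of focal_loss with respect to the raw score z."""
    p = sigmoid(z)
    if y == 1:
        return alpha * (1.0 - p) ** gamma * (gamma * p * math.log(p) + p - 1.0)
    return (1.0 - alpha) * p ** gamma * (p - gamma * (1.0 - p) * math.log(1.0 - p))


if __name__ == "__main__":
    # A confidently correct positive contributes far less loss than a missed one.
    print("easy positive:", focal_loss(3.0, 1))
    print("hard positive:", focal_loss(-3.0, 1))
    # With alpha = 0.75, an equally wrong positive costs more than a negative.
    print("wrong positive:", focal_loss(-1.0, 1))
    print("wrong negative:", focal_loss(1.0, 0))
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the usual log loss, so the two parameters can be read as adjustments on top of the default objective. A custom objective for a gradient-boosting library would also need the second derivative, which this sketch omits.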