| China currently faces a severe diabetes problem. The number of diabetic patients in China ranks first in the world, with rising incidence and a trend towards younger onset. The diagnosis rate, treatment rate and treatment compliance rate of diabetes are low, and mortality from complications, especially cardiovascular disease, is high. More importantly, most patients have relatively mild pre-onset symptoms, more than half have none at all, and many patients discover that they have diabetes only during a health examination undertaken because of chronic complications or concomitant diseases. Since the start of the 21st century, with the continuous development of information and Internet technology, traditional medicine and the Internet have become increasingly intertwined. Experts and scholars are gradually adopting data mining and machine learning algorithms to predict the likelihood of certain diseases from existing medical data. It is therefore of great significance to predict diabetes risk by mining useful information from large volumes of medical data with effective data mining methods. Based on machine learning algorithms and diabetes prediction models, this paper mines multidimensional medical test data and, by comprehensively comparing the prediction results of each model, selects a combined prediction model with better predictive performance and greater practicality. First, this paper selects the open-source diabetes dataset of the Tianchi Medical Health Contest as the data analysis sample; many of the feature variables in this dataset are of great research value. Second, during data preprocessing, feature variables with a high proportion of missing values are deleted, and those with few missing values are filled with the column mean. To improve the model's predictive performance, in selecting feature variables this paper uses the random forest algorithm to output the importance of each feature and
combines this with the correlation between feature variables to select an appropriate feature subset. Finally, because ensemble learning algorithms can both preserve the diversity of weak classifiers and outperform a single learner in prediction, this paper uses the stacking ensemble algorithm to integrate several machine learning algorithms, such as logistic regression and support vector machines, to improve the model's predictive performance. A diabetes prediction model with better predictive performance is sought by trying different combinations of algorithms. The good predictive performance of the combined model helps doctors screen groups at high risk of diabetes, so that the disease can be detected in time and treated early. It also helps reduce the risk that patients miss the best window for early diagnosis and treatment, and lowers both their treatment costs and their economic burden. |
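
The workflow summarized above (dropping heavily missing variables, mean imputation, feature selection by random forest importance combined with correlation, and a stacking ensemble) could be sketched roughly as follows in Python with scikit-learn. This is an illustrative sketch rather than the paper's implementation. The synthetic data stands in for the Tianchi dataset, and the missing-value threshold (`max_missing`), the correlation cut-off (`max_corr`), the feature count (`top_k`), the choice of logistic regression as meta-learner, and all function names are assumptions that the abstract does not specify.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def fit_preprocess(df, max_missing=0.5):
    """Return the columns to keep (missing fraction <= max_missing) and their means."""
    keep = df.columns[df.isna().mean() <= max_missing]
    return keep, df[keep].mean()


def select_features(X, y, top_k=8, max_corr=0.9, seed=0):
    """Walk features in order of random forest importance, skipping any feature
    that is highly correlated with one already selected."""
    forest = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    ranked = pd.Series(forest.feature_importances_, index=X.columns)
    ranked = ranked.sort_values(ascending=False)
    corr = X.corr().abs()
    selected = []
    for name in ranked.index:
        if all(corr.loc[name, s] <= max_corr for s in selected):
            selected.append(name)
        if len(selected) == top_k:
            break
    return selected


def build_stack():
    """Stack logistic regression and an SVM; the meta-learner is an assumption."""
    base = [
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("svm", make_pipeline(StandardScaler(), SVC())),
    ]
    return StackingClassifier(
        estimators=base,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )


# Synthetic stand-in data with injected missing values (not the Tianchi dataset).
X_raw, y = make_classification(
    n_samples=400, n_features=12, n_informative=5, random_state=0
)
df = pd.DataFrame(X_raw, columns=[f"f{i}" for i in range(12)])
rng = np.random.default_rng(0)
df.loc[rng.random(len(df)) < 0.8, "f11"] = np.nan   # heavily missing: dropped
df.loc[rng.random(len(df)) < 0.05, "f0"] = np.nan   # lightly missing: mean-filled

train_df, test_df, y_train, y_test = train_test_split(
    df, y, test_size=0.25, random_state=0, stratify=y
)

# Fit preprocessing and feature selection on the training split only.
keep, means = fit_preprocess(train_df)
X_train = train_df[keep].fillna(means)
X_test = test_df[keep].fillna(means)
features = select_features(X_train, y_train)

model = build_stack().fit(X_train[features], y_train)
accuracy = model.score(X_test[features], y_test)
print("selected features:", features)
print(f"held-out accuracy: {accuracy:.3f}")
```

In this sketch the imputation means and the feature subset are computed from the training split alone, so the held-out score is not influenced by test data. Comparing different algorithm combinations, as the abstract describes, would amount to varying the `base` list in `build_stack` and comparing the resulting scores.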