According to data released on the official website of the International Diabetes Federation (IDF) in early November 2021, China has about 140 million people with diabetes, the largest number of any country in the world. If not controlled, diabetes can lead to a series of complications, such as hypertension, cerebrovascular disease, cardiovascular disease, lower-extremity vascular lesions, eye disease, kidney disease and neuropathy. Diabetes affects almost every organ in the body, yet 51.7 percent of these 140 million people are undiagnosed and belong to the "hidden" diabetes group. As material living standards improve, people's demand for health keeps rising, and "Healthy China" has been included in the national development strategy under the policy of "prevention first". It is therefore necessary to establish a diabetes risk prediction model. Such a mathematical model can estimate the risk of diabetes in the general population and identify high-risk groups for early warning, so that targeted diagnosis and treatment can begin as early as possible.

Building on a review of previous studies, this paper uses factor analysis to reduce the dimensionality of the data, extract labels and conduct exploratory data analysis. It examines the medical records of 30 patients at a hospital of Traditional Chinese Medicine in a county in northeast China and applies text mining to the chief complaints recorded by doctors in the admission records. Through Chinese word segmentation, word frequency statistics, data cleaning and related steps, feature information is extracted and a portrait of diabetes patients is drawn. The portrait describes the common clinical symptoms of diabetes patients and has practical value for health education and medical consultation. Interviews with medical staff at the hospital also identified six health measures that patients with diabetes should take.

Medical research shows that indicators of blood lipids, blood pressure, renal function and urinary protein are correlated with diabetes, and the relevant literature confirms this. Combining the literature, the views of many scholars and the results of a correlation analysis of physical examination data, this paper establishes an indicator system for a diabetes risk prediction model based on physical examination data. Because machine learning algorithms offer better accuracy and generalization ability on more complex problems, four suitable algorithms were selected on the basis of the literature, experimental data were used to verify and analyze the constructed indicator system, and the predictive performance of the algorithms was optimized. The 6,326 records in the sample set were split in a 7:3 ratio for model training and validation. Machine learning models were built using a BP (back-propagation) artificial neural network, a support vector machine (SVM), a decision tree and a random forest. Accuracy, precision, sensitivity (recall), F1-score and other indicators were compared, the fit of each model was evaluated from multiple perspectives, and the random forest was found to be the most suitable model. Finally, grid search was used to tune the parameters of the random forest model to achieve a better balance among computational performance, memory overhead and accuracy. A risk prediction model for type 2 diabetes mellitus was thus constructed, and its application scenarios are discussed and its prospects outlined.
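The word-frequency step of the chief-complaint text mining can be sketched with Python's standard library. This is a minimal sketch, assuming the complaints have already been segmented into words by a Chinese segmentation tool (not shown) and that data cleaning consists of dropping a hypothetical stop-word list; the function name, stop words and sample complaints are invented for illustration and are not taken from the study's records.

```python
from collections import Counter

# Hypothetical stop words removed during data cleaning (assumption, not from the study).
STOP_WORDS = {"patient", "for", "days", "with"}

def symptom_frequencies(segmented_complaints, top_n=5):
    """Count word frequencies across pre-segmented chief complaints.

    segmented_complaints: list of token lists, one per admission record.
    Returns the top_n (word, count) pairs, most frequent first.
    """
    counts = Counter()
    for tokens in segmented_complaints:
        # Data cleaning: normalize case, strip whitespace, drop stop words and empty tokens.
        cleaned = (t.strip().lower() for t in tokens)
        counts.update(t for t in cleaned if t and t not in STOP_WORDS)
    return counts.most_common(top_n)

# Invented example records, already segmented into words.
complaints = [
    ["thirst", "polyuria", "for", "days"],
    ["fatigue", "thirst", "blurred_vision"],
    ["polyuria", "thirst", "weight_loss"],
]
print(symptom_frequencies(complaints, top_n=3))
```

The most frequent words form the core of the patient "portrait"; in a real pipeline the stop-word list and any synonym merging would come from clinical review rather than a fixed set.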
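The 7:3 split and the comparison metrics can be sketched in plain Python. This is a minimal sketch, assuming binary labels (1 = diabetic, 0 = not diabetic) and a random, unstratified split with a fixed seed; the function names are hypothetical, and the exact split procedure and software used in the paper are not described here. Each of the four models (BP network, SVM, decision tree, random forest) would be scored by passing its test-set predictions to `binary_metrics`.

```python
import random

def train_test_split_73(records, seed=42):
    """Shuffle records and split them 70% / 30% into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    cut = round(len(shuffled) * 0.7)
    return shuffled[:cut], shuffled[cut:]

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity (recall) and F1-score for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# With 6,326 records, a 7:3 split gives 4,428 training and 1,898 test records.
train, test = train_test_split_73(range(6326))
print(len(train), len(test))
print(binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

Sensitivity (recall) matters most for screening, since a missed diabetic is a false negative; precision and F1-score show what that sensitivity costs in false alarms.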
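Grid search evaluates every combination of candidate hyperparameter values and keeps the best-scoring one. The sketch below is a generic standard-library version. In practice `score_fn` would train a random forest with the given parameters and return, for example, its cross-validated accuracy, possibly penalized by model size to reflect the memory trade-off. The parameter names `n_estimators` and `max_depth` follow scikit-learn's `RandomForestClassifier`. They, the grid values and the toy scorer are assumptions for illustration, not the settings used in this paper.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Exhaustively evaluate every parameter combination and return the best.

    param_grid: dict mapping parameter name -> list of candidate values.
    score_fn: callable taking a dict of parameters, returning a score (higher is better).
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    # Cartesian product of all candidate values = every point on the grid.
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:  # strict '>' keeps the first of any tied combinations
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical grid and toy scorer standing in for cross-validated model accuracy.
grid = {"n_estimators": [50, 100, 200], "max_depth": [4, 8, 16]}

def toy_score(params):
    return -abs(params["n_estimators"] - 100) - abs(params["max_depth"] - 8)

print(grid_search(grid, toy_score))
```

The cost grows with the product of the grid sizes (here 3 × 3 = 9 fits), which is why grid search is usually applied only to a few key parameters. scikit-learn's `GridSearchCV` performs the same exhaustive search with cross-validation built in.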