| In recent years, as machine learning has developed and spread, its advantages in big data processing have been widely exploited, and many medical experts have tried machine learning modeling methods to provide intelligent assistance for diagnosis and treatment. Stroke is a common cerebrovascular disease in clinical practice, and its incidence rate in China ranks first in the world. Because of its high disability and mortality rates and the serious harm it does to the health of the Chinese population, research progress on stroke has received extensive attention. A review of the literature shows that many studies have applied machine learning methods to the diagnosis, treatment, and prognosis stages of stroke. On the one hand, big data analysis allows the factors that induce the disease to be identified comprehensively and objectively, which improves the accuracy of clinical diagnosis; on the other hand, it can provide scientific guidance and suggestions for later rehabilitation. This paper first summarizes the current state of research on stroke and on methods for handling imbalanced data, and outlines the principles of several oversampling methods and classification models. A public stroke dataset is then preprocessed, including filling in missing values with decision tree prediction and standardizing the features. Descriptive statistics show that the probability of common health problems such as hypertension, hyperglycemia, and overweight increases with age. Because these conditions are important factors inducing stroke, prevention should be strengthened by controlling blood pressure and blood sugar and by developing healthy living habits. Next, the imbalanced data were oversampled, and a comparison of model evaluation results showed that the dataset processed with Borderline-SMOTE achieved the highest model score. When the balanced dataset was fed into the classification models, XGBoost and random forest ranked first and second in classification performance. Both models were then optimized, after which their accuracy improved significantly. Applying these machine learning algorithms to stroke prediction models improved prediction accuracy step by step, which has some application value for stroke prediction and screening in clinical practice. Finally, the paper summarizes the research as a whole, discusses practical problems in applying machine learning to the medical field and offers related suggestions, and explains the limitations of the study and directions for future improvement. |