| As society develops, cardiovascular disease poses a growing threat to human life and health. In recent years, combining machine learning with medicine has become a trend, but most scholars have focused on using machine learning to predict tumors, cancers, and similar diseases, and comparatively few studies apply it to cardiovascular disease. In fact, the mortality rate of cardiovascular disease is higher than that of cancer, tumors, and other diseases, so applying machine learning to cardiovascular disease prediction is particularly important. This paper uses Logistic Regression, XGBoost, LightGBM, and a fusion model of the three to analyze cardiovascular disease data. Each classifier is trained with 10-fold cross-validation to guard against overfitting. The analysis uses 70,000 records from the Kaggle online platform, with 11 variables, such as gender, age, blood pressure, and height, selected as research indicators. The task is a binary classification problem with two classes, "has cardiovascular disease" and "does not have cardiovascular disease". The two classes contain roughly equal numbers of samples, so the data are not imbalanced. Descriptive statistics show that age, weight, blood pressure, and smoking status are all important factors affecting cardiovascular disease. Model parameters are tuned by grid search. The classification accuracy is 72.13% for Logistic Regression, 73.67% for XGBoost, 73.72% for LightGBM, and 74.98% for the fusion model, so all four models predict cardiovascular disease reasonably well. Overall, the fusion model has the best predictive ability for cardiovascular disease, and Logistic Regression performs the worst of the four. XGBoost and LightGBM have nearly the same predictive ability, but XGBoost requires more computation time than LightGBM. In general, the fusion model and the LightGBM model should be given priority when predicting cardiovascular disease. |
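The training procedure summarized above combines 10-fold cross-validation, grid search, and a fusion of several classifiers. It can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how the three models are fused, so averaging their predicted positive-class probabilities (soft voting) is assumed here. The helper names `kfold_indices`, `grid_search_cv`, `soft_vote`, and `threshold_rule` are hypothetical. To keep the sketch dependency-free, `threshold_rule` is a toy stand-in for the Logistic Regression, XGBoost, and LightGBM classifiers, and the demo data are made up rather than taken from the Kaggle dataset.

```python
import random
from itertools import product
from statistics import mean


def kfold_indices(n, k=10, seed=0):
    """Shuffle sample indices 0..n-1 and yield (train, test) index lists for k folds."""
    if n < k:
        raise ValueError("need at least k samples")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds of nearly equal size
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def soft_vote(prob_lists, threshold=0.5):
    """Fuse models by averaging each sample's positive-class probability, then thresholding.

    Assumption: the paper's fusion rule is not stated; soft voting is one common choice.
    """
    return [int(mean(ps) >= threshold) for ps in zip(*prob_lists)]


def grid_search_cv(fit_predict, X, y, grid, k=10):
    """Try every parameter combination in `grid`; return (best_params, best mean CV accuracy)."""
    keys = sorted(grid)
    best_params, best_score = None, -1.0
    for values in product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        scores = []
        for train, test in kfold_indices(len(X), k):
            preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                                [X[i] for i in test], **params)
            scores.append(accuracy([y[i] for i in test], preds))
        score = mean(scores)
        if score > best_score:  # on ties, keep the first combination tried
            best_params, best_score = params, score
    return best_params, best_score


def threshold_rule(X_train, y_train, X_test, cutoff):
    """Hypothetical stand-in model: predicts 1 when the first feature is >= cutoff.

    It ignores the training data; a real pipeline would fit LogisticRegression,
    XGBoost, or LightGBM on X_train/y_train here.
    """
    return [int(x[0] >= cutoff) for x in X_test]


if __name__ == "__main__":
    # Toy, made-up data (not the Kaggle dataset): label is 1 when the feature is >= 10.
    X = [[v] for v in range(20)]
    y = [int(v >= 10) for v in range(20)]
    print(grid_search_cv(threshold_rule, X, y, {"cutoff": [5, 10, 15]}))
    # → ({'cutoff': 10}, 1.0)
    print(soft_vote([[0.9, 0.2], [0.6, 0.4], [0.3, 0.1]]))  # → [1, 0]
```

In practice, scikit-learn's `GridSearchCV` with `cv=10` and `VotingClassifier(voting="soft")` cover the same steps, and `xgboost.XGBClassifier` and `lightgbm.LGBMClassifier` follow the scikit-learn estimator interface, so they can be plugged into both.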