Cardiovascular Disease Prediction Study in Middle-Aged People

Posted on: 2022-10-06
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Chen
GTID: 2514306335468304
Subject: Mathematical Statistics
Abstract/Summary:
Cardiovascular disease (CVD), the disease with the highest mortality rate over the past 30 years, has been drawing increasing attention. Most current research focuses on treatment, and although the treatment of cardiovascular disease keeps improving, prevention also needs to be strengthened. Taking advantage of the rapid development of big data analysis, this thesis studies and analyzes cardiovascular disease data.

The data come from the Hejing community and include age, gender, systolic blood pressure, diastolic blood pressure and other characteristics. The collected variables were analyzed and consolidated, the data were cleaned, and outliers were removed, leaving 52,496 patients as the study population. Of these 52,496 subjects, 28,582 (54.4%) were diagnosed with cardiovascular disease. An exploratory analysis was first carried out to examine the distribution of each variable and the relationships between variables. The subjects were then randomly divided into a training set and a test set in a 7:3 ratio. Logistic regression, decision tree, support vector machine (SVM) and BP (back-propagation) neural network models were fitted on the training set and validated on the test set to predict whether a subject has cardiovascular disease.

Before fitting the logistic regression model, multi-category variables were converted into binary (dummy) variables, and correlation coefficients, the rank of the data matrix and variance inflation factors (VIF) were used to check for multicollinearity among the variables. The decision tree, SVM and BP neural network models were then built, and a confusion matrix and ROC curve were obtained for each model. Five indices, namely accuracy, precision, recall, F1 score and AUC, were computed from these and used to compare the four models and select the best one.

AUC was chosen as the overall evaluation criterion. For the first blood pressure category (systolic blood pressure ≥ 180 mmHg), the decision tree achieved the highest AUC, 0.869, indicating that it is the most accurate of the four models for predicting the probability of cardiovascular disease in this group. Over the whole population, the BP neural network achieved the highest AUC, 0.793, giving the best classification performance of the four models, so the BP neural network is recommended for cardiovascular disease prediction. The results can serve as a reference for people in need, lay a foundation for cardiovascular disease prediction, and could be extended to other diseases to support the development of disease prediction systems.
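The preprocessing steps described in the abstract (converting multi-category variables into binary indicator variables and randomly splitting the subjects 7:3) could look like the following minimal pure-Python sketch. It assumes each record is a dict; the field name `bp_level`, its category labels, and the helper names are hypothetical, and using the first category as the reference level is an assumption, since the abstract does not say which level was used.

```python
"""Data preparation sketch: dummy coding and a random 7:3 split.

Hypothetical field names and categories; not the thesis's actual code.
"""
import random


def dummy_code(records, field, categories):
    """Replace a multi-category field with 0/1 indicator columns.

    The first category is treated as the reference level (an assumption)
    and gets no column; keeping all indicators would make them sum to 1,
    which is exactly collinear with a model intercept.
    """
    out = []
    for rec in records:
        new = {k: v for k, v in rec.items() if k != field}
        for cat in categories[1:]:
            new[f"{field}_{cat}"] = 1 if rec[field] == cat else 0
        out.append(new)
    return out


def random_split(records, train_frac=0.7, seed=0):
    """Shuffle a copy of the records and split them train_frac : rest."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Ten toy records with a hypothetical three-level blood-pressure field.
    levels = ["normal", "high", "very_high"]
    recs = [{"id": i, "bp_level": levels[i % 3]} for i in range(10)]
    coded = dummy_code(recs, "bp_level", levels)
    train, test = random_split(coded, train_frac=0.7, seed=42)
    print(len(train), len(test))  # 7 3
```

Fixing the seed makes the split reproducible, which matters when several models are compared on the same test set.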
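For the multicollinearity check, the variance inflation factor of variable j is VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing variable j on the remaining predictors. Equivalently, the VIFs are the diagonal entries of the inverse of the predictors' correlation matrix. The minimal pure-Python sketch below uses that equivalence with toy data and hypothetical names. If the correlation matrix is singular (its rank is below the number of variables), the inverse does not exist, which signals exact collinearity, for example when every dummy column of a categorical variable is kept. A cut-off such as VIF > 10 is a common rule of thumb, not one stated in the thesis.

```python
"""Multicollinearity sketch: Pearson correlations and VIF (illustrative)."""
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # assumes neither column is constant


def invert(matrix, tol=1e-12):
    """Gauss-Jordan inverse with partial pivoting.

    A (near-)zero pivot means the matrix is rank-deficient, i.e. some
    variable is an exact linear combination of the others.
    """
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < tol:
            raise ValueError("singular correlation matrix: exact collinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF per variable = diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


if __name__ == "__main__":
    # Toy values for illustration only, not data from the thesis.
    toy = {"x1": [1, 2, 3, 4, 5], "x2": [2, 1, 4, 3, 5]}
    print(vif(toy))  # r = 0.8, so both VIFs are 1 / (1 - 0.64), about 2.78
```

With only two predictors the result reduces to 1 / (1 - r^2), which makes the sketch easy to check by hand.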
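The five evaluation indices can be computed from a test-set confusion matrix and the models' predicted scores. The sketch below is a minimal pure-Python illustration, assuming label 1 means "has cardiovascular disease" and that each model outputs a score such as a predicted probability; the function names are hypothetical. AUC is computed with the rank-sum (Mann-Whitney) formulation, which gives the same area as integrating the empirical ROC curve, with tied scores counted as one half.

```python
"""Evaluation sketch: confusion-matrix metrics and AUC (illustrative)."""


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for 0/1 labels, 1 = has the disease."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc(y_true, scores):
    """ROC AUC via the rank-sum formula; assumes both classes are present."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Group tied scores and give each member the group's average rank.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    rank_sum_pos = sum(r for r, t in zip(ranks, y_true) if t == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy labels, predictions and scores for illustration only.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 1, 1, 0]
    print(classification_metrics(y_true, y_pred))
    # accuracy 0.625, precision 0.6, recall 0.75, F1 about 0.667
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The sort-based AUC runs in O(n log n), so it stays practical for a test set of roughly 15,000 subjects, where a pairwise comparison of every positive with every negative would not.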
Keywords: Cardiovascular disease, logistic regression model, decision tree, support vector machine, BP neural network model