
Research On Double-high Disease Prediction Based On Data Mining

Posted on: 2021-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: S Xie
GTID: 2404330602995167
Subject: Computer system architecture
Abstract/Summary:
Chronic diseases have long been the most common category of disease among Chinese citizens. Hypertension, hyperlipidemia, and their cardiovascular and cerebrovascular complications account for a large share of them, so using data mining to predict the risk of double-high disease (hypertension and hyperlipidemia) has practical value. In this study, we analyzed and processed personal medical examination data provided by the Medical Examination Center and built a mathematical model that predicts the specific values of five indicators of double-high disease. The modeling process also examines whether the predicted indicators are related to the physical examination items, with the aim of helping doctors prevent and detect double-high disease early and reduce the harm it causes.

The original data were divided into numerical data and text data, and features were processed separately for each. To address the high dimensionality of the numerical data, we reduced its dimensionality with the DW-RFE feature selection algorithm, which was obtained by optimizing the SVM-RFE algorithm. Two sets of comparative experiments, one varying the sample size and one varying the feature dimension, were designed to verify the effectiveness of DW-RFE. For the text data, we used the WV-CNN feature extraction method, an optimization based on the Word2Vec algorithm: the text features are first vectorized, and the vectorized features are then fed into a convolutional neural network for feature extraction. A comparative experiment over sample size was designed to verify the effectiveness of WV-CNN.

Finally, the numerical data reduced by DW-RFE and the text data processed by WV-CNN were used as input to the double-high disease prediction model. Ridge regression, support vector machine, and ensemble learning algorithms were used to build different prediction models. Among them, the model built with the XGBoost algorithm, an ensemble learning method, had the smallest mean square error (MSE) and the best predictive performance. Based on this best XGBoost model, the features affecting double-high disease were ranked.
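
The abstract does not describe how DW-RFE modifies SVM-RFE, so the following is only a minimal sketch of plain recursive feature elimination, the idea both algorithms build on: fit a linear model, drop the feature with the smallest absolute weight, and repeat. Everything in it is an assumption made for illustration. Closed-form ridge regression stands in for the SVM so that only NumPy is needed, the data are synthetic rather than medical examination records, there is a single target instead of five indicators, and the names `ridge_fit`, `rfe`, and `mse` are hypothetical helpers. The held-out MSE at the end is the same metric the abstract uses to compare prediction models.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge weights, no intercept (inputs are assumed centered)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def rfe(X, y, n_keep, alpha=1.0):
    """Recursive feature elimination with ridge weights as the ranking criterion.

    Assumes features are on comparable scales, so |weight| is a fair ranking.
    Returns the column indices of the surviving features, in ascending order.
    """
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        w = ridge_fit(X[:, remaining], y, alpha)
        worst = int(np.argmin(np.abs(w)))  # position of the least useful feature
        del remaining[worst]
    return remaining


def mse(y_true, y_pred):
    """Mean square error between true and predicted values."""
    return float(np.mean((y_true - y_pred) ** 2))


# Synthetic stand-in data: 10 features, only columns 0 and 3 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)

# Hold out the last 50 rows; center using training statistics only.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]
x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
Xc_train, Xc_test = X_train - x_mean, X_test - x_mean

selected = rfe(Xc_train, y_train - y_mean, n_keep=2)
w = ridge_fit(Xc_train[:, selected], y_train - y_mean)
test_mse = mse(y_test, Xc_test[:, selected] @ w + y_mean)

print("selected features:", selected)
print("held-out MSE:", test_mse)
```

In a comparison like the one the abstract reports, the same held-out MSE would be computed for each candidate model and the model with the smallest value kept.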
Keywords/Search Tags: data mining, hypertension, hyperlipidemia, disease prediction