Application Of Support Vector Machine Based On SMOTE And Optimization Algorithm In Prediction Of The Adverse Outcomes Of Chronic Heart Failure

Posted on: 2020-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: F Cheng
GTID: 2404330590455944
Subject: Public health
Abstract/Summary:
Objective: To address the class-imbalance problem in chronic heart failure data, a support vector machine (SVM) classification model based on SMOTE and optimization algorithms was constructed to predict adverse outcomes in patients with chronic heart failure and to improve the predictive performance of classification models. This provides a theoretical basis for physicians to evaluate the prognosis of patients with chronic heart failure and to carry out treatment interventions for high-risk patients, with the aim of reducing mortality and improving prognosis.

Methods: We retrospectively collected the medical records of inpatients diagnosed with heart failure at the First Hospital of Shanxi Medical University and Shanxi Cardiovascular Hospital from January 2014 to December 2017. The valid medical records were selected according to the inclusion and exclusion criteria. From these records, statistically significant predictive variables were identified by the chi-square test and the rank sum test. The data were divided into training, validation and test sets in a 2:1:1 ratio. The final predictive variables were used as input variables, and whether an adverse outcome occurred in a patient with chronic heart failure was the output variable. First, the SMOTE algorithm was used to balance the positive and negative samples of the training and validation sets. Second, SVM parameter optimization was carried out on the validation set, and logistic regression, SVM, genetic algorithm SVM (GA-SVM) and particle swarm optimization SVM (PSO-SVM) models were constructed on the balanced training set; logistic regression and SVM models were also constructed on the imbalanced training set. Finally, the predictive performance of the classification models was evaluated on the test set, and sensitivity, specificity, accuracy, G-means, F-measure and AUC were used to compare the models.

Results:
1. Univariate analysis yielded 15 variables statistically related to the occurrence of adverse outcomes: white blood cells, absolute neutrophil count, high-density lipoprotein cholesterol, creatinine, uric acid, blood glucose, NT-proBNP, QTc, ejection fraction, segmental wall motion abnormalities, NYHA class IV, lung disease, systolic blood pressure abnormalities (less than 96 mmHg), hemoglobin abnormalities (less than 110 g/L) and hyponatremia (less than 138 mmol/L).
2. With these predictive variables as inputs, the SVM penalty parameter c and kernel function parameter g were optimized on the balanced validation set by GA and by PSO. The optimal parameters (c, g) were 87.707 and 1.1073 for GA, and 100 and 0.96456 for PSO. The logistic regression, SVM, GA-SVM and PSO-SVM models were built on the balanced training set, and the median of each evaluation index was obtained for the four models. Logistic regression: sensitivity 55.56%, specificity 86.87%, accuracy 84.97%, G-means 0.6783, F-measure 0.3007 and AUC 0.6900; SVM: sensitivity 61.11%, specificity 82.55%, accuracy 80.74%, G-means 0.6962, F-measure 0.2700 and AUC 0.7096; GA-SVM: sensitivity 66.67%, specificity 83.81%, accuracy 82.43%, G-means 0.7271, F-measure 0.3010 and AUC 0.7467; PSO-SVM: sensitivity 66.67%, specificity 83.81%, accuracy 82.43%, G-means 0.7263, F-measure 0.3000 and AUC 0.7443.
3. Logistic regression, SVM, GA-SVM and PSO-SVM models were constructed on the same balanced training set, and logistic regression and SVM models were constructed on the same original training set. Comparing the evaluation indexes shows that the sensitivity, G-means and AUC of the logistic regression and SVM models built after SMOTE processing were better than those of the same models built on the original training set. In addition, among the four classification models built after SMOTE processing, GA-SVM and PSO-SVM were lower than logistic regression in specificity and accuracy, but better in sensitivity, G-means, F-measure and AUC.

Conclusions: The SMOTE algorithm can be used to balance the positive and negative samples in the original data set, so that the two imbalanced classes become similar in size. Compared with using the original data set directly, the predictive performance of the classification models built on the balanced data set is greatly improved. Among the classification models built on the balanced data set, GA-SVM and PSO-SVM perform better than logistic regression and SVM. In this study, SVMs optimized by the genetic algorithm and by particle swarm optimization were used to evaluate the prognosis of patients with heart failure and to identify high-risk groups with a poor prognosis of chronic heart failure, providing a theoretical basis for targeted interventions.
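
As an illustration of two ingredients the abstract relies on, the following minimal Python sketch shows a simplified SMOTE-style oversampling step and the threshold-based evaluation indexes. It is not the thesis's code. The function names `smote` and `evaluate`, the neighbour count `k=5`, the random seed and the toy data are all hypothetical, and the definitions G-mean = sqrt(sensitivity × specificity) and F-measure = harmonic mean of precision and sensitivity are the usual textbook ones, assumed here because the abstract does not state them. AUC is left out because it requires continuous scores rather than class labels.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: build n_new synthetic minority samples, each lying on
    the segment between a random minority sample and one of its k nearest
    minority neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by distance to the base sample.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def evaluate(y_true, y_pred):
    """Evaluation indexes from a binary confusion matrix (1 = adverse outcome).
    Assumes both classes are present in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(pairs),
        "g_mean": math.sqrt(sensitivity * specificity),
        "f_measure": 2 * precision * sensitivity / denom if denom else 0.0,
    }


if __name__ == "__main__":
    # Toy minority class with two features; generate four synthetic samples.
    minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
    for row in smote(minority, n_new=4, k=3):
        print([round(v, 3) for v in row])

    # Toy imbalanced test set: 10 positives and 90 negatives,
    # predicted with 6 TP, 4 FN, 75 TN and 15 FP.
    y_true = [1] * 10 + [0] * 90
    y_pred = [1] * 6 + [0] * 4 + [0] * 75 + [1] * 15
    for name, value in evaluate(y_true, y_pred).items():
        print(f"{name}: {value:.4f}")
```

In the toy test set, accuracy is 0.81 while the F-measure is only about 0.39, because the many false positives from the large negative class pull precision down. This is one plausible reading of why the models reported above combine accuracy above 80% with F-measure values near 0.3, and why G-means and AUC are reported alongside accuracy for imbalanced data.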
Keywords/Search Tags: chronic heart failure, genetic algorithm, particle swarm optimization, SMOTE algorithm, prognosis assessment