
Risk Prediction Of Liver Cirrhosis Complicated With Hepatic Encephalopathy Based On Cost-sensitive Random Forest And Support Vector Machine

Posted on: 2019-06-09
Degree: Master
Type: Thesis
Country: China
Candidate: X X Li
GTID: 2334330563956121
Subject: Epidemiology and Health Statistics

Abstract/Summary:
Objective: Hepatic encephalopathy is one of the most common complications of liver cirrhosis. Its clinical manifestations are complex, its cure rate is low and its prognosis is poor, and it has become an important cause of low survival in patients with liver cirrhosis. It is therefore crucial to construct a risk prediction model for hepatic encephalopathy in patients with liver cirrhosis. However, data on liver cirrhosis complicated with hepatic encephalopathy are class-imbalanced, and risk prediction models built with traditional statistical and machine learning algorithms cannot effectively identify the minority group and perform poorly. In this thesis, we attempt to establish risk prediction models for liver cirrhosis complicated with hepatic encephalopathy based on cost-sensitive random forest and support vector machine, in order to address these problems and improve the models' ability to identify patients with liver cirrhosis complicated with hepatic encephalopathy. Clinicians could use the models to identify patients at greater risk of hepatic encephalopathy and select appropriate treatment. The application of these algorithms can also serve as a reference for risk prediction research on other diseases.

Methods: We reviewed the medical records of cirrhosis patients admitted to the Second Clinical Hospital of Shanxi Medical University from January 2010 to April 2017 and obtained 1256 valid records according to the inclusion and exclusion criteria. First, the chi-square test, the Wilcoxon rank-sum test and the AUC-RF algorithm were used to identify factors associated with hepatic encephalopathy. Second, taking the factors screened in the first step as input variables and the presence or absence of concurrent hepatic encephalopathy as the output variable, we built logistic regression, weighted random forest (WRF) and cost-sensitive support vector machine (CS-SVM) classification prediction models and studied the performance of these three models. We also compared the recognition ability of the classifiers built by logistic regression, WRF and CS-SVM with that of the traditional random forest and support vector machine. Finally, logistic regression and WRF were used to build models that predict the risk probability of hepatic encephalopathy in patients with liver cirrhosis.

Results:
1. Through univariate tests and the AUC-RF algorithm, 20 factors associated with hepatic encephalopathy were selected as input variables for the prediction models: constipation, edema, electrolyte disorders, upper gastrointestinal bleeding, infection, diuretics, white blood cells, red blood cells, hemoglobin, neutrophil percentage, aspartate aminotransferase, sodium, chloride, albumin, total protein, direct bilirubin, indirect bilirubin, prothrombin time, fibrinogen and activated partial thromboplastin time.
2. Performance of the classification prediction models (medians):
   Logistic regression: true positive rate 70.00%, true negative rate 83.38%, accuracy 82.54%, G-means 0.7679, F-measure 0.3688, AUC 0.7721.
   WRF: true positive rate 70%, true negative rate 85.82%, accuracy 84.69%, G-means 0.7739, F-measure 0.3930, AUC 0.7778.
   CS-SVM: true positive rate 71.66%, true negative rate 82.99%, accuracy 82.06%, G-means 0.7657, F-measure 0.3560, AUC 0.7688.
3. Comparison of the classification prediction models: Logistic regression, WRF, CS-SVM, the traditional random forest and the traditional support vector machine were compared. In identifying patients with hepatic encephalopathy, the WRF, CS-SVM and logistic regression models outperformed the traditional machine learning models (true positive rates above 70%), while their ability to recognize patients without concurrent hepatic encephalopathy was slightly lower (true negative rates of about 85%). The composite evaluation indexes of the WRF, CS-SVM and logistic regression models were also higher than those of the other models (G-means above 0.8000, F-measure above 0.4000, AUC above 0.8000). On all three indexes, the WRF model (G-means 0.8221, F-measure 0.4646, AUC 0.8241) was superior to the logistic regression and CS-SVM models.
4. Probability prediction model: WRF can be used not only to build a classification prediction model but also to predict the probability of hepatic encephalopathy in patients with liver cirrhosis.

Conclusions: Cost-sensitive random forest and support vector machine can make up for the shortcomings of traditional machine learning in classifying imbalanced data and improve classification performance on such data. In predicting hepatic encephalopathy in patients with liver cirrhosis, the WRF and CS-SVM classifiers both outperformed the other models. The weighted random forest can also provide the probability of concurrent hepatic encephalopathy in patients with liver cirrhosis, which gives it better clinical applicability. The models established with WRF and CS-SVM in this thesis can help clinicians identify patients at greater risk of hepatic encephalopathy, which is of great practical significance for prolonging survival and improving the quality of life of these patients.
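The abstract does not say how the weights of the weighted random forest (WRF) were chosen. A common choice, assumed here, gives each class a weight inversely proportional to its frequency and lets those weights enter both the Gini impurity used to pick splits and the vote in each terminal node. The minimal sketch below shows only those two ingredients, not a full forest; the function names are hypothetical and the toy labels are made up.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper (assumed weighting, not taken from the thesis):
    w_c = n_samples / (n_classes * n_c), so the rare class (hepatic
    encephalopathy) carries as much total weight as the common class."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_gini(labels, weights):
    """Gini impurity in which each case contributes its class weight
    instead of 1; a tree grower would use it to score candidate splits."""
    totals = Counter()
    for y in labels:
        totals[y] += weights[y]
    total = sum(totals.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((w / total) ** 2 for w in totals.values())


def weighted_vote(labels, weights):
    """Terminal-node prediction: the class with the largest total weight."""
    totals = Counter()
    for y in labels:
        totals[y] += weights[y]
    return max(totals, key=totals.get)
```

With 9 negative cases and 1 positive case, the positive class gets weight 5.0 and each negative case 10/18, so a leaf holding three negatives and one positive votes for the positive class, whereas an unweighted vote would return the majority class. In practice, forest libraries expose this through a class-weight option rather than hand-written impurity code.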
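A cost-sensitive SVM penalizes errors on the two classes differently, replacing the single penalty C with C+ for the minority (hepatic encephalopathy) class and C- for the majority class. The sketch below is a linear SVM trained by full-batch sub-gradient descent on that class-weighted hinge loss. It is an illustration under that assumption, not the solver, kernel or cost values used in the thesis, which the abstract does not report; `train_cs_linear_svm`, `predict`, `lr` and `epochs` are hypothetical names. Libraries offer the same idea directly, for example the `class_weight` parameter of scikit-learn's `SVC`.

```python
def train_cs_linear_svm(X, y, c_pos, c_neg, lr=0.01, epochs=500):
    """Hypothetical sketch of a cost-sensitive linear SVM.

    Minimises 0.5*||w||^2 + sum_i C(y_i) * max(0, 1 - y_i*(w.x_i + b))
    by full-batch sub-gradient descent, where labels are +1 (hepatic
    encephalopathy) or -1, C(+1) = c_pos and C(-1) = c_neg.
    """
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)          # gradient of the 0.5*||w||^2 regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:        # hinge term is active for this case
                c = c_pos if yi == 1 else c_neg  # class-specific cost
                for j in range(d):
                    grad_w[j] -= c * yi * xi[j]
                grad_b -= c * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 from the sign of the linear decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

When the same feature vector appears once in each class, the learned bias moves toward whichever class is more expensive to misclassify: `c_pos > c_neg` yields a positive prediction for that point, and swapping the costs yields a negative one. Raising C+ relative to C- is how such a model trades some true negative rate for a higher true positive rate.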
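The evaluation indexes in the Results can all be derived from a confusion matrix in which hepatic encephalopathy is the positive class. The sketch below assumes the standard definitions: G-means as the geometric mean of the true positive and true negative rates, and F-measure as the harmonic mean of precision and recall (true positive rate). The function name and the toy labels are illustrative, not taken from the thesis data.

```python
import math


def classification_metrics(y_true, y_pred, positive=1):
    """Imbalance-aware metrics; `positive` marks the minority class
    (here, hepatic encephalopathy present)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0        # sensitivity / recall
    tnr = tn / (tn + fp) if tn + fp else 0.0        # specificity
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    g_mean = math.sqrt(tpr * tnr)
    f_measure = (2 * precision * tpr / (precision + tpr)
                 if precision + tpr else 0.0)
    return {"tpr": tpr, "tnr": tnr, "accuracy": accuracy,
            "g_mean": g_mean, "f_measure": f_measure}
```

On a made-up sample with 2 positives among 10 cases, predicting "no encephalopathy" for everyone still gives 80% accuracy, while the true positive rate, G-means and F-measure all drop to zero. This is why the abstract reports G-means and F-measure alongside accuracy.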
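The AUC values in the Results can be read without drawing a ROC curve: AUC equals the probability that a randomly chosen patient with hepatic encephalopathy receives a higher predicted risk than a randomly chosen patient without it, with ties counted as one half (the Mann-Whitney formulation). The pairwise loop below is a sketch written for clarity rather than speed; `auc_rank` is a hypothetical name and the scores are made up.

```python
def auc_rank(y_true, scores, positive=1):
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as 1/2. O(n_pos * n_neg); fine for illustration."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For labels `[1, 1, 0, 0, 0]` and predicted risks `[0.9, 0.4, 0.5, 0.3, 0.1]`, five of the six positive-negative pairs are ranked correctly, giving an AUC of 5/6. A sort-based rank-sum version gives the same value in O(n log n) time.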
Keywords/Search Tags: Cost sensitivity, Hepatic encephalopathy, Disease risk prediction, Weighted random forest, Cost-sensitive support vector machine