The Prediagnosis Comparison Of ALF Based On Three Statistical Algorithms With Built-In And External Dimensionality Reduction And Stacking Integration

Posted on: 2022-09-17
Degree: Master
Type: Thesis
Country: China
Candidate: D Y Zhang
GTID: 2517306317480784
Subject: Statistics
Abstract/Summary:
As the volume and dimensionality of medical data continue to grow, the speed and accuracy of medical prediction with a single statistical learning algorithm urgently need to be improved. Compared with XGBoost, LightGBM has a built-in dimensionality reduction effect, while coupling XGBoost with an external dimensionality reduction method can also speed up prediagnosis. This paper therefore takes the Acute Liver Failure (ALF) dataset from a 2018 Kaggle competition and compares three statistical learning algorithms (XGBoost, LightGBM and Random Forest), both with built-in dimensionality reduction and coupled with external dimensionality reduction algorithms, and uses Stacking integration to prediagnose acute liver failure. The work has three parts.

First, we compare the three single statistical learning methods on ALF prediagnosis. Overall, the differences in prediagnosis accuracy among these models are small. XGBoost obtains the more accurate results in the shortest time, which best matches prediagnostic needs. However, XGBoost takes 10.5 seconds to process 1,757 records, so its efficiency still needs to be improved. LightGBM has built-in reduction algorithms that XGBoost and Random Forest lack, yet its prediagnosis time is longer than that of XGBoost. It is therefore necessary to compare the prediagnosis effects of built-in and external dimensionality reduction algorithms.

Second, the three single statistical learning algorithms are each coupled with Factor Analysis (FA), and we compare the prediagnosis results before and after coupling. Factor Analysis was chosen because the actual meaning behind the medical data matters in subsequent analysis, and the new common factors it extracts can express real background information. Compared with the single statistical learning algorithms, coupling with Factor Analysis shortens prediagnosis time by 59.2% on average, at the cost of an average decrease of 0.1 in prediagnosis accuracy. Among the coupled models, XGBoost-FA has the highest accuracy and a shorter time.

Third, this paper uses an Autoencoder instead of Factor Analysis to reduce dimensionality nonlinearly. The three single statistical learning algorithms serve as base models and logistic regression as the second-layer model, giving the integrated XLR-SAE-Stacking model. Comparative analysis of the prediagnosis results shows that XLR-SAE-Stacking has higher accuracy, at the cost of a prediagnosis time about 5 times longer than that of the other models in this paper.

For the problems studied in this paper, XGBoost is a suitable choice when ease of operation matters most, XGBoost-FA when immediate prediagnosis is needed, and XLR-SAE-Stacking when high prediagnosis accuracy is the priority.
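The comparison described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation, and several pieces are assumptions: the data are synthetic (`make_classification`) rather than the Kaggle ALF dataset, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost and LightGBM, the autoencoder step is omitted, and the helper names `fit_and_score` and `with_factor_analysis` and the choice of 8 factors are hypothetical. The sketch only shows the shape of the experiment: time and score each base learner alone, then coupled with external Factor Analysis, then combined in a stacking ensemble with logistic regression as the second layer.

```python
import time

from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.decomposition import FactorAnalysis
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the ALF data (assumption: not the real dataset).
X, y = make_classification(
    n_samples=600, n_features=30, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)


def fit_and_score(model):
    """Fit on the training split; return (test accuracy, elapsed seconds)."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)
    return accuracy, time.perf_counter() - start


def with_factor_analysis(classifier, n_factors=8):
    """Couple a classifier with external FA-based dimensionality reduction."""
    return make_pipeline(
        StandardScaler(),
        FactorAnalysis(n_components=n_factors, random_state=0),
        classifier,
    )


# Base learners; the gradient-boosting model is a stand-in for XGBoost/LightGBM.
base_learners = {
    "gbdt": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "rf": RandomForestClassifier(n_estimators=50, random_state=0),
}

results = {}
for name, clf in base_learners.items():
    # Single model, then the same model after coupling with Factor Analysis.
    results[name] = fit_and_score(clone(clf))
    results[name + "-FA"] = fit_and_score(with_factor_analysis(clone(clf)))

# Stacking: base learners feed a logistic-regression second layer.
stack = StackingClassifier(
    estimators=[(name, clone(clf)) for name, clf in base_learners.items()],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)
results["stacking"] = fit_and_score(stack)

for name, (accuracy, seconds) in results.items():
    print(f"{name:10s} accuracy={accuracy:.3f} time={seconds:.2f}s")
```

On synthetic data the timings and accuracies will not reproduce the figures reported above; the point is only the before/after structure of the comparison and the two-layer stacking layout.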
Keywords/Search Tags: statistical learning, factor analysis, autoencoder dimensionality reduction, comparative analysis, Stacking integration