The Prediagnosis Comparison Of ALF Based On Three Statistical Algorithms With Built-In And External Dimensionality Reduction And Stacking Integration

Posted on:2022-09-17

Degree:Master

Type:Thesis

Country:China

Candidate:D Y Zhang

GTID:2517306317480784

Subject:Statistics

Abstract/Summary:

As the volume and dimensionality of medical data continue to grow, the speed and accuracy of medical prediction with a single statistical learning algorithm urgently need to be improved. Unlike XGBoost, LightGBM has a built-in dimensionality reduction effect, while coupling XGBoost with an external dimensionality reduction method can also speed up prediagnosis. This thesis therefore takes the Acute Liver Failure (ALF) dataset from the 2018 Kaggle competition and compares three statistical learning algorithms (XGBoost, LightGBM and Random Forest), both with built-in dimensionality reduction and coupled with external dimensionality reduction algorithms, and also uses Stacking integration to prediagnose acute liver failure. The work has three parts.

First, we compare the three single statistical learning methods on the prediagnosis of ALF. Overall, the differences in prediagnosis accuracy among the models are small. Among them, XGBoost gives more accurate results in the shortest time, which best matches prediagnostic needs. However, XGBoost takes 10.5 seconds to process 1757 data records, so its efficiency needs to be improved. LightGBM has a built-in dimensionality reduction algorithm, which XGBoost and Random Forest lack, yet its prediagnosis time is longer than that of XGBoost. It is therefore necessary to compare the prediagnosis effects of built-in and external dimensionality reduction.

Second, the three single statistical learning algorithms are each coupled with Factor Analysis (FA), and the prediagnosis results before and after coupling are compared and analyzed. Factor Analysis was chosen so that the actual meaning behind the medical data is preserved for subsequent analysis, since the new common factors it produces can express real background information. Compared with the single statistical learning algorithms, the FA-coupled models shorten prediagnosis time by 59.2% on average, at the cost of an average decrease of 0.1 in prediagnosis accuracy. Among the coupled models, XGBoost-FA has the highest accuracy and a shorter prediagnosis time.

Third, an Autoencoder is used in place of Factor Analysis to reduce dimensionality nonlinearly. The three single statistical learning algorithms serve as base models and Logistic Regression serves as the second-layer model, forming the XLR-SAE-Stacking integrated model. Comparative analysis of the prediagnosis results shows that XLR-SAE-Stacking has higher accuracy, at the cost of a prediagnosis time about 5 times longer than that of the other models in this thesis.

For the problems studied in this thesis, XGBoost is the choice for ease of operation, XGBoost-FA for immediate prediagnosis, and XLR-SAE-Stacking for high prediagnosis accuracy.

Keywords/Search Tags:

Statistical learning, factor analysis, autoencoder dimensionality reduction, comparative analysis, Stacking integration

Related items

1	Application Of Support Vector Dimensionality Reduction Machine For Multi-instance Learning
2	Local Linear Embedded LLE Method For Nonlinear Dimension Reduction Based On High Dimensional Space
3	Research On Dimensionality Reduction Classification Of T-SNE Combined With Support Vector Machine
4	Teaching And Guiding Strategies Of Function-gap Filling Questions In Senior High School Based On Dimensionality Reduction Method
5	Some Multivariate Statistical Analysis Methods And Its Simple Applications
6	The Application Of Multivariate Statistical Analysis Methods In University Teaching Evaluation
7	Statistical Analysis Of Comprehensive Development Level In Shanxi Province
8	Statistical Analysis Of MOOCs Based On Data Visualization
9	Dimensionality Reduction Algorithms Based On Manifold Learning And Its Application In Face Recognition
10	Research On Classification Of Gene Expression Data Based On Statistical Learning Algorithm