
Research On Medical Data Imputation Method Based On Stacking Ensemble Learning

Posted on: 2024-06-15
Degree: Master
Type: Thesis
Country: China
Candidate: S B Wu
Full Text: PDF
GTID: 2544307160479644
Subject: Applied Statistics
Abstract/Summary:
With the advent of the big data era, analyzing large amounts of data and drawing conclusions from them has become a key focus for data scientists. Such data sets often contain missing values for various reasons, a phenomenon that is particularly prevalent in the medical field. Missing data can reduce statistical power, which seriously affects diagnostic accuracy and can lead to misdiagnosis. Effective imputation of missing data is therefore extremely important in medical problems.

KNN, decision tree, and SVR are three commonly used learning methods that have been successfully applied to data imputation in recent years. Compared to traditional data imputation methods, they have the advantage of strong interpretability. However, when used individually, they are sensitive to parameter selection and have certain limitations. To address this, this paper uses the Stacking ensemble learning strategy to integrate these three learning mechanisms as base learners, constructing a new strong learner designed for missing data imputation. The aim is for the integrated method to overcome the shortcomings of the three original methods and effectively improve the accuracy of missing data imputation.

This paper first compares the Stacking imputation model with its base learners and verifies that the integrated algorithm combines the advantages of each base learner and increases imputation accuracy. It then conducts comparative experiments on medical data with different missing rates and missing mechanisms, comparing Stacking with other commonly used imputation algorithms such as SOM, FCM, decision tree, random forest, and Mice multiple imputation. Among them, the Stacking and Mice multiple imputation models perform well. In the imputation experiment, at a missing rate of 10%, the MSE of Stacking is on average 3.0% higher than that of Mice, while at missing rates of 20%, 30%, 40%, and 50% it is lower than that of Mice by an average of 2.2%, 16.7%, 36.5%, and 9.5%, respectively. Overall, the imputation MSE of Stacking is 12.4% lower than that of Mice. In the experiment on prediction after imputation, at missing rates of 20%, 25%, and 30%, the prediction MSE of Stacking is on average 1.19% lower than that of Mice.

These results indicate that the Stacking imputation model proposed in this paper imputes effectively and is more stable when the missing rate is high or the missing mechanism is not completely random. Medical data is characterized by high complexity, strong specialization, large scale, and rapid updates, and it often exhibits a high rate of missing values. Under these missing data scenarios, the proposed model can achieve more effective imputation results than other commonly used imputation algorithms.
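The Stacking idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes scikit-learn, a single numeric column with missing values, fully observed remaining columns, a linear regression meta-learner, and illustrative hyperparameters. The function name `stacking_impute` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def stacking_impute(X, target_col, random_state=0):
    """Fill NaNs in one column with a stacked KNN / decision tree / SVR model.

    Assumption: every other column of X is fully observed.
    """
    X = np.asarray(X, dtype=float).copy()
    missing = np.isnan(X[:, target_col])
    if not missing.any():
        return X

    # Predictors are all columns except the one being imputed.
    features = np.delete(X, target_col, axis=1)

    # The three base learners named in the abstract; KNN and SVR are
    # distance-based, so they get standardized inputs.
    base_learners = [
        ("knn", make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))),
        ("tree", DecisionTreeRegressor(max_depth=5, random_state=random_state)),
        ("svr", make_pipeline(StandardScaler(), SVR(C=10.0))),
    ]
    # Assumed meta-learner: linear regression on out-of-fold base predictions.
    model = StackingRegressor(
        estimators=base_learners,
        final_estimator=LinearRegression(),
        cv=5,
    )

    # Train on rows where the target is observed, then predict the gaps.
    model.fit(features[~missing], X[~missing, target_col])
    X[missing, target_col] = model.predict(features[missing])
    return X


# Synthetic demonstration: hide 20% of one column completely at random.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 4))
data[:, 3] = 2.0 * data[:, 0] - data[:, 1] + 0.1 * rng.normal(size=200)
truth = data.copy()
hidden = rng.random(200) < 0.2
data[hidden, 3] = np.nan

filled = stacking_impute(data, target_col=3)
mse_stacking = np.mean((filled[hidden, 3] - truth[hidden, 3]) ** 2)
mse_mean_fill = np.mean((np.nanmean(data[:, 3]) - truth[hidden, 3]) ** 2)
print(f"Stacking MSE: {mse_stacking:.4f}  mean-fill MSE: {mse_mean_fill:.4f}")
```

The imputation MSE computed over the hidden entries is the same kind of metric the abstract reports when comparing Stacking with Mice. Handling several incomplete columns or non-random missing mechanisms would need additional logic that this sketch omits.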
Keywords/Search Tags:Missing value, Stacking integrated learning model, Medical data filling