Decision Tree Ensemble Learning-based Cancer Survival Prediction And Analysis

Posted on: 2019-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Wang
GTID: 2404330566984934
Subject: Information management and e-government
Abstract/Summary:
Data-driven decision-making has become an important concept and method in the era of big data, and the medical industry is one of its typical application fields. Building index systems and models with data mining techniques to support medical decision-making is of great practical significance. Cancer remains difficult to overcome, and survival prediction is an important and challenging part of cancer prognosis. In recent years, several studies have used machine learning methods to build prediction models for 5-year survivability classification, and these models have performed better than traditional clinical prediction and prediction based on statistical tools.

To further improve the accuracy of cancer survivability classification, a GA-RF-based ensemble classification method is proposed as an improvement on random forest. In the proposed method, the random forest algorithm generates the initial set of decision trees, and a genetic algorithm then searches for the best combination of decision trees with the aim of improving the accuracy of the ensemble model. Colorectal cancer data from the National Cancer Institute of the United States are used to build the cancer survivability classification model, and the proposed method is compared with decision tree and random forest. The experimental results show that the GA-RF-based ensemble classification method achieves the best accuracy of the three and has lower ensemble complexity than the original random forest algorithm.

For advanced cancer patients, survival time prediction can provide greater guidance for clinical practice. To obtain more accurate predictions, a regression tree ensemble method based on mean squared error (MSE) and diversity is proposed. Because the diversity of individual learners is important in ensemble learning, several operations are carried out to create diversity in both the individual learner generation stage and the individual learner selection stage. Bootstrap sampling and the random subspace method are used to train a large number of regression trees by manipulating samples and features simultaneously. A multi-objective optimization algorithm is then employed to find a good combination of regression trees by optimizing both their diversity and their prediction performance. Cancer data from the National Cancer Institute of the United States are used to build the cancer survival time prediction model. The proposed ensemble regression method is compared with decision tree and three classical decision-tree-based ensemble learning methods: random forest, AdaBoost and Gradient Boosting. The experimental results show that the proposed method obtains the lowest prediction error with a lower ensemble complexity, and that the resulting ensemble model explains the target variable better.

The results show that the improvement on random forest is effective and improves the accuracy of cancer survivability classification. The proposed regression tree ensemble method can also effectively predict the survival time of advanced cancer patients. Both methods can make up for the shortcomings of traditional experience-based prediction and help doctors make more accurate medical decisions. Finally, considering the extension of the prediction models to medical practice, a unified knowledge representation of the cancer survival prediction models is developed from the perspective of model management, so that the models can be uniformly maintained and invoked in a decision support system.
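
The GA-RF selection step described in the abstract (a genetic algorithm searching for a good subset of already-trained trees) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the thesis's implementation: the per-tree predictions are synthetic stand-ins for random forest trees evaluated on a validation set, the fitness is plain majority-vote accuracy, and the names `toy_votes`, `majority_vote`, `fitness` and `ga_select`, along with the GA operators and parameter values, are hypothetical.

```python
import random

def toy_votes(n_trees=15, n_samples=100, seed=1):
    """Synthetic stand-in for per-tree predictions on a validation set (assumption)."""
    rng = random.Random(seed)
    truth = [rng.randint(0, 1) for _ in range(n_samples)]
    votes = []
    for _ in range(n_trees):
        acc = rng.uniform(0.5, 0.8)  # each toy "tree" has its own accuracy level
        votes.append([t if rng.random() < acc else 1 - t for t in truth])
    return votes, truth

def majority_vote(votes, mask):
    """Majority vote over the trees switched on in mask; ties go to class 0."""
    chosen = [v for v, keep in zip(votes, mask) if keep]
    return [1 if 2 * sum(col) > len(chosen) else 0 for col in zip(*chosen)]

def fitness(mask, votes, truth):
    """Validation accuracy of the selected sub-ensemble (assumed fitness function)."""
    if not any(mask):
        return 0.0  # an empty ensemble cannot predict anything
    pred = majority_vote(votes, mask)
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def ga_select(votes, truth, pop_size=20, generations=20, p_mut=0.02, seed=0):
    """Search bit masks over trees with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    n = len(votes)
    score = lambda m: fitness(m, votes, truth)
    # Include the full forest in the initial population so that, with elitism,
    # the returned subset is never worse than the full forest on this data.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        nxt = pop[:2]  # elitism: carry the two best masks over unchanged
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=score)  # tournament selection
            b = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n)               # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each bit flips with probability p_mut.
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=score)

if __name__ == "__main__":
    votes, truth = toy_votes()
    best = ga_select(votes, truth)
    full = [1] * len(votes)
    print("trees kept:", sum(best), "of", len(votes))
    print("full forest accuracy:", fitness(full, votes, truth))
    print("selected subset accuracy:", fitness(best, votes, truth))
```

In this sketch the number of ones in the returned mask plays the role of ensemble complexity. A realistic setup would score candidate subsets on held-out data that is separate from the final test set, since selecting trees on the same data used for reporting would overstate accuracy.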
Keywords/Search Tags: Cancer, Survival Prediction, Decision Tree Ensemble, Genetic Algorithm, Multi-objective Optimization