
Research On Diagnosis Classification Of Cancers Based On Extreme Gradient Boosting Algorithm

Posted on: 2021-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: F Y Meng
Full Text: PDF
GTID: 2404330602993881
Subject: Electronic Science and Technology
Abstract/Summary:
Cancer is one of the major diseases that seriously affect human health, and its incidence has been rising in recent years. Accurate diagnosis of tumor progression can help researchers understand the mechanism of cancer progression and guide treatment decisions for patients. In this paper, Kidney Renal Clear Cell Carcinoma (KIRC), Kidney Renal Papillary Cell Carcinoma (KIRP), Lung Squamous Cell Carcinoma (LUSC), and Head and Neck Squamous Cell Carcinoma (HNSC) are used as examples to study classification models for tumor diagnosis. The paper presents a diagnostic model that classifies cancers as early stage or late stage, based on the extreme gradient boosting (XGBoost) algorithm and multi-omics data. Compared with other machine learning methods, the proposed XGBoost model achieved better accuracy than the state of the art in cancer stage prediction on most datasets. In addition, the prediction accuracy of the model can be further improved by using a deep learning algorithm to integrate multi-omics datasets. The model provides strong support for medical staff in accurately diagnosing the stage of cancer patients. The main work is as follows.

(1) Data collection and pre-processing. The molecular biological data and clinical data used in this research were collected from the TCGA database. To ensure the quality of the collected data, the raw data were pre-processed in several steps: definition and imputation of missing values, deletion of loci with too many missing values, and data standardization.

(2) Cancer diagnosis classification model based on the extreme gradient boosting algorithm. Based on three different types of omics data from cancer patients, the XGBoost algorithm was used to construct a diagnostic model that classifies the stage of cancer (early stage or late stage). Compared with six other machine learning algorithms, the classification model constructed in this study obtained better prediction performance on 9 out of 12 datasets.

(3) Cancer diagnosis classification model integrating multi-omics data. First, a deep learning algorithm was used to reconstruct features from the mRNA expression dataset, and the reconstructed features were combined with the DNA methylation dataset to generate a new feature dataset. Second, the combined feature set was used to construct the XGBoost classification model. The experimental results show that the accuracy of the diagnostic classification model can be further improved by the integrated multi-omics dataset.

(4) Identification and analysis of key genes related to diagnosis. First, the XGBoost algorithm was applied to identify key genes related to cancer stage. These key genes were then examined by KEGG pathway analysis and differential gene expression analysis. The results should help reveal the molecular biological mechanisms of tumor occurrence and development.
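
The pre-processing steps named in item (1) (deleting loci with too many missing values, imputing the remaining missing values, and standardizing the data) can be illustrated with a minimal sketch. The abstract does not state the missing-value threshold, the imputation strategy, or the scaling method, so the 20% default cut-off, the column-mean imputation, and the z-score scaling below are all assumptions, and the function name `preprocess` is hypothetical. The sketch uses only the Python standard library so that it runs without dependencies; a real pipeline would more likely use NumPy or pandas.

```python
from statistics import mean, pstdev


def preprocess(samples, max_missing_frac=0.2):
    """Clean a small omics matrix (hypothetical helper, not the thesis code).

    samples: list of rows, one per patient; each row is a list of floats,
             with None marking a missing value.
    max_missing_frac: assumed cut-off; loci (columns) whose fraction of
             missing values exceeds it are deleted.
    Returns (kept, cleaned): the indices of the retained columns and the
    cleaned rows, imputed and standardized column by column.
    """
    n_rows = len(samples)
    n_cols = len(samples[0])
    kept = []
    columns = []
    for j in range(n_cols):
        col = [row[j] for row in samples]
        observed = [v for v in col if v is not None]
        missing_frac = 1 - len(observed) / n_rows
        # Step 1: delete loci with too many missing values.
        if not observed or missing_frac > max_missing_frac:
            continue
        # Step 2: impute the remaining gaps with the column mean
        # (assumed strategy; this leaves the column mean unchanged).
        mu = mean(observed)
        filled = [mu if v is None else v for v in col]
        # Step 3: standardize to zero mean and unit variance (z-score).
        sd = pstdev(filled)
        if sd == 0:
            scaled = [0.0] * n_rows  # constant column carries no signal
        else:
            scaled = [(v - mu) / sd for v in filled]
        kept.append(j)
        columns.append(scaled)
    # Transpose back to one row per patient.
    cleaned = [list(row) for row in zip(*columns)]
    return kept, cleaned


# Toy matrix: 4 patients x 3 loci; the second locus is mostly missing.
toy = [
    [1.0, None, 2.0],
    [3.0, None, None],
    [5.0, 4.0, 6.0],
    [7.0, None, 4.0],
]
kept, cleaned = preprocess(toy, max_missing_frac=0.5)
```

In this sketch the cleaned matrix, together with the early-stage or late-stage labels, would then be passed to the classifier described in item (2).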
Keywords/Search Tags:Diagnostic Classification, Machine Learning, Extreme Gradient Boosting, Multi-omics Data, Cancer