Machine Learning Based On Proteomic Data To Predict Lung Cancer Recurrence

Posted on: 2024-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Fu
GTID: 2530307070461924
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
Lung cancer is one of the most common malignant tumors worldwide, and its high recurrence rate seriously affects patient survival. This study aims to predict recurrence in lung cancer patients. We used machine learning to mine mass spectrometry-based proteomic data collected from the tumor tissues of lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LSCC) patients, and applied classification models to discover prognostic biomarkers related to lung cancer recurrence. Patients whose cancer recurred within a certain period after surgical resection were assigned to the recurrence group, and patients without recurrence within that period were assigned to the non-recurrence group. After missing value imputation, feature proteins were selected first from the differentially expressed proteins and then by random forest feature importance. Five classification methods (logistic regression, random forest, support vector machine, naïve Bayes, and k-nearest neighbor) were applied, and a majority voting ensemble model based on these five models was constructed. After model evaluation, the optimal model was selected. The optimal model was then used to screen combinations of feature proteins, and the performance of the selected protein panel and the constructed model was evaluated. Survival analysis of the selected feature proteins was carried out to further explore the relationship between these proteins and patient survival. Finally, we found that a naïve Bayes model based on a panel of four proteins (CACNA2D2, ADD1, TP53I3 and FOLR2) achieved optimal predictive performance on both the testing set of the Chinese LUAD data (AUC: 0.875) and the CPTAC LUAD data set (AUC: 0.802). These proteins are potential prognostic biomarkers associated with lung adenocarcinoma recurrence. In addition, a support vector machine based on another four feature proteins (NNT, ACOT9, MTSS1 and MGST3) achieved optimal predictive performance on the testing set of the CPTAC LSCC data, with the AUC reaching 1. These four feature proteins are potential prognostic biomarkers associated with lung squamous cell carcinoma recurrence. In conclusion, we have built a method to predict postoperative recurrence using machine learning based on proteomic data. It can identify potential prognostic biomarkers of lung cancer and predict lung cancer recurrence, potentially contributing to postoperative risk stratification and personalized medicine.
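
The majority voting ensemble and the AUC metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's code: the abstract does not say which software was used, so the sketch uses only the Python standard library. The function names (`majority_vote`, `auc`) and the toy predictions are hypothetical, chosen for illustration only, and using the fraction of models voting "recurrence" as a ranking score is an assumption made here to connect the two ideas.

```python
from collections import Counter


def majority_vote(predictions):
    """Hard-vote across models.

    predictions: list of per-model label lists, all the same length.
    Returns one label per sample: the label most models predicted.
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every model must predict every sample")
    voted = []
    for sample_votes in zip(*predictions):
        # With an odd number of binary voters a tie cannot occur.
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted


def auc(labels, scores):
    """Area under the ROC curve in its rank (Mann-Whitney) form:
    the fraction of (positive, negative) pairs in which the positive
    sample has the higher score, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the thesis): 1 = recurrence, 0 = non-recurrence.
y_true = [1, 1, 1, 0, 0, 0]
model_preds = [
    [1, 1, 0, 0, 0, 1],  # stand-in for logistic regression
    [1, 0, 1, 0, 0, 0],  # stand-in for random forest
    [1, 1, 0, 0, 1, 0],  # stand-in for support vector machine
    [1, 1, 0, 0, 1, 0],  # stand-in for naive Bayes
    [0, 1, 1, 1, 0, 0],  # stand-in for k-nearest neighbor
]

ensemble = majority_vote(model_preds)
# Assumed scoring rule: share of the five models voting "recurrence".
vote_fraction = [sum(col) / len(model_preds) for col in zip(*model_preds)]
ensemble_auc = auc(y_true, vote_fraction)
```

On this toy input the ensemble misses the third patient, whose recurrence only two of the five models predicted. An AUC of 1, as reported for the CPTAC LSCC testing set, corresponds to every recurrence case being scored above every non-recurrence case.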
Keywords/Search Tags:Lung cancer, Recurrence, Proteomics, Machine learning, Prognostic biomarkers