Mining of Key Biological Molecules and Construction of Prognostic Molecular Model for Esophageal Squamous Cell Carcinoma

Posted on: 2023-12-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M X Li
GTID: 1524307034982079
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Esophageal squamous cell carcinoma (ESCC) is one of the common malignant tumors of the digestive tract in China, and its five-year survival rate is less than 25%. The occurrence and development of ESCC is a complex, multi-factor, multi-stage process involving environment-genetic-gene interactions. Exploring the clinical and molecular characteristics related to the occurrence and development of ESCC is an important basis for improving early diagnosis, accurate treatment, and curative strategies for ESCC patients. However, traditional biomedicine frequently focuses on the specific functions of a single molecule (gene, transcript, or protein), which can reveal the characteristics of life activities at the molecular level to a certain extent but lacks systematic and holistic exploration. With the development of computer technology, machine learning, deep learning, data mining, and bioinformatics have been widely used in tumor research. In this study, based on ESCC transcriptome data, biomolecule interaction networks, and immune cell information, machine learning algorithms and weighted gene co-expression network analysis (WGCNA) were used to explore ESCC molecular characteristics, tumor classification, prognosis-related feature selection, and prognostic model establishment. The main contents of this research are as follows:

In the first part of this thesis, the key genes related to the prognosis of ESCC were retrieved based on a variety of machine learning algorithms. By database searching, 48 genes reported in 38 studies as related to the survival, recurrence, or therapeutic response of ESCC were retrieved. To address the poor reproducibility of ESCC biomarkers, this study used two strategies for mining ESCC biomarkers. First, multiple independent prognostic models that integrated the optimal prognostic factors were established based on 5 machine learning algorithms and the Cox proportional hazards regression algorithm. SFN was identified as a potential molecular target for the prognosis of ESCC. Second, because the variables are related to survival, 4 variable evaluation methods based on survival data (Cox, RFSRC, LASSO-Cox, and Rbsurv) were used to evaluate the importance of the molecular variables; the molecules singled out as important features by multiple methods were identified as key prognostic molecules. Finally, the results of these methods were combined to construct features with better prognostic prediction performance.

In the second part of this thesis, the molecular characteristics of ESCC and the prognostic role of molecular network modules were explored based on differentially expressed molecular interaction network modules in ESCC and machine learning algorithms. The differentially expressed genes and differentially expressed proteins were used for enrichment analysis to explore the molecular characteristics of ESCC. Netbox, a tool for building molecular interaction networks, was then used to establish the molecular network modules. The random survival forest algorithm was used to identify genes related to prognosis, and multivariate Cox regression was used to construct the molecular module features. Subsequently, the LASSO-Cox algorithm was used to screen prognostic module features, and a prognostic model of ESCC was established based on multiple molecular module features.

In the third part of this thesis, a diagnosis model of ESCC lymph node metastasis was established using machine learning algorithms. Tumor metastasis is one of the most critical factors leading to poor prognosis of ESCC, and lymph node metastasis is an important route of tumor metastasis. The molecular mechanism of ESCC lymph node metastasis is not clear. Machine learning algorithms were used to identify genes related to lymph node metastasis, and the lymph node metastasis diagnosis model for ESCC was established based on mRNA expression data. The biological significance of the identified molecules related to lymph node metastasis was then evaluated by enrichment analysis, and important molecules and biological pathways related to lymph node metastasis of ESCC were explored. Because the Boruta feature selection algorithm is strongly affected by the number of input features, a random-forest-based feature selection method called Boruta-rf, which performed better than the Boruta algorithm, was used.

In the fourth part of this thesis, diagnosis and prognosis models were established based on the immune characteristics of ESCC, since immunotherapy is one of the most potent therapeutic modalities in ESCC. First, the immune characteristics of each sample were quantified using CIBERSORT and ESTIMATE, and WGCNA was used to identify genes related to immune cell cytolytic activity. Eight genes were found to be potential reference targets for ESCC immunotherapy. The diagnosis model was established based on immune features identified by the LASSO-Cox method. Similarly, Boruta-XGB, a feature selection algorithm based on sample expansion and the XGBoost algorithm, was used. The diagnosis model for ESCC was constructed from the immune features identified by Boruta, Boruta-rf, and Boruta-XGB. Furthermore, a prognostic immune score (PIS) was developed based on immune characteristics. A nomogram model comprising age, N stage, TNM stage, and PIS was constructed. The nomogram score is a prognostic risk factor for ESCC.
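
The shadow-feature idea that Boruta-style selection relies on can be sketched as follows. This is a minimal illustration under stated assumptions, not the Boruta-rf or Boruta-XGB implementation described in the thesis: absolute Pearson correlation stands in for random forest or XGBoost importance, a fixed hit-rate cutoff replaces Boruta's statistical test, and the function names, feature names, and toy data are all hypothetical.

```python
import random
from statistics import mean, pstdev


def abs_correlation(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return abs(cov / (sx * sy))


def shadow_feature_selection(features, labels, n_rounds=50, min_hit_rate=0.9, seed=0):
    """Keep features that beat the best shuffled ("shadow") copy in most rounds."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_rounds):
        # Shadow features: each real column shuffled, which breaks any link to the labels.
        shadow_scores = []
        for values in features.values():
            shuffled = list(values)
            rng.shuffle(shuffled)
            shadow_scores.append(abs_correlation(shuffled, labels))
        threshold = max(shadow_scores)
        # A real feature scores a "hit" when it is more important than every shadow.
        for name, values in features.items():
            if abs_correlation(values, labels) > threshold:
                hits[name] += 1
    return [name for name, count in hits.items() if count / n_rounds >= min_hit_rate]


# Hypothetical toy data: 40 samples with alternating binary labels.
labels = [i % 2 for i in range(40)]
features = {
    # Expression level that tracks the label closely.
    "gene_signal": [1.0 + 4.0 * y + 0.1 * (i % 3) for i, y in enumerate(labels)],
    # Expression level with the same values in both label groups.
    "gene_flat": [float(i // 2) for i in range(40)],
}
selected = shadow_feature_selection(features, labels)
print(selected)
```

Comparing each feature against shuffled copies of the real data gives a data-driven importance threshold rather than a fixed cutoff, which is the property that Boruta and its variants build on.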
Keywords/Search Tags:Bioinformatics, Machine learning, Feature selection, Prognostic model, Esophageal squamous cell carcinoma, WGCNA, Immune cells, Lymph node metastasis