Multidimensional Data Analysis Based On Machine Learning To Predict Adverse Prognostic Events Of Esophageal Cancer And Mechanism Of Esophageal Cancer Metastasis-associated Gene CCT2

Posted on: 2023-07-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Zhu
GTID: 1524306614983489
Subject: Oncology
Abstract/Summary:
Esophageal cancer (EC) is one of the most common digestive tract tumors in the world. Its morbidity and mortality rank sixth and fourth, respectively. EC is occult at an early stage, and most patients present with swallowing obstruction, distant metastasis, or lymph node metastasis at the time of diagnosis. According to the SEER database of the United States, only 25% of patients are diagnosed at the localized stage, and the 5-year survival rate is less than 20%. Thus, EC has a high incidence and a poor prognosis. Adverse prognostic events are important reasons for the shortened survival of patients with EC. In clinical practice, the incidence of distant metastasis of EC is about 25%, but it is nearly 50% at autopsy. Patients with distant metastases have a median survival of only 3 to 10 months, and these patients often progress more rapidly and have more aggressive tumors than patients with terminal EC. Malignant esophageal fistula is the most serious complication of advanced EC, with an incidence of 5.6% to 33%. Esophageal fistula has a very poor prognosis: patients often die within a short time from nutritional failure, pulmonary infection, mediastinal abscess, or great vessel injury, with a median survival of only about 3 months. Therefore, identifying high-risk patients for early intervention and preventing adverse events is an important means of improving treatment outcomes in EC. Unfortunately, there has been little research on the prediction of distant metastasis and esophageal fistula. The main reason is that clinicopathological factors predict esophageal fistula and metastasis with low accuracy, and reliable biomarkers are lacking.

The imaging features of tumors are macroscopic manifestations of gene, protein and molecular changes, and can reflect the biological characteristics of tumors. However, traditional imaging can provide only qualitative information, such as size and shape. It is highly subjective and difficult to
repeat, which makes it difficult to meet the requirements of precision medicine. With the development of artificial intelligence (AI) and machine learning (ML), Lambin proposed the concept of "radiomics" in 2012. The core of radiomics is to extract high-dimensional data from multimodal medical images for accurate medical decision-making. The object of radiomics is high-dimensional data, such as shape-based features describing tumor morphology, first-order statistics of the histogram distribution, and texture-based features, which are extracted from images by mathematical algorithms. Traditional statistical methods have difficulty handling high-dimensional data, whereas AI and ML can effectively select reliable features and eliminate redundant ones, linking image features to clinical outcomes. In addition to feature selection, AI and ML can integrate clinical, imaging, genetic and pathological data into algorithmic models or classifiers to achieve accurate prediction of clinical events. Artificial intelligence is based on machine learning algorithms. In this study, least absolute shrinkage and selection operator (Lasso) regression, elastic net (EN), support vector machine (SVM), k-nearest neighbor (KNN), random forest (RF), logistic regression (LR), linear regression, cross-validation (CV), and bootstrap were all used. We used ML to achieve accurate screening of high-dimensional image features, effective integration of clinical data, radiomics and biological features, and reliable prediction models by supervised learning, in order to accurately predict the adverse events of EC. This study also makes full use of public database resources. Machine learning was used to mine the infiltrating immune cells and genes related to distant metastasis of EC, and the metastasis mechanism of EC was preliminarily explored through pathological and cell biology experiments.

This article focuses on the adverse prognostic events of EC. The prediction of distant
metastasis and esophageal fistula, the characteristics of the immune microenvironment of metastatic EC, and the distant metastasis-related gene CCT2 were all studied. The chapters are relatively independent but related to each other.

Part Ⅰ Multidimensional data analysis based on machine learning in distant metastasis of esophageal cancer
Section Ⅰ Big data analysis based on database to predict distant metastasis of esophageal cancer
Objective: 1. To find and evaluate the clinicopathological features related to esophageal cancer metastasis. 2. To establish an easy-to-use risk prediction model for distant metastasis of esophageal cancer.
Methods: Eligible patients diagnosed from 2010 to 2015 were selected from the Surveillance, Epidemiology and End Results (SEER) database. Multivariable logistic regression analysis was applied to establish a prediction nomogram. Discrimination, calibration, clinical usefulness and reproducibility were assessed by the C-index, the receiver operating characteristic (ROC) curve and the area under the curve (AUC), the calibration plot, decision curve analysis (DCA), and bootstrap validation. DCA was also used to compare the novel model with conventional predictive methods. Data from EC patients treated at Shandong Cancer Hospital from February 2020 to October 2020 were consecutively collected as an external validation set to evaluate the generalization of the model.
Results: A total of 9026 patients were included in the analysis. The nomogram incorporated the following predictors: age, sex, race, grade, T stage, N stage, histology, and tumor location. The prediction model showed good discrimination, with an AUC of 0.738 and a C-index of 0.747 (95% CI: 0.734-0.760), which was confirmed to be 0.745 through bootstrap validation. The calibration plot and DCA showed satisfactory calibration and good net benefit, respectively. Compared with conventional prediction methods, the nomogram yielded superior net benefit. A total of 141 cases of EC were collected for external validation. The AUC calculated in the external validation
set was 0.780 (95% CI: 0.706-0.855), and the C-index was 0.743, suggesting that the prediction model generalized well.
Conclusions: 1. A prediction model for distant metastasis was established and verified to help clinicians predict the risk of distant metastasis in patients with EC. 2. The predictors of the nomogram include age, gender, race, T stage, N stage, histology, tumor location and pathological grade.

Section Ⅱ Multidimensional integrated model based on machine learning to predict distant metastasis of esophageal cancer
Objective: 1. To screen radiomics features related to distant metastasis of EC and construct models using different machine learning algorithms. 2. To develop a visual nomogram based on radiomics and clinical features to predict the individual risk of distant metastasis.
Methods: Two hundred and ninety-nine EC patients were enrolled and randomly assigned to a training cohort (n=207) and a validation cohort (n=92). Logistic regression was used to evaluate clinicopathological factors. A model based on clinicopathological features was constructed for comparison. Radiomics features were extracted from contrast-enhanced computed tomography (CT) performed before treatment, and least absolute shrinkage and selection operator (Lasso) regression was used to screen the optimal features. The radiomics signature was calculated as a linear combination of the features and their corresponding coefficients. The multidimensional nomogram incorporating clinical features and radiomics was constructed by logistic regression. All prediction models were further validated in the validation set for discrimination, reclassification, calibration, clinical usefulness, and goodness of fit.
Results: The clinical model was established from lymph node, stage and differentiation degree, with an AUC (95% CI) of 0.731 (0.626-0.836) in the validation set and 0.82 (0.773-0.886) in the training set, and an AIC of 215.9. Sixteen features were screened by the Lasso algorithm from contrast-enhanced CT performed before treatment. The DeLong test showed that the
radiomics models constructed with SVM, KNN, and LR discriminated distant metastasis equally well and outperformed the model constructed with the RF algorithm. In univariate regression analysis, the radiomics signature differed significantly between patients with and without distant metastasis (P<0.001). The multidimensional nomogram was established from independent predictive factors and the radiomics signature. Its AUC (95% CI) was 0.827 (0.742-0.912) in the validation set and 0.857 (0.8062-0.9076) in the training set, its AIC was 198.02, and its specificity was 86.4%. Decision curves showed that the multidimensional prediction model had better net benefit than the clinical model at all threshold probabilities. Compared with the clinical model, the NRI of the multidimensional nomogram improved by 0.114 (95% CI: 0.075-0.345) and the IDI improved by 0.071 (95% CI: 0.030-0.112), P=0.00068.
Conclusion: 1. Radiomics features of enhanced CT before treatment can predict the risk of distant metastasis. 2. The multidimensional prediction nomogram constructed from clinical features and radiomics can predict distant metastasis more accurately.

Part Ⅱ Tumor immune microenvironment of metastatic esophageal cancer and the predictive value of tumor infiltrating immune cells for distant metastasis
Objective: 1. To explore the characteristics of immune cell infiltration in metastatic EC and to analyze the relationship between different infiltrating immune cells and clinical characteristics. 2. To build a multidimensional prediction model based on radiomics, immune cells and clinical features. 3. To evaluate the role of radiomics in the prediction of immune cell infiltration.
Methods: RNA-seq data were downloaded from the TCGA database. The relative numbers of immune cells were calculated by the TIMER algorithm, and the correlation between immune cells and clinical features was then analyzed. A total of 165 clinicopathological samples were collected. Six kinds of immune cells were labeled by immunohistochemistry. The correlation between the various immune cells and clinical
features was analyzed. A multidimensional prediction model based on infiltrating immune cells, radiomics and clinical features was constructed and validated. Taking the infiltration density of CD3+ T cells, CD4+ T cells, CD8+ T cells, NK cells, B cells and macrophages (divided into high and low infiltration according to the median count) as the prediction end point, the Lasso-logistic algorithm screened the most relevant radiomics features to construct radiomics signatures, which were tested in the internal validation set. The optimal signature was selected for external validation in The Cancer Imaging Archive (TCIA) database to evaluate its robustness and its relationship with the immune score.
Results: TCGA data analysis showed that the density of macrophages increased significantly in patients with metastasis (P=0.047). Survival analysis showed that patients with higher infiltration of B cells (P=0.043) and macrophages (P=0.010) had a worse prognosis, but multivariate analysis showed no significant difference. Univariate analysis of the 165 clinical samples showed that macrophages, B cells and CD3+ T cells were associated with lymph node metastasis, while NK cells, B cells and CD3+ T cells were associated with distant metastasis. Multivariate regression analysis showed that a decrease in NK cells was an independent risk factor for distant metastasis (P=0.011) and that an increase in macrophage infiltration was an independent risk factor for lymph node metastasis (P=0.03). The immune cell signature constructed by elastic net accurately predicted the risk of distant metastasis of EC, with an AUC of 0.805 (95% CI: 0.701-0.909). The multidimensional model based on radiomics, immune cells and clinical features effectively distinguished patients with distant metastasis from those without in the training set (AUC=0.948) and in the validation set (AUC=0.989), and also had good accuracy (AUC-PR=0.775), calibration (calibration curve) and clinical decision value (DCA curve).
Conclusion: 1. The infiltration of CD3+ T cells, NK
cells, macrophages and B cells in metastatic EC differs significantly from that in non-metastatic EC. A decrease in NK cell infiltration is an independent risk factor for distant metastasis, and an increase in macrophage infiltration is an independent risk factor for lymph node metastasis. 2. Immune cell infiltration can be used as a biomarker to predict distant metastasis. The performance of the multidimensional model based on clinical data, radiomics and immune cells was significantly improved. 3. CT-based radiomics features can predict the infiltration of CD3+ T cells with good discrimination and robustness. They are a potential biomarker for predicting prognosis and the efficacy of immunotherapy.

Part Ⅲ Identification of a metastasis related gene CCT2 in esophageal cancer and its mechanism
Section Ⅰ Identification of a novel gene CCT2 associated with metastasis of esophageal cancer
Objective: 1. To identify a gene related to distant metastasis of EC. 2. To verify the relationship between CCT2 expression and clinicopathological features. 3. To verify in vitro the effects of the CCT2 gene on the proliferation, invasion and migration of esophageal cancer cells.
Methods: mRNA-seq data and matched clinical information for EC were downloaded from the TCGA database. Differentially expressed genes (DEGs) between samples with and without distant metastasis were analyzed. A protein-protein interaction (PPI) network was used to rank key genes, and the target gene CCT2 was determined according to the scores obtained by 11 topological analysis methods. The Oncomine and TIMER2.0 databases were used for preliminary verification. The expression of CCT2 in 153 clinical samples of esophageal cancer and 17 samples of esophageal mucosal epithelium was detected and compared by immunohistochemistry. The relationship between CCT2 expression and clinicopathological features was also analyzed. In the in vitro experiments, cell lines with high CCT2 expression were screened, and CCT2 was knocked down by small
interfering RNA (siRNA). The knockdown efficiency was verified by Western blot. The effects of the CCT2 gene on proliferation, migration and invasion were tested by the EdU cell proliferation assay, the cell scratch assay, and Transwell migration and invasion assays, respectively.
Results: A total of 55 DEGs were screened from the TCGA database. CCT2 was identified as the target gene according to the protein interaction network and a literature search. Bioinformatics analysis of TCGA data showed that CCT2 expression differed significantly between esophageal cancer tissues and adjacent tissues (P<0.001). The expression of MKI67 and PCNA increased with CCT2 expression (P<0.001). Survival analysis showed that the OS of patients with high CCT2 expression was significantly shorter than that of patients with low expression (P=0.043), and the survival difference was more significant after stage adjustment (P=0.007). In the pathology experiments, positive expression of CCT2 appeared as yellow or brown staining in the cytoplasm. Compared with normal esophageal mucosal epithelium, CCT2 was highly expressed in EC (P<0.001), and high expression was an independent risk factor for distant metastasis (P=0.0363). Expression increased with tumor grade (P=0.014) and tumor length (P=0.013). In vitro experiments showed that CCT2 is generally highly expressed in esophageal squamous cell carcinoma cell lines. After knockdown of CCT2 in KYSE450 and KYSE150 cells, proliferation, invasion and migration decreased significantly.
Conclusion: 1. CCT2 is highly expressed in EC and is an independent risk factor for distant metastasis. Its expression increases with tumor grade and tumor length. It is a potential biomarker for distant metastasis and malignancy. 2. CCT2 expression promotes the proliferation, invasion and migration of esophageal squamous cell carcinoma cells.

Section Ⅱ The impact of CCT2 gene on tumor immune microenvironment and related mechanisms
Objective: 1. To identify the association between the CCT2 gene and the immune microenvironment. 2. To explore the mechanism of
CCT2 action on the tumor immune microenvironment.
Methods: The association between CCT2 expression and the abundance of immune cell infiltration was analyzed with the immune module (Immune association) of the TIMER2.0 database. CCT2 expression and tumor infiltrating immune cells were detected in tissues from 153 EC cases by IHC. Transcriptome sequencing of esophageal cancer cells was applied to analyze the possible molecular mechanism. Multiplex bead-based flow fluorescent immunoassay (MBFFI) was used to detect cytokines, and Western blot was used to detect key proteins in the MAPK/ERK and NFκB pathways to preliminarily verify the mechanism.
Results: CCT2 expression was positively correlated with MDSCs (P<0.001), CD4+ Th2 cells (P<0.001), and the M2 subpopulation of macrophages (P<0.001), and was negatively correlated with NK cells (P=0.004), CD8+ T cells (P=0.003), and dendritic cells (P<0.001). Clinical sample verification showed that CCT2 expression was significantly increased in the group with high M2 macrophage infiltration (P=0.036) and decreased in the group with high NK cell infiltration (P=0.037). Multivariate analysis showed that M2 macrophage infiltration increased significantly in tissues with high CCT2 expression (P=0.012), independent of other infiltrating immune cells. Transcriptome analysis found 255 DEGs, of which 189 were downregulated and 66 were upregulated. The DEGs were enriched in the cytokine-cytokine receptor interaction pathway, the IL-17 pathway and the TNF-α pathway. After CCT2 knockdown, IL-6 and TNF-α in the cell culture supernatant were reduced by 61% and 82%, respectively (P<0.05), and p-NFκB and p-ERK1/2 proteins were significantly downregulated (P<0.05).
Conclusion: CCT2 may mediate tumor-related inflammation through the MAPK/ERK and NFκB pathways and induce the formation of a tumor immunosuppressive microenvironment.

Part Ⅳ Multidimensional data analysis based on machine learning in risk prediction and prognosis of malignant esophageal fistula in esophageal cancer
Section Ⅰ Risk factors for esophageal fistula in esophageal
cancer patients treated with radiotherapy: A systematic review and meta-analysis
Objective: To explore the risk factors for esophageal fistula in esophageal cancer patients treated with radiotherapy.
Methods: The PubMed and Embase databases were searched for clinical studies published from 1990 to 2018. The Newcastle-Ottawa Scale was used to evaluate the quality of the included studies. Meta-analysis was performed using the RevMan 5.3 software provided by the Cochrane Collaboration.
Results: Seventeen articles were eligible for meta-analysis. In these articles, over 35 risk factors were described for esophageal fistula formation, and 17 risk factors were analyzed. Significant differences in the odds of developing esophageal perforation were found for age (OR 2.34, 95% CI 1.08-5.03, P=0.001), ulcerative type (OR 2.72, 95% CI 1.43-5.16, P=0.002), histology (OR 4.16, 95% CI 1.14-15.12, P=0.03), T stage (OR 2.66, 95% CI 1.44-4.91, P=0.002), short-term response (OR 2.21, 95% CI 1.06-4.62, P=0.03), chemotherapy regimen (OR 2.80, 95% CI 1.38-5.68, P=0.005), and stenosis (OR 2.00, 95% CI 1.03-3.89, P=0.04).
Conclusion: Younger age, ulcerative type, squamous cell carcinoma, T4 stage, non-complete response, fluorouracil-based regimen, and stenosis were associated with an increased risk of esophageal fistula during or after radiotherapy.

Section Ⅱ Multidimensional integrated model based on machine learning to predict the risk of malignant esophageal fistula in esophageal cancer
Objective: To construct a multidimensional integrated model to predict the risk of malignant esophageal fistula (MEF) based on clinical features and radiomics.
Methods: A total of 122 enrolled patients were randomly divided into a training set (n=86) and a validation set (n=36) at a ratio of 7:3. Logistic regression analysis was used to screen clinical features and construct the clinical model for comparison. The Lasso and logistic algorithms were used to screen radiomics features from contrast-enhanced CT performed before treatment. The multidimensional prediction model was
constructed from independent clinical predictors and the radiomics signature. All models were further validated in the validation set for discrimination, reclassification, calibration, clinical usefulness, and goodness of fit.
Results: Univariate analysis identified stenosis (P=0.01), gender (P=0.23), and T stage (P=0.11) as risk factors. All were included in the multivariate analysis, which showed that stenosis was an independent predictor (P=0.023). The AUC of the clinical model was 0.691 (95% CI: 0.582-0.800) in the training set and 0.640 (95% CI: 0.453-0.827) in the validation set, and the AIC was 115.8. The radiomics signature was constructed from features selected from the CT scans performed before treatment. The AUC (95% CI) of the multidimensional model was 0.867 (0.7461-0.987) in the validation set and 0.782 (0.684-0.8796) in the training set, the AIC was 101.1, and the specificity was 95.2%. Decision curves showed that the multidimensional prediction model had better net benefit than the other two at all threshold probabilities. Compared with the clinical model, the NRI of the multidimensional prediction nomogram improved by 0.236 (95% CI: 0.153-0.614) and the IDI improved by 0.125 (95% CI: 0.040-0.210), P=0.004.
Conclusion: The arterial-phase CT radiomics features of the primary tumor before treatment can be used as markers to predict the occurrence of MEF. The model constructed from clinical features and radiomics has good predictive performance.

Section Ⅲ Multidimensional integrated model based on machine learning to predict the prognosis of malignant esophageal fistula in esophageal cancer
Objective: To explore the value of radiomics in predicting the prognosis of esophageal fistula, and to construct a multidimensional prognostic model based on clinical features and radiomics.
Methods: Seventy-six patients with MEF were enrolled. Chest enhanced CT scans taken before treatment and within 1 month after the fistula were obtained. The end points were overall survival time (OS1) and survival time after fistula (OS2). Cox univariate regression was used to screen prognostic
factors. Clinical models incorporating independent prognostic factors were developed. The Lasso algorithm was used to filter variables and avoid overfitting. The radiomics signature was a linear combination of the features and their corresponding coefficients. The multidimensional esophageal fistula prognostic nomogram was constructed from radiomics and clinical features. A stepwise regression algorithm was used to calculate the risk score, which stratified patients into three risk groups. Survival was compared between groups with the Kaplan-Meier method and the log-rank test.
Results: Multivariate regression analysis showed that age, prealbumin, KPS and the interval between diagnosis and fistula were independent prognostic factors for OS1. These independent prognostic factors were used as parameters to establish the clinical prognostic nomogram. The C-index was 0.719 (95% CI: 0.645-0.793), and the corrected C-index calculated by the bootstrap method was 0.688. Age, prealbumin, plasma albumin, KPS and neutrophil percentage were independent prognostic factors for OS2. The C-index of the prediction model based on these factors was 0.722 (95% CI: 0.653-0.791), and the corrected C-index was 0.686. The radscore was constructed from radiomics features of the pre-fistula CT. Multidimensional prediction models for OS1 and OS2 were established from the independent clinical prognostic factors and the radscores. The C-index was 0.831 (95% CI: 0.757-0.905) for OS1 and 0.77 (95% CI: 0.686-0.854) for OS2. The corrected C-indexes were 0.803 and 0.717, respectively.
Conclusion: The arterial-phase CT radiomics features of the primary tumor before treatment can be used as markers to predict the overall survival and post-fistula survival of patients with MEF. The models constructed from clinical features and radiomics have better predictive performance.
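
The abstract describes the radiomics signature as a linear combination of selected features and their coefficients, and reports discrimination as AUC. The sketch below illustrates those two ideas only. It is not the dissertation's code: the feature names, coefficients and patient values are hypothetical toy numbers, and the AUC is computed by simple pairwise concordance rather than by any specific statistical package.

```python
from itertools import product

# Hypothetical Lasso-style coefficients; names and values are invented for illustration.
COEFFICIENTS = {"texture_a": 0.8, "shape_b": -0.5}


def radiomics_signature(features, coefficients, intercept=0.0):
    """Linear combination of feature values and their coefficients."""
    return intercept + sum(coefficients[name] * features[name] for name in coefficients)


def auc(scores, labels):
    """AUC as the fraction of (event, non-event) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one event and one non-event")
    concordant = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(positives, negatives)
    )
    return concordant / (len(positives) * len(negatives))


# Toy cohort: (feature values, outcome) where 1 = distant metastasis, 0 = none.
PATIENTS = [
    ({"texture_a": 1.0, "shape_b": 0.2}, 1),
    ({"texture_a": 0.4, "shape_b": 1.0}, 0),
    ({"texture_a": 0.9, "shape_b": 0.8}, 1),
    ({"texture_a": 0.6, "shape_b": 0.1}, 0),
]

scores = [radiomics_signature(f, COEFFICIENTS) for f, _ in PATIENTS]
labels = [y for _, y in PATIENTS]
print(round(auc(scores, labels), 3))  # 0.75 for this toy cohort
```

In this toy cohort three of the four event/non-event pairs are ranked correctly, which gives the 0.75. For a binary outcome this pairwise concordance is the same quantity as the C-index.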
Keywords/Search Tags:Esophageal cancer, Metastasis, Population-based cancer registry, Radiomics, Nomogram, Machine learning, Tumor infiltrating immune cells, CCT2, Bioinformatics analysis, Immune microenvironment, Risk factors, Esophageal fistula, Radiotherapy, Meta-analysis