Prognosis is an estimate of the likely course or outcome of a disease. Databases such as TCGA give researchers access to large, authoritative datasets and have advanced research on predicting cancer prognosis from genomics data. At present, research on cancer prognosis prediction focuses mainly on diseases with large sample sizes, such as BRCA, NSCLC and GBM. Prognostic prediction models analyze genetic characteristics mainly from a biological perspective, and few studies analyze them from a computational perspective to discover potential biomarkers. Moreover, genomics data are often analyzed with one-dimensional convolution, which greatly limits feature representation. To address these problems, this paper studies prognosis prediction for patients with different cancer types based on gene expression data in TCGA, in order to reveal the mechanisms of cancers, discover potential biomarkers, and improve the performance of prognosis prediction models. The main contributions of this paper are as follows. (1) A novel deep learning approach that combines a convolutional neural network with the stationary wavelet transform (SWT-CNN) is proposed for stratifying cancer patients, using RNA-seq gene expression data without gene filtering as input. The results show that the SWT-CNN model generally predicts better than an SVM model with manually selected features, but the SWT-CNN model is somewhat data-dependent. The prognosis predictions of the SWT-CNN model serve as the reference for the follow-up studies. (2) A scoring approach for evaluating gene importance and a prognostic prediction model, Risk-score, that combines it with Cox regression are proposed to improve the interpretability of the model. The features computed after the pooling layer of the SWT-CNN model were extracted, and the relationship between them and the input features was approximately regarded as a linear
mapping. The feature importance scores were calculated by the least squares method, and a prognostic prediction model, Risk-score, was proposed that calculates a risk value for each sample in combination with Cox regression analysis. The results show that, compared with the SWT-CNN model, the Risk-score model increased the AUC by 3%-13%. (3) A prognostic prediction model based on gene expression images with XGBoost feature selection (Deep GIX) is proposed. After XGBoost feature screening, one-dimensional RNA-seq gene expression data were transformed into two-dimensional gene expression images and used as input to a convolutional neural network. The Deep GIX model achieved its best prediction result (AUC = 0.91) on the LGG dataset. For the LUSC and OV datasets, on which the SWT-CNN model performed worst, the AUC improved to 0.64 and 0.71, respectively. For almost all datasets, the AUC was more than 5% higher than that of the SWT-CNN model. In addition, the AUC values for all datasets except LGG ranged from 0.64 to 0.72, which addresses the problem that prediction results varied greatly across cancer datasets. Based on this model, a multimodal model integrating miRNA-seq data was established. The results show that miRNA-seq data were not suitable for predicting 3-year overall survival for all cancer types.
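
The least-squares scoring idea in contribution (2) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes the pooled features F are modeled as a linear map of the input expression matrix X, and that a gene's score is the sum of the absolute values of its fitted weights. The function name `gene_importance_scores`, the centering step, the aggregation rule, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def gene_importance_scores(X, F):
    """Score input genes via a least-squares linear map from X to F.

    X: (n_samples, n_genes) expression matrix (the model input).
    F: (n_samples, n_features) features taken after a pooling layer.
    Returns one non-negative score per gene.
    """
    # Centre both matrices so that no intercept term is needed.
    Xc = X - X.mean(axis=0)
    Fc = F - F.mean(axis=0)
    # Solve min_W ||Xc W - Fc||. When there are more genes than
    # samples, lstsq returns the minimum-norm solution.
    W, *_ = np.linalg.lstsq(Xc, Fc, rcond=None)
    # Assumption: aggregate each gene's weights by absolute sum.
    return np.abs(W).sum(axis=1)


# Synthetic check: only gene 2 drives the pooled features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
W_true = np.zeros((5, 3))
W_true[2] = [2.0, -1.0, 0.5]
F = X @ W_true

scores = gene_importance_scores(X, F)
print(scores.argmax())  # index of the highest-scoring gene
```

In this toy setting the features depend on gene 2 alone, so that gene receives the largest score. With real RNA-seq data the number of genes far exceeds the number of samples, so the fitted weights are not unique and the resulting ranking should be read with caution.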