| Background and Objective: Cancer (malignant tumors) is one of the most serious health problems in the world today, with millions of people dying from various types of cancer every year. Hepatocellular carcinoma (HCC) is among the most common types, and its high incidence and mortality place a heavy burden on health care systems and society worldwide. In the clinical diagnosis and treatment of HCC, accurate and efficient prognosis prediction is crucial for implementing precision medicine and supporting clinical decision-making. Traditional survival analysis methods mostly rely on a small amount of clinical data, which cannot capture the full complexity of a patient's disease and therefore limits prediction accuracy. With the continuing development of high-throughput sequencing technology, more and more omics data are being used to predict the prognosis of cancer patients, and these data provide a solid foundation for accurate prediction. Meanwhile, as high-throughput omics data continue to grow, artificial intelligence methods such as machine learning and deep learning are increasingly being applied. However, there are currently few reports on combining deep learning with multi-omics data to predict the prognosis of HCC. As a result, truly personalized treatment of HCC remains difficult, because each patient's individual situation cannot be evaluated and predicted accurately. Exploring the potential of deep learning and omics data for HCC prognosis prediction would therefore support precision medicine and personalized treatment. Based on this, this study constructs a new deep learning prediction method that
can integrate different omics data to predict the prognosis of HCC efficiently. By integrating genomic, transcriptomic, and epigenomic data, the model can provide more accurate prognosis prediction for cancer patients while classifying cancer subtypes, and thus support more personalized treatment. In summary, the integration of massive omics data and the continued development of deep learning models will have a profound impact on prognosis prediction and precise treatment of cancer. Contents and Methods: 1. To further explore prognosis prediction for HCC, this study included genomic, transcriptomic, epigenomic, and other multi-omics data, and integrated these data to improve prediction accuracy for HCC. 2. To improve the accuracy and reliability of prognosis prediction for cancer patients, this study constructs a deep learning strategy based on "supervision" and "stacking". First, a Cox model is introduced for joint supervision and combined with multiple omics data to evaluate cancer prognosis comprehensively. Second, multiple supervision modules are stacked to further improve predictive performance. Finally, the joint loss function of the model is optimized to enhance its robustness and stability. With these methods, this study constructed a stacked supervised autoencoder (SSAE) model and used transcriptome data of lung adenocarcinoma (LUAD) patients from The Cancer Genome Atlas (TCGA) database to test the prediction performance of the SSAE model. 3. The model was applied to the single-omics liver cancer dataset LIRI-JP (RIKEN, Japan), and the prediction performance of various methods was compared at the single-omics level. 4. The prediction performance of the SSAE model was tested using mRNA, miRNA, and methylation multi-omics data from TCGA-HCC patients, and pre-fusion and post-fusion of the multi-omics data were studied. Then, an
early-integration stacked supervised autoencoder (EI-SSAE) model based on pre-fusion and a late-integration stacked supervised autoencoder (LI-SSAE) model based on post-fusion were constructed, and prediction performance was compared across the individual omics datasets and the various methods at the multi-omics level. 5. A deep learning model was constructed from the integrated multi-omics data to predict the prognosis of patients with HCC. Based on the prognostic index output by the model, patients were divided into survival subtypes, and a confidence analysis and a survival analysis were conducted for each subtype. Results: 1. In the TCGA-LUAD dataset (435 samples and 25,481 genes), SSAE achieved a higher concordance index (CI = 0.58) and a lower log-rank test P value (P = 0.05) on the test set than random survival forests (RSF; CI = 0.54, P = 0.15) and DeepSurv (CI = 0.55, P = 0.10). A total of 40 differentially expressed genes were screened for biochemical analysis; representative upregulated genes include IGFBP1, ANXA13, MUC2, CIDEC, NTSR1, and DSG3. In the survival analysis, survival outcomes differed significantly between the two subtypes (hazard ratio (HR): 2.841, 95% confidence interval: 1.907-4.232, log-rank P < 0.001). 2. In the LIRI-JP liver cancer dataset (237 samples and 13,395 genes), SSAE achieved a higher concordance index (CI = 0.72) and a lower log-rank P value (P = 5.10E-03) on the test set than RSF (CI = 0.68, P = 1.60E-02) and DeepSurv (CI = 0.70, P = 7.30E-03). A total of 47 differentially expressed genes were screened for biochemical analysis; representative upregulated genes include G6PD, HSPA6, NDRG1, CDC20, and BIRC5. In the survival analysis, survival outcomes differed significantly between the two subtypes (HR: 14.411, 95% confidence interval: 6.12-33.92, log-rank P < 0.001). 3. In the TCGA-HCC dataset, under the same method, the evaluation
indicators (concordance index, CI, and log-rank P value) of the multi-omics data are significantly better than those of each single-omics dataset. On the same data, the post-fusion LI-SSAE method (CI = 0.89, P < 2e-16) significantly outperforms the pre-fusion EI-SSAE method (CI = 0.67, P = 5e-04), and also outperforms the baseline machine learning methods (RSF, DeepSurv). In the differential analysis of mRNA, a total of 223 differentially expressed genes were identified, 64 upregulated and 159 downregulated. In the survival analysis, survival outcomes differed significantly between the two subtypes (HR: 39.998, 95% confidence interval: 15.380-104.020, log-rank P < 0.001). Overall, the experimental results indicate that the SSAE model has high predictive accuracy and stability and can provide more accurate and reliable prognosis predictions for cancer patients. These results have application value and are expected to support personalized treatment and precision medicine for cancer patients. Conclusion and Significance: This study offers both methodological innovation and practical value. 1. It proposes the concept of "stacked supervised deep learning", which stacks multiple supervised modules to further improve predictive performance. 2. The SSAE model achieved excellent predictive performance both in performance testing on single-omics lung adenocarcinoma data and in the practicality evaluation on single-omics hepatocellular carcinoma data. 3. The LI-SSAE model built on multi-omics hepatocellular carcinoma data effectively improves prognosis prediction performance and provides strong support for disease prognosis prediction. 4. From the perspective of deep learning, a big-data-based model for predicting cancer patient prognosis and survival is proposed, providing more accurate prognosis prediction and personalized treatment plans for cancer patients, which has
broad application prospects and important clinical significance. The methods and ideas of this study also provide a reference for big data analysis in other fields. |
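
The abstract refers to a Cox model used for joint supervision, a joint loss function, and the concordance index as an evaluation indicator, but gives no formulas. The following is a minimal Python sketch under stated assumptions, not the study's implementation: it assumes the joint loss is a weighted sum of the autoencoder reconstruction error and the negative Cox partial log-likelihood, and it computes a simple Harrell-style concordance index. All function names and the weight `alpha` are hypothetical.

```python
import math


def cox_neg_partial_log_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk  : predicted log-risk scores (higher means worse prognosis)
    time  : follow-up times
    event : 1 if the event (death) was observed, 0 if censored
    """
    total, n_events = 0.0, 0
    for i in range(len(risk)):
        if not event[i]:
            continue  # censored samples contribute only through risk sets
        # Risk set: all samples still under observation at time[i].
        denominator = sum(
            math.exp(risk[j]) for j in range(len(risk)) if time[j] >= time[i]
        )
        total += risk[i] - math.log(denominator)
        n_events += 1
    return -total / n_events if n_events else 0.0


def reconstruction_mse(x, x_hat):
    """Mean squared error between inputs and autoencoder reconstructions."""
    squared = [(a - b) ** 2 for row, row_hat in zip(x, x_hat)
               for a, b in zip(row, row_hat)]
    return sum(squared) / len(squared)


def joint_loss(x, x_hat, risk, time, event, alpha=0.5):
    """ASSUMED form of the joint loss: weighted reconstruction + Cox terms.

    alpha is a hypothetical weighting hyperparameter, not taken from the study.
    """
    return (alpha * reconstruction_mse(x, x_hat)
            + (1.0 - alpha) * cox_neg_partial_log_likelihood(risk, time, event))


def concordance_index(risk, time, event):
    """Fraction of comparable pairs ordered correctly by the risk score.

    A pair (i, j) is comparable when sample i had an observed event and a
    shorter follow-up time than sample j. Tied risk scores count as 0.5.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(risk)):
        if not event[i]:
            continue
        for j in range(len(risk)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Toy data: three patients, the last one censored.
toy_time = [1.0, 2.0, 3.0]
toy_event = [1, 1, 0]
toy_risk = [2.0, 1.0, 0.0]  # highest risk for the earliest death
toy_c_index = concordance_index(toy_risk, toy_time, toy_event)
toy_cox_loss = cox_neg_partial_log_likelihood(toy_risk, toy_time, toy_event)
```

In a stacked model of the kind described, a loss of this form would presumably be evaluated for each supervised module and minimized by gradient descent in a deep learning framework; the pure-Python version here only makes the quantities concrete.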