In recent years, the incidence and mortality rates of cancer have been rising steadily. Accurate prognosis prediction for cancer patients is therefore a key problem in current cancer research. Survival analysis is one of the important components of cancer prognosis prediction, and its accuracy is of great significance for promoting the psychological recovery of cancer patients and for guiding clinicians in formulating personalized treatment plans. With the development of digital imaging and gene sequencing technologies, a large amount of multimodal data, such as pathological images and genomics data, has become available for survival analysis studies. Previous studies have shown that both pathological images and genomic data contain rich survival-related information and that the two data types are complementary. Therefore, how to effectively integrate data from different modalities to predict the survival risk of cancer patients more accurately is a key issue in cancer survival analysis. However, most existing methods fuse the data by direct concatenation, which ignores both the correlation between modalities and the latent information within each modality, resulting in poor survival analysis performance. To address these issues, this thesis combines a two-stream network with a co-attention mechanism to construct a multimodal fusion network for cancer survival analysis within the framework of multi-instance learning. The main contributions of this thesis are as follows:

(1) To address the difficulty of annotating pathological images and extracting effective features from them, this thesis proposes a feature extraction method based on multi-instance learning and an attention mechanism (FEMA). FEMA has stronger feature representation ability: compared with a traditional fully convolutional network, it improves the concordance index on the TCGA-BRCA and TCGA-LUAD datasets by 2.7% and 2.3%, respectively.

(2) To address the insufficiently explored correlation between pathological image data and gene expression data, this thesis proposes a co-attentional Transformer-based multimodal fusion network (CTMFN). By combining pathological image data and gene expression data, CTMFN significantly improves the performance of cancer survival analysis: its concordance index on the TCGA-BRCA and TCGA-LUAD datasets exceeds the second-best method by 0.8% and 1.3%, respectively. Furthermore, an analysis of how data from different modalities affects survival analysis performance further demonstrates that fusing multimodal data can effectively improve survival analysis.

(3) To address the insufficient expression of survival-related information in pathological image data and gene expression data, this thesis proposes a two-stream co-attentional Transformer-based multimodal fusion network (TCTMFN). TCTMFN integrates the data effectively by exploring the interactions both between different modalities and within each modality. In addition, to reduce interference from redundant information, this thesis proposes a multi-head attention pooling (MHAP) method that aggregates features effectively. TCTMFN achieves the best performance on the TCGA-BRCA and TCGA-LUAD datasets, with a concordance index ahead of the second-best method by 2.3% and 2.4%, respectively.
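To make the attention-based aggregation idea behind contributions (1) and (3) concrete, the following is a minimal illustrative sketch of multi-head attention pooling over patch-level instance features. The feature dimension, number of heads, and use of PyTorch are assumptions chosen for illustration, not the exact configuration of FEMA or MHAP in this thesis; each head learns its own attention weights over the instances, and the per-head weighted sums are concatenated into one bag-level representation.

```python
# Hypothetical sketch of multi-head attention pooling for multi-instance learning:
# aggregate patch features of one whole-slide image into a single bag vector.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadAttentionPooling(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        # One learnable attention score per head for each instance.
        self.score = nn.Linear(dim, num_heads)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_instances, dim) -- features of all patches in one bag (slide).
        n, dim = x.shape
        attn = F.softmax(self.score(x), dim=0)                # (n, num_heads)
        # Split features into per-head chunks: (n, num_heads, head_dim).
        x_heads = x.view(n, self.num_heads, self.head_dim)
        # Weighted sum of instances within each head, then concatenate heads.
        pooled = (attn.unsqueeze(-1) * x_heads).sum(dim=0)    # (num_heads, head_dim)
        return pooled.reshape(-1)                             # (dim,)


if __name__ == "__main__":
    bag = torch.randn(500, 256)             # e.g. 500 patch features from one WSI
    pooled = MultiHeadAttentionPooling()(bag)
    print(pooled.shape)                     # torch.Size([256])
```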
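Similarly, the co-attention fusion underlying contributions (2) and (3) can be sketched as two directions of cross-attention in which pathology tokens query genomic tokens and vice versa. The layer sizes and the use of torch.nn.MultiheadAttention below are illustrative assumptions rather than the exact CTMFN/TCTMFN architecture.

```python
# Hypothetical sketch of co-attention between pathology patch features and
# gene-expression features: each modality attends to the other.
import torch
import torch.nn as nn


class CoAttentionFusion(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.path_query_gene = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gene_query_path = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, path_feats: torch.Tensor, gene_feats: torch.Tensor):
        # path_feats: (batch, num_patches, dim); gene_feats: (batch, num_genes, dim)
        # Pathology tokens query the genomic tokens, and vice versa.
        path_attended, _ = self.path_query_gene(path_feats, gene_feats, gene_feats)
        gene_attended, _ = self.gene_query_path(gene_feats, path_feats, path_feats)
        return path_attended, gene_attended


if __name__ == "__main__":
    path = torch.randn(1, 500, 256)   # patch embeddings from one slide
    gene = torch.randn(1, 100, 256)   # embedded gene-expression features
    p, g = CoAttentionFusion()(path, gene)
    print(p.shape, g.shape)           # (1, 500, 256) (1, 100, 256)
```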