| With the development of high-throughput methods and falling costs, a large amount of multi-omics data has been measured. For example, The Cancer Genome Atlas (TCGA) has collected genomic, epigenomic, transcriptomic, and proteomic information on more than 30 cancers from tens of thousands of patients, and the different omics provide complementary and unique characteristics of cancer samples. Compared with single-omics analysis, multi-omics integration has significant advantages because it can provide a more comprehensive view of biological processes, reveal the causes and functional mechanisms of complex cancers, and promote new discoveries in precision medicine. Therefore, methods are needed that can analyze multi-omics data comprehensively and reliably integrate information from different sources to achieve cancer subtyping. In recent years, many methods for integrating multi-omics data have been proposed. Some methods integrate the data in a flawed way. For example, LRAcluster is based on an integrated probabilistic model of low-rank approximation, which can quickly find the shared low-dimensional principal subspace of multiple data types. However, the algorithm directly concatenates the omics matrices, so an omics matrix with more features has a greater impact on the result, which may be inappropriate. Some methods do not consider the heterogeneity of omics data. For example, the SNF algorithm uses a Gaussian kernel function with fixed parameters to build a sample similarity network for each omics data type and does not take into account that different omics may follow different distributions. In addition, the KNN algorithm that SNF uses to construct similarity networks tends to introduce noisy edges. Some methods require many parameters to be tuned, such as iClusterBayes, which makes them more time-consuming. Some methods may lose omics information. For example, PINSPlus obtains clustering results for each omics data type separately and then integrates them, which
may lose weak signals present in the individual omics data types. Therefore, this thesis proposes Network Integration based on Multi-Kernel (NI-MK) for cancer subtyping. The method takes the heterogeneity of multi-omics data into account, and the kernel weight coefficients are learned adaptively from the omics data without manual setting. Moreover, the consistent KNN algorithm used in this method exploits the consistent information of global nodes to make the similarity or dissimilarity between sample pairs more accurate. The method consists of three main steps: (1) using a multi-kernel model to construct a similarity matrix for each omics data type; (2) using the consistent KNN algorithm to construct a local similarity matrix; (3) using a network fusion algorithm to integrate the previously obtained similarity matrices. To verify the effectiveness of NI-MK, this thesis first compares it with SNF, PINSPlus, LRAcluster, CIMLR, and iClusterBayes on the multi-omics data of seven cancers. The experiments show that NI-MK can distinguish cancer subtypes with large survival differences on all seven cancers. On average, the subtypes identified by NI-MK show the most significant differences in patient survival, 53.9% higher than those of the second-best method, CIMLR. The silhouette coefficient of NI-MK is second only to that of CIMLR. This indicates that NI-MK identifies the most effective cancer subtypes while also clustering well. Next, NI-MK is applied to different combinations of omics data types for the seven cancers. The experimental results show that the cancer subtypes obtained from multi-omics data differ more in survival than those obtained from individual data types. Moreover, the clinical significance of the cancer subtypes identified from multi-omics data is 120.8% higher than that obtained from DNA methylation, the best-performing single omics type. That is, NI-MK can effectively integrate multi-omics data to obtain more clinically
significant cancer subtypes. In most cases, the more omics data types are integrated, the better the result. Finally, each method is used to cluster the pan-cancer multi-omics data set. The results show that NI-MK achieves the highest normalized mutual information (NMI), 10.4% higher than that of LRAcluster, which ranks second. The adjusted Rand index (ARI) of NI-MK is also the highest, 15.7% higher than that of SNF, which ranks second. This shows that NI-MK achieves high accuracy on a data set with gold-standard labels. |
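The two external measures used in the pan-cancer comparison, NMI and ARI, can be sketched as follows. This is a minimal illustration, not the evaluation code of the thesis: the function names `adjusted_rand_index` and `normalized_mutual_info` are hypothetical, and normalizing NMI by the arithmetic mean of the two entropies is an assumption, since the abstract does not state which normalization was used.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(truth, pred):
    """ARI between gold-standard labels and predicted cluster labels."""
    n = len(truth)
    if n < 2:
        raise ValueError("need at least two samples")
    # Contingency counts: samples sharing both a gold label and a cluster.
    joint = Counter(zip(truth, pred))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_truth = sum(comb(c, 2) for c in Counter(truth).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_truth * sum_pred / comb(n, 2)
    max_index = (sum_truth + sum_pred) / 2
    if max_index == expected:
        # Degenerate case (e.g. both partitions put everything in one cluster).
        return 1.0
    return (sum_joint - expected) / (max_index - expected)


def normalized_mutual_info(truth, pred):
    """NMI, normalized by the arithmetic mean of the two entropies (assumed)."""
    n = len(truth)
    joint = Counter(zip(truth, pred))
    count_truth = Counter(truth)
    count_pred = Counter(pred)
    mutual_info = sum(
        c / n * log(c * n / (count_truth[t] * count_pred[p]))
        for (t, p), c in joint.items()
    )
    h_truth = -sum(c / n * log(c / n) for c in count_truth.values())
    h_pred = -sum(c / n * log(c / n) for c in count_pred.values())
    if h_truth == 0 or h_pred == 0:
        # At least one partition is a single cluster.
        return 1.0 if h_truth == h_pred else 0.0
    return mutual_info / ((h_truth + h_pred) / 2)


# Toy labels (made up for illustration): the clusters match the gold labels
# up to a renaming of cluster ids.
gold = [0, 0, 1, 1]
clusters = [1, 1, 0, 0]
print(adjusted_rand_index(gold, clusters), normalized_mutual_info(gold, clusters))
```

Both measures ignore the names of the cluster ids, so a partition that matches the gold-standard labels up to relabeling scores 1 on each. ARI additionally corrects for chance agreement and can be negative for partitions that agree less than expected at random.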