
Multi-View Simultaneous Symmetric Non-Negative Matrix Tri-Factorization: A Method of Integrating Multi-omics Data for Subtype Identification

Posted on: 2021-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: Q X Li
GTID: 2404330623981349
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
Cancer is a complex disease with diverse causes and strong heterogeneity, and many patients cannot obtain effective treatment. Identifying molecular cancer subtypes from multi-omics information is an important step in translating basic biological research into clinical application. Although multi-omics data contain abundant molecular-level information for subtype identification, efficient, effective, and resolvable methods for integrating all data types are still lacking, because high-throughput data are high-dimensional, noisy, and sparse, and because the different omics data types are heterogeneous. To address this problem, this study proposes a method called MV-SSNMTF (Multi-View Simultaneous Symmetric Non-Negative Matrix Tri-Factorization). First, "multi-view" means that several similarity matrices are generated for each omics dataset using three similarity measures: improved cosine similarity, Euclidean distance similarity, and Manhattan distance similarity. Second, a modified SSNMTF method decomposes the similarity matrices of each omics dataset into sub-matrices. Finally, the common sub-matrices obtained from the different omics datasets are fused into a connected similarity graph, from which subtypes are identified with a graph-cut algorithm. On simulated similarity matrix data, the modified SSNMTF method achieved an accuracy of 100%, matching the performance of NG-WSSNMTF (Natural Gradient Weighted Simultaneous Symmetric Non-Negative Matrix Tri-Factorization). On four sets of simulated omics data, MV-SSNMTF performed similarly to the SNF (Similarity Network Fusion) and iCluster (Integrative Clustering) methods in terms of accuracy, and it was more stable than SNF. We also applied the method to BRCA (breast cancer) and LUAD (lung adenocarcinoma) multi-omics data from TCGA (The Cancer Genome Atlas) and compared it with other popular methods such as SNF, iCluster, and mixOmics. The results show that MV-SSNMTF often achieves the best result in survival analysis, regardless of the number of subtypes selected. MV-SSNMTF also runs much faster than the other methods except SNF, taking only slightly more time than SNF. Overall, MV-SSNMTF is a simple, effective, and scalable algorithm for subtype identification based on multi-omics data, and it can also be extended to other application fields. The algorithm has been packaged as a Python package, which can be downloaded and installed from https://github.com/QixiongLee/MVSSNMTF.
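The "multi-view" step described in the abstract can be illustrated with a minimal sketch. This is not the code from the MVSSNMTF package: the function name `similarity_views` is hypothetical, plain cosine similarity stands in for the thesis's "improved" cosine similarity (whose definition the abstract does not give), and the 1 / (1 + d) conversion from distance to similarity is an assumption made only for illustration.

```python
import numpy as np


def similarity_views(X):
    """Build three sample-by-sample similarity matrices for one omics dataset.

    X is a (samples x features) array. Hypothetical helper, not the
    MVSSNMTF package API.
    """
    X = np.asarray(X, dtype=float)

    # View 1: plain cosine similarity (stand-in for the "improved" variant).
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # avoid division by zero for empty rows
    unit = X / norms
    # Assumption: clip negatives so the matrix stays non-negative,
    # as non-negative factorization requires.
    cosine = np.clip(unit @ unit.T, 0.0, 1.0)

    # Pairwise feature differences, shape (n, n, features).
    diff = X[:, None, :] - X[None, :, :]
    euclidean_dist = np.sqrt((diff ** 2).sum(axis=2))
    manhattan_dist = np.abs(diff).sum(axis=2)

    # Views 2 and 3: assumed distance-to-similarity mapping 1 / (1 + d).
    return {
        "cosine": cosine,
        "euclidean": 1.0 / (1.0 + euclidean_dist),
        "manhattan": 1.0 / (1.0 + manhattan_dist),
    }
```

Each omics dataset would yield one such set of symmetric, non-negative matrices, which a symmetric tri-factorization step could then decompose. The pairwise-difference array uses memory proportional to samples squared times features, so this sketch suits only small examples.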
Keywords/Search Tags: Multi-omics, cancer subtype identification, non-negative matrix factorization, similarity measurement, multi-view community detection