In the era of precision medicine, cancer subtyping at the multi-omics level has become an active research topic. By fusing information from different omics layers, subtyping studies based on multi-omics data can identify cancer subtypes more accurately at the molecular level. In this paper, I apply machine learning to multi-omics data from multiple cancer datasets and propose several algorithms to identify cancer subtypes and uncover their biomarkers, with the aim of advancing precision and personalized medicine. The work covers the following three areas.

1. CSNF clustering algorithm and tumor subtyping study. Building on the CCA and SNF algorithms, I propose the CSNF clustering algorithm and use it for tumor subtyping. First, to reduce the influence of inter-group correlation on clustering, the correlation coefficients between groups are obtained by CCA; these coefficients are then incorporated into SNF to weaken the inter-group correlation and thus improve clustering. On four cancer datasets, the resulting clustering was significantly better than that of the SNF, FCM and CC clustering algorithms. Differential gene analysis revealed that, in kidney clear cell carcinoma (KIRC), five of the top 10 most significantly different miRNAs (hsa-mir-22, hsa-mir-30a, hsa-mir-30e, hsa-mir-143, hsa-mir-148a) all interacted with OIP5-AS1, one of the top 10 lncRNAs, and OIP5-AS1 has been shown to be an important molecule in tumor development. Pathway analysis showed that the miRNAs interacting with OIP5-AS1 all mediate translational repression. Survival analysis further revealed that OIP5-AS1 does not directly affect cancer prognosis, suggesting that OIP5-AS1 may act together with its interacting miRNAs to inhibit cancer proliferation. The experimental results demonstrate the effectiveness of the proposed CSNF algorithm and identify important molecules associated with tumor subtypes.

2. RSC-MCR difference algorithm and tumor subtype study. First, the pairwise correlations of all features were calculated based on RSC and decomposed to obtain the pairwise correlations between features of different omics, and the connections between different omics were established from these cross-omics correlations. To remove redundant correlations, I propose a difference algorithm that calculates the degree of difference between the original feature matrix and the matrix containing the redundant correlation information among different omics. The method was compared with other correlation-removal methods and with no correlation removal: clustering was performed on five cancer datasets using three clustering methods and assessed with three evaluation criteria. The experimental results demonstrate that the proposed RSC-MCR difference algorithm correctly removes redundant correlation and significantly improves clustering performance, and that the strategy is generally applicable.

3. G-P algorithm and tumor subtype identification based on greedy and pruning ideas. First, transcriptome features associated with m6A regulators were identified using Pearson correlation, and survival-related transcriptome features were then obtained by univariate Cox regression analysis. Next, a G-P algorithm based on greedy and pruning ideas was developed to discover key features. The G-P algorithm searched for the optimal features with a greedy algorithm and used a pruning algorithm to expand the search range, finding a set of key features, CASC11, KRT14 and PDZD4, that separated PDAC patients well into two prognosis-related groups; this result was validated on the ICGC dataset. Analysis of the key features revealed that they were differentially expressed across subtypes and closely associated with survival. EIF3B, the regulator most strongly associated with the key features, has also been reported to be an important marker that promotes PDAC. This study reveals the critical role of the m6A-related transcriptome in PDAC typing and explores the prognostic value of the identified markers.

This paper focuses on the problem of cancer subtype identification based on multi-omics data. By proposing a series of improved algorithms for subtype identification and discovering key molecules, it provides theoretical support for precision cancer treatment and ideas for the development of personalized medicine.
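As an illustration of the first step in the third part of the work (finding transcriptome features associated with m6A regulators by Pearson correlation), the following is a minimal sketch rather than the implementation used in this paper. The function name `correlated_features`, the cutoff of 0.4, and the synthetic data are all assumptions made for the example; the abstract does not state a correlation threshold.

```python
import numpy as np

def correlated_features(expr, regulators, threshold=0.4):
    """Select features whose |Pearson r| with at least one regulator exceeds a cutoff.

    expr:       (n_samples, n_features) expression matrix
    regulators: (n_samples, n_regulators) regulator expression matrix
    threshold:  hypothetical cutoff (not taken from the paper)
    Assumes no column is constant, otherwise its standard deviation is zero.
    """
    expr = np.asarray(expr, dtype=float)
    regulators = np.asarray(regulators, dtype=float)
    # Standardise every column to zero mean and unit (population) variance.
    ez = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    rz = (regulators - regulators.mean(axis=0)) / regulators.std(axis=0)
    # Pearson r equals the mean product of standardised values.
    r = ez.T @ rz / expr.shape[0]  # shape: (n_features, n_regulators)
    keep = np.abs(r).max(axis=1) > threshold
    return np.flatnonzero(keep), r

# Synthetic example: features 0 and 2 track a regulator, feature 1 is pure noise.
rng = np.random.default_rng(0)
reg = rng.normal(size=(200, 2))
noise = rng.normal(size=(200, 3))
expr = np.column_stack([
    reg[:, 0] + 0.1 * noise[:, 0],   # positively correlated with regulator 0
    noise[:, 1],                     # unrelated to both regulators
    -reg[:, 1] + 0.1 * noise[:, 2],  # negatively correlated with regulator 1
])
selected, r = correlated_features(expr, reg, threshold=0.4)
print(selected)
```

The absolute value of the correlation is used so that negatively associated features are retained as well; in the workflow described above, the selected features would then be passed to univariate Cox regression.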