| Because distinct breast cancer subtypes differ intrinsically in clinical diagnosis and treatment, it is important to use high-throughput data to explore these differences. Biological databases in bioinformatics are continually improving, and they come in many distinct types, such as omics databases, databases of reactions in organisms, and enzyme databases. These databases make it easier to use different types of data in different studies. Integrating different omics data can improve prediction accuracy and reveal the differences between breast cancer subtypes in greater depth. Breast cancer has high prevalence and mortality among women. Although the overall cure rate of breast cancer patients is rising as medical standards and methods improve, survival rates still differ between subtypes. Identifying a patient's subtype early helps physicians tailor treatment to that subtype, so subtype prediction can guide treatment planning. Many studies address the classification and prediction of breast cancer subtypes, but relatively few classify subtypes using integrated multi-omics data. In this article, we define breast cancer subtypes by estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) status, and use the SMO-MKL algorithm to classify each pair of subtypes. SMO-MKL is a multiple kernel learning (MKL) algorithm optimized with sequential minimal optimization (SMO), and our use of it follows a meta-dimensional analysis approach. We collected mRNA, methylation, and copy number variation (CNV) data from TCGA and used SMO-MKL to integrate these three omics data types for subtype classification. Using all three omics data types with multiple kernels gave better results than using a single omics data type with multiple kernels. We also analyze the significant genes and pathways identified during feature selection. In our experiments, the proposed method outperformed the other algorithms we compared it with. We further applied SMO-MKL to multi-class classification of breast cancer subtypes and of triple-negative breast cancer subtypes; in this setting, too, integrating the three omics data types with multiple kernels outperformed using a single omics data type. In a later section, we study triple-negative breast cancer (TNBC), the subtype with the worst prognosis and also the most studied. First, using the status of the three receptors in the clinical data, we divided all samples into two groups: TNBC and non-TNBC. We then compared the two groups in terms of cell-cycle-related enzymes and genes. The results show a clear difference between the groups, and we looked for transcriptional-level clues that significantly distinguish them. |
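The abstract does not show how "multiple kernels" combine the three omics data types. A common MKL formulation builds one kernel per data view and sums them with non-negative weights, $K = \sum_m \beta_m K_m$. The sketch below illustrates only that combination step and rests on several assumptions that are not from the article: RBF base kernels, fixed equal weights (SMO-MKL would learn the weights rather than fix them), scikit-learn's `SVC` with a precomputed kernel as the classifier, and synthetic random matrices standing in for the TCGA mRNA, methylation, and CNV data. The helper names `combined_kernel` and `fit_predict` are hypothetical.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC


def combined_kernel(views_a, views_b, weights, gammas):
    """Return K = sum_m weights[m] * RBF(views_a[m], views_b[m]).

    Each element of views_a / views_b is one omics matrix (rows = samples).
    The weights are fixed here; an MKL method such as SMO-MKL learns them.
    """
    K = np.zeros((views_a[0].shape[0], views_b[0].shape[0]))
    for Xa, Xb, w, g in zip(views_a, views_b, weights, gammas):
        K += w * rbf_kernel(Xa, Xb, gamma=g)
    return K


def fit_predict(train_views, y_train, test_views, weights, gammas, C=1.0):
    """Train an SVM on the combined training kernel and predict test labels."""
    K_train = combined_kernel(train_views, train_views, weights, gammas)
    # Test kernel: rows = test samples, columns = training samples.
    K_test = combined_kernel(test_views, train_views, weights, gammas)
    clf = SVC(kernel="precomputed", C=C).fit(K_train, y_train)
    return clf.predict(K_test)


# Synthetic stand-ins for three omics views (NOT real TCGA data).
# Each tuple is (n_features, class-mean shift), e.g. "mRNA", "methylation", "CNV".
rng = np.random.default_rng(0)
n = 60
y = np.repeat([0, 1], n // 2)
views = [rng.normal(size=(n, d)) + shift * y[:, None]
         for d, shift in [(50, 2.0), (30, 0.5), (20, 0.3)]]

train_idx = np.arange(0, n, 2)   # even rows: both classes present
test_idx = np.arange(1, n, 2)    # odd rows
weights = [1 / 3, 1 / 3, 1 / 3]  # fixed, equal kernel weights (assumption)
gammas = [None, None, None]      # None -> 1 / n_features inside rbf_kernel

pred = fit_predict([v[train_idx] for v in views], y[train_idx],
                   [v[test_idx] for v in views], weights, gammas)
accuracy = float(np.mean(pred == y[test_idx]))
print(f"test accuracy: {accuracy:.2f}")
```

Setting one entry of `weights` to 1 and the others to 0 reduces the sketch to a single-view baseline, which is a simple way to compare integrated and single-omics kernels on the same split.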