Accurate prediction of survival time can effectively stratify breast cancer patients into different risk groups. On the one hand, it gives patients prognostic information so that they can make better living arrangements; on the other hand, it helps clinicians make reasonable decisions about treatment plans, so that patients receive more precise treatment. With the development of next-generation sequencing technology, the amount of available multi-omics data is growing steadily. In breast cancer survival prediction, using multi-omics data (such as DNA methylation, gene expression, and copy number variation data) may improve prediction accuracy compared with using single-omics data. How to reasonably integrate the information in these multi-omics data to predict the survival of breast cancer patients is therefore an important problem to be solved.

This paper proposes a machine-learning-based method for predicting the survival time of breast cancer patients, using multi-omics data comprising methylation, gene expression, and copy number variation (CNV). For feature selection on the multi-omics data, this paper proposes an algorithm named REL1_FW, where RE indicates that the selection is applied twice, L1 refers to logistic regression with L1 regularization (L1 LR), and FW stands for feature weight. The method applies L1 LR twice to select features from the high-dimensional multi-omics data and then uses the feature weight coefficients to make the final selection. Two rounds of selection are used because the multi-omics data are high-dimensional and there is redundancy among the omics types. First, L1 LR is applied to each omics type separately to obtain the single-omics feature selection; the selected features are then concatenated, and L1 LR is applied again to the concatenated data to obtain the multi-omics feature selection. In the FW step, the features are ranked by weight and the top-ranked ones, as many as required, are taken as the best feature subset of the multi-omics data.

This paper also compares L1 LR with four commonly used feature selection algorithms; L1 LR achieves the highest accuracy, which supports the validity of REL1_FW. On the multi-omics data after feature selection, this paper first compares the performance of five commonly used classification algorithms. The experimental results show that the support vector machine (SVM) performs best, with high accuracy and good stability. This paper therefore combines the REL1_FW feature selection algorithm with the SVM classifier to predict the survival time of breast cancer patients, and the experimental results show a prediction accuracy of up to 99.9%.
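
The two-pass selection followed by SVM classification can be illustrated with a minimal Python sketch. It is not the authors' implementation: the use of scikit-learn, the function names `l1_lr_weights` and `rel1_fw_select`, the regularization strength `C`, the cut-off `top_k`, the linear SVM kernel, and the synthetic data standing in for the three omics types are all assumptions made for illustration, and survival is treated as a binary label.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def l1_lr_weights(X, y, C=1.0):
    """Fit an L1-regularized logistic regression; return absolute coefficients."""
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    model.fit(X, y)
    return np.abs(model.coef_).ravel()


def rel1_fw_select(omics_blocks, y, top_k=10, C=1.0):
    """Hypothetical helper: two L1 LR passes, then a feature-weight cut-off.

    Returns (block_name, column_index) pairs for the retained features.
    """
    kept_columns, kept_origin = [], []
    # Pass 1: select features within each omics type separately.
    for name, X in omics_blocks.items():
        idx = np.flatnonzero(l1_lr_weights(X, y, C) > 0)
        kept_columns.append(X[:, idx])
        kept_origin.extend((name, int(j)) for j in idx)
    # Concatenate the survivors of pass 1 across omics types.
    X_concat = np.hstack(kept_columns)
    # Pass 2: select again on the concatenated data.
    weights = l1_lr_weights(X_concat, y, C)
    # FW step: rank by weight, keep at most top_k features with non-zero weight.
    order = [i for i in np.argsort(weights)[::-1] if weights[i] > 0][:top_k]
    return [kept_origin[i] for i in order]


# Synthetic stand-in data (assumption): 200 patients, 50 features per omics type,
# with the first three columns of each block carrying class signal.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)
blocks = {}
for name in ("methylation", "expression", "cnv"):
    X = rng.normal(size=(n, 50))
    X[:, :3] += 1.5 * y[:, None]
    blocks[name] = X

# Split first so that feature selection only sees the training patients.
idx_tr, idx_te = train_test_split(
    np.arange(n), test_size=0.3, random_state=0, stratify=y
)
train_blocks = {name: X[idx_tr] for name, X in blocks.items()}
selected = rel1_fw_select(train_blocks, y[idx_tr], top_k=10)


def gather(rows):
    """Build the design matrix of the selected features for the given rows."""
    return np.column_stack([blocks[b][rows, j] for b, j in selected])


clf = SVC(kernel="linear").fit(gather(idx_tr), y[idx_tr])
accuracy = clf.score(gather(idx_te), y[idx_te])
print(f"selected {len(selected)} features, held-out accuracy {accuracy:.3f}")
```

Selecting features on the training split only, as done here, keeps the held-out accuracy from being inflated by information leaking from the test patients. Real omics features would typically also be standardized before fitting the L1 LR models.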