| Cancer classification prediction is an extremely important task in biomedical research and provides an important basis for the diagnosis and treatment of cancer. With the development and improvement of high-throughput technology, large amounts of gene expression profile data are now used in cancer classification research. Gene expression profile data is a collection of gene expression levels, and modern molecular biology shows that the occurrence and development of cancer are closely related to gene mutations and the loss of tumor suppressor gene function. Gene expression profile data has therefore become the main data source for cancer classification research. From a genetic point of view, scientifically and effectively selecting a small number of cancer-related genes not only helps in the classification and prediction of cancer; further research on these oncogenes also contributes to the development of related drugs, which underlies the "precision medicine" and "targeted therapy" that have become popular in recent years. To address the high dimensionality and small sample size of gene expression profile data, this paper proposes the FCM-SVM-RFE model, which is based on SVM-RFE. First, the gene expression profile is pre-processed, and the genes are then divided into gene modules by multi-objective fuzzy clustering. In the multi-objective fuzzy clustering, in addition to the commonly used clustering validity indicators FPC and PBM, this paper also uses BHI as an objective function, which makes the clustering results more biologically meaningful; in addition, NSGA-II is used to solve the multi-objective optimization problem, so that the Pareto optimal solution can be obtained more quickly. Next, SVM-RFE is applied within each gene module obtained by clustering to perform feature selection and remove redundant genes more effectively. Then, following the idea of bagging, several different classification algorithms are trained to obtain multiple classification prediction models that differ substantially from one another, combining the advantages of the individual algorithms to address the poor stability of a single classification model. Finally, the test samples are classified using the trained multi-algorithm, multi-model ensemble and a voting decision mechanism to obtain the final cancer classification results, reducing the classification errors caused by the high dimensionality and small sample size of gene expression profile data. This paper conducted an experimental study on four cancer gene expression profile data sets from the GEO database: acute myeloid leukemia, breast cancer, colon cancer, and lung cancer in non-smokers from Taiwan. The experimental results show that the classification accuracy obtained by FCM-SVM-RFE is more than 5% higher than that obtained by SVM-RFE, and that the classification accuracy of the integrated classification prediction model is higher than that of each single classification learning algorithm. The results also show the following: when dividing gene modules, multi-objective fuzzy clustering yields divisions that have biological significance; in the construction of the feature extraction method, taking gene co-expression into account and deleting features within each gene module through FCM-SVM-RFE better resolves redundancy in gene feature selection; in the sampling process, the training set is sampled for diversity to mitigate overfitting during model training, which enhances the generalization ability of the classification prediction model; and in the construction of the integrated classification model, multiple classification algorithms are combined so that they complement each other's strengths and overcome the limitations of a single classification algorithm. |
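
The bagging and voting steps described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The two base learners (nearest centroid and 1-nearest-neighbour), the function names, and the toy data are all assumptions chosen so the example runs on the Python standard library alone; the abstract does not say which classification algorithms are combined. The sketch also assumes that feature selection has already reduced each sample to a short vector of retained genes.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_nearest_centroid(X, y):
    """Stand-in learner 1 (assumption): predict the class with the closest mean vector."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }
    return lambda row: min(centroids, key=lambda c: sq_dist(centroids[c], row))


def fit_one_nn(X, y):
    """Stand-in learner 2 (assumption): predict the label of the closest training sample."""
    data = list(zip(X, y))
    return lambda row: min(data, key=lambda p: sq_dist(p[0], row))[1]


def bootstrap(X, y, rng):
    """Bagging step: resample the training set with replacement."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_ensemble(X, y, learners, n_rounds=5, seed=0):
    """Train every learner on its own bootstrap sample, n_rounds times."""
    rng = random.Random(seed)
    return [fit(*bootstrap(X, y, rng)) for _ in range(n_rounds) for fit in learners]


def predict_vote(models, row):
    """Voting step: return the label predicted by the most models."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two retained "genes" per sample, two classes.
X_train = [[0.1, 0.3], [0.4, 0.2], [0.2, 0.9], [0.8, 0.5], [0.6, 0.7], [0.9, 0.1],
           [5.2, 5.8], [5.5, 5.1], [5.9, 5.4], [5.1, 5.3], [5.7, 5.9], [5.4, 5.6]]
y_train = ["normal"] * 6 + ["tumor"] * 6

models = fit_ensemble(X_train, y_train, [fit_nearest_centroid, fit_one_nn])
print(predict_vote(models, [0.5, 0.5]), predict_vote(models, [5.5, 5.5]))
```

Because each model sees a different bootstrap sample and the learners differ in kind, the ensemble members disagree on borderline samples, and the majority vote smooths out the instability of any single model. In a real pipeline the stand-in learners would be replaced by the actual classifiers, and the input vectors would contain the genes kept by the per-module feature selection.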