| Microarray analysis of cancer samples is an active research topic in bioinformatics. Microarrays measure gene expression levels, which makes it possible to identify patients with inconspicuous early-stage cancer more accurately. Combined with machine learning, they can also identify which genes are important in cancer and give medical researchers an important way to study the internal mechanism of cell carcinogenesis. Cancer classification requires three steps: data preprocessing, feature selection and classification. Data preprocessing removes as much noise as possible from the raw data and normalizes the data so that they conform to a normal distribution and samples can be compared with one another. Gene expression data are high-dimensional: among tens of thousands of genes, only dozens of key genes contribute to classification, while the number of samples is only about one hundred, which leads to overfitting and lower classification accuracy. To address this, a feature selection step is added to select the key genes, which reduces classifier training time and improves classification accuracy. Finally, in the classification step, experimental comparisons are used to determine which classifier performs best. The objective of this paper is to incorporate the Choquet fuzzy integral model into the feature selection and classification steps. The Choquet fuzzy integral model takes the relationships between features into account, so it is well suited to cancer classification. L1/2 regularization can complete feature extraction quickly but does not consider the relationships between features. This article proposes FI-L1/2, a wrapper feature selection algorithm based on the Choquet fuzzy integral and L1/2 regularization. The algorithm is implemented in Matlab, and data preprocessing uses the Bioconductor toolbox. The experimental results show that the FI-L1/2 algorithm achieves higher classification accuracy than the L1/2 algorithm. Moreover, the results are compared with those of domestic and international studies from recent years; the classification accuracy on the DLBCL, Colon and GLI-85 datasets is significantly higher than that reported in other papers. This article proposes two methods for solving the Choquet fuzzy integral: one based on L1/2 regularization and the other based on an ant colony algorithm. The Choquet fuzzy integral model is traditionally solved with a genetic algorithm, whose poor search efficiency makes it unsuitable for either feature selection or classification. In the feature selection step, the fuzzy measure does not need to be very accurate, as long as it provides information about the correlations between features. The faster L1/2 regularization solution is therefore preferable: an L1/2 norm is added to the fuzzy integral model, which turns the solution into a constrained minimization problem. In the classification step, an improved ant colony algorithm is adopted. Its design considers both local and global search, and each ant performs a local search with a random probability in order to find a better solution. The experimental results show that the fuzzy integral classifies better when the number of features is small, that FI-ACO generally classifies slightly better than FI-GA, and that its running time is significantly shorter than that of FI-GA. |
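
To illustrate how a Choquet fuzzy integral takes relationships between features into account, the following is a minimal sketch of the standard discrete Choquet integral. It is written in Python for illustration only (the work itself is implemented in Matlab), and the function name `choquet_integral`, the dictionary-based measure and the example values are all assumptions, not part of the original algorithm. How the fuzzy measure is actually fitted (L1/2 regularization or ant colony search) is not shown.

```python
def choquet_integral(values, measure):
    """Discrete Choquet integral of `values` with respect to a fuzzy measure.

    values  : list of non-negative feature scores f(x_0), ..., f(x_{n-1})
    measure : dict mapping frozenset of feature indices -> measure in [0, 1];
              assumed monotone, with the full feature set mapped to 1.
              (Hypothetical representation chosen for this sketch.)
    """
    # Sort feature indices by ascending score.
    order = sorted(range(len(values)), key=lambda i: values[i])
    total = 0.0
    previous = 0.0
    for k, i in enumerate(order):
        # All features whose score is at least values[i].
        upper_set = frozenset(order[k:])
        # Weight each increment in score by the measure of that set.
        total += (values[i] - previous) * measure[upper_set]
        previous = values[i]
    return total


# Illustrative (made-up) measure on two features. Because
# mu({0}) + mu({1}) != mu({0, 1}), the measure encodes an interaction
# that a plain weighted sum could not express.
mu = {
    frozenset({0}): 0.3,
    frozenset({1}): 0.5,
    frozenset({0, 1}): 1.0,
}
result = choquet_integral([0.2, 0.6], mu)
```

When the measure is additive, the integral reduces to an ordinary weighted sum; the non-additive part is what carries the information about feature interactions.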