Research On Cancer Data Feature Selection Algorithm Based On Multi-Objective Evolutionary Optimization

Posted on: 2020-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: S Li
GTID: 2404330575965451
Subject: Engineering

Abstract/Summary:
In recent years, cancer has become one of the leading causes of death worldwide, and researchers have made tremendous efforts to address it. With the development of microarray technology, gene expression profile data have been widely adopted and have had a profound impact on cancer diagnosis research. Such data are typically high-dimensional with small sample sizes. Research has shown that some genes are related to cancer, but most are not, and including these unrelated genes does not help cancer diagnosis. Feature selection is therefore used to remove redundant features, select representative ones, and improve classification performance. Because cancer data are high-dimensional, the problems addressed in this thesis are multi-objective optimization problems in which the classification performance of a feature subset and its number of features must be considered simultaneously. In addition, some cancer datasets are class-imbalanced. Based on multi-objective evolutionary algorithms, this thesis therefore designs two feature selection algorithms tailored to the characteristics of cancer data. The main research work is as follows:

(1) This thesis proposes a heuristic algorithm (HAMS) for identifying molecular features for cancer diagnosis. When handling high-dimensional datasets, most feature selection methods based on multi-objective evolutionary optimization first preprocess the data to reduce its dimensionality, narrowing the search space and making the search easier. However, this preprocessing usually just removes features using correlation or similar criteria, whereas the goal of this thesis is to find a feature subset that works well as a whole, so simple preprocessing may discard features that are important within the full subset. In HAMS, an elite guidance update strategy runs through the entire population update process. To steer the next generation in a better direction, the strategy builds a probability model from the elite individuals and uses that model to generate a new population. To accelerate convergence in dimensionality, a truncation strategy is added when generating new populations; it speeds up the convergence of the number of features as the iterations proceed. Finally, non-dominated sorting is applied to the combined population of elite individuals and new individuals to produce the offspring population. HAMS is compared with seven feature selection methods on five cancer datasets. The experiments show that HAMS achieves better accuracy in cancer diagnosis with fewer features. A biological analysis of the features selected by HAMS found that most of them are cancer-related.

(2) This thesis proposes a feature selection algorithm (MOCID-FS) for diagnosing multi-class imbalanced cancer data. Cancer datasets are generally obtained with microarray technology. Owing to factors such as the number of probes and the sources of samples, they are typically high-dimensional with small sample sizes. Although the number of samples is small, many cancer datasets are class-imbalanced, some severely so. This strongly affects the classification performance of feature selection, because samples from small classes are easily ignored during classification. MOCID-FS is built on a multi-objective evolutionary optimization algorithm. To address the imbalance, it uses the AUC (area under the ROC curve) of each class and the number of features as the objective functions. In addition, during population initialization and mutation, an optimized strategy favors features that improve classification accuracy while reducing the number of features as much as possible. MOCID-FS is compared with five classical algorithms for imbalanced data on four datasets. The experiments show that MOCID-FS is more effective.
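
The elite guidance update described in (1) can be illustrated with a minimal sketch. The abstract does not specify the probability model, the truncation rule, or the classifier, so everything concrete here is an assumption made for illustration: the model is taken to be the per-feature selection frequency among elites, truncation is taken to be a cap on the number of selected features, a simple Pareto filter stands in for full non-dominated sorting, and `toy_objectives` is a hypothetical stand-in for the real classification error.

```python
import random

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(population, objectives):
    """Keep individuals whose objective vectors no other individual dominates."""
    scores = [objectives(ind) for ind in population]
    return [ind for ind, s in zip(population, scores)
            if not any(dominates(t, s) for t in scores)]

def elite_probabilities(elites, n_features):
    """Assumed probability model: how often each feature is selected among elites."""
    return [sum(e[j] for e in elites) / len(elites) for j in range(n_features)]

def sample_individual(probs, max_features, rng):
    """Sample a feature bit-mask from the model, then truncate it to max_features."""
    mask = [1 if rng.random() < p else 0 for p in probs]
    chosen = [j for j, bit in enumerate(mask) if bit]
    if len(chosen) > max_features:
        # Assumed truncation rule: keep the features the model rates most probable.
        keep = set(sorted(chosen, key=lambda j: probs[j], reverse=True)[:max_features])
        mask = [1 if j in keep else 0 for j in range(len(probs))]
    return mask

def next_generation(elites, n_features, pop_size, max_features, objectives, rng):
    """One update: model from elites, sample new population, keep the Pareto front."""
    probs = elite_probabilities(elites, n_features)
    new_pop = [sample_individual(probs, max_features, rng) for _ in range(pop_size)]
    return non_dominated(elites + new_pop, objectives)

def toy_objectives(mask):
    """Hypothetical objectives: (informative features missed, features selected)."""
    informative = {0, 3}  # made-up "cancer-related" feature indices
    missed = sum(1 for j in informative if not mask[j])
    return (missed, sum(mask))

if __name__ == "__main__":
    rng = random.Random(0)
    n = 8
    elites = [[rng.randint(0, 1) for _ in range(n)] for _ in range(6)]
    for gen in range(10):
        cap = max(2, n - gen)  # the cap tightens as iterations increase
        elites = next_generation(elites, n, 20, cap, toy_objectives, rng)
    print(sorted({tuple(e) for e in elites}))
```

In this sketch a feature that no elite selects gets probability zero and can never reappear, so a practical version would likely need some smoothing or mutation. Whether HAMS handles this, and how, is not stated in the abstract.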
Keywords/Search Tags: Feature selection, Multi-objective evolutionary optimization, Cancer data, Elite guidance strategy, Imbalanced data