Study Of Gene Selection Algorithm For Multi-category Tumor Classification

Posted on: 2020-08-21
Degree: Master
Type: Thesis
Country: China
Candidate: C C Ye
GTID: 2404330578480065
Subject: Computer application technology
Abstract/Summary:
Cancer has become one of the most serious diseases threatening human health, and tumor diagnosis remains a major problem in medicine. DNA microarray technology has made it possible to study tumors at the level of gene expression, which provides another important means of tumor classification. However, gene expression data sets contain many redundant genes and many genes unrelated to tumor classification. Without feature selection on gene expression data, accurate classification results cannot be obtained even with the best classifiers. Gene selection not only improves tumor classification accuracy and reduces the number of genes, but also helps reveal disease mechanisms and lowers the cost of diagnosis. Based on the characteristics of gene expression data, this thesis studies gene selection algorithms in depth. The main contents are as follows:

(1) An adaptive particle swarm optimization algorithm that combines information gain and a support vector machine (IG-SVM-APSO) is proposed for gene selection. First, IG-SVM-APSO uses information gain as a preliminary filter, eliminating a large number of irrelevant genes and noise. Because particle swarm optimization easily falls into local optima and converges prematurely, an adaptive inertia weight is introduced to strengthen its global search ability. Then, with an SVM serving as the fitness function, the adaptive particle swarm optimization algorithm selects genes. Experiments show that IG-SVM-APSO achieves higher classification accuracy with fewer selected genes.

(2) Although information gain makes good use of the category information, it does not take into account the mutual information between features. To reduce the redundancy between genes and obtain genes with lower similarity, this thesis proposes a hybrid filter method that combines information gain and the Pearson correlation coefficient. The algorithm first divides the genes into similar and dissimilar parts. Then, while ensuring a sufficient information gain value, it selects more dissimilar genes and fewer similar genes in order to reduce gene redundancy.

(3) An improved simplified swarm optimization algorithm that combines the hybrid filter and a local search strategy (iSSO-HF&LSS) is proposed. First, the Pearson correlation coefficient is used to calculate the mutual information between genes, and the hybrid filter method filters the genes to obtain a less redundant subset. Next, a local search strategy is proposed and embedded into the simplified swarm optimization algorithm, yielding the improved simplified swarm optimization algorithm (iSSO). Finally, iSSO makes the final selection from the remaining genes under the guidance of the mutual information between genes. Extensive experiments show that iSSO-HF&LSS has clear advantages in four aspects, including classification accuracy.
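
The idea behind the hybrid filter in contribution (2), ranking genes by information gain while discouraging highly correlated picks, can be illustrated with a minimal sketch. This is not the thesis's algorithm: the abstract does not say how genes are split into similar and dissimilar parts, so the greedy rule below is a stand-in. The median binarization of expression values, the function names `information_gain` and `hybrid_filter`, and the `max_abs_corr` threshold are all assumptions made for illustration.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(x, y):
    """Information gain of labels y given one gene x.

    Assumption: the continuous expression values are binarized at the
    gene's median; the thesis may discretize differently.
    """
    high = x > np.median(x)
    conditional = 0.0
    for side in (False, True):
        mask = high == side
        if mask.any():
            conditional += mask.mean() * entropy(y[mask])
    return entropy(y) - conditional


def hybrid_filter(X, y, n_keep, max_abs_corr=0.8):
    """Greedy stand-in for an IG + Pearson filter (hypothetical rule).

    Genes are visited in descending information gain; a gene is kept only
    if its absolute Pearson correlation with every gene already kept is
    below max_abs_corr. X has shape (samples, genes).
    """
    ig = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    # A stable sort keeps the original gene order among ties.
    for j in np.argsort(-ig, kind="stable"):
        if len(kept) >= n_keep:
            break
        if all(corr[j, k] < max_abs_corr for k in kept):
            kept.append(int(j))
    return kept, ig
```

Calling `hybrid_filter(X, y, n_keep=50)` on a samples-by-genes matrix returns the indices of the retained genes together with the per-gene information gain. Lowering `max_abs_corr` trades information gain for lower redundancy, which is the balance the abstract describes.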
Keywords/Search Tags: tumor classification, gene selection, hybrid filter, local search strategy, simplified swarm optimization algorithm