Study On Feature Selection Of Tumor Gene Expression Profiles

Posted on: 2018-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: S Feng
Full Text: PDF
GTID: 2334330515460255
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of computer and information technology, a growing number of researchers in the biomedical field analyze genetic data computationally in order to classify and diagnose tumor subtypes. Because tumor gene expression data are high-dimensional and have small sample sizes, they tend to contain many irrelevant and redundant genes, which can easily degrade experimental results and affect diagnosis. Given these characteristics of tumor gene expression profile data, this thesis focuses on how to select feature information with low redundancy and how to construct a feature extraction algorithm with better robustness and stronger generalization ability. The main work is as follows:

(1) Gene selection based on signal-to-noise ratio and random forest. To address the high dimensionality and small sample size of tumor gene expression data, and because traditional methods select a large amount of redundant gene information, an improved gene selection method based on signal-to-noise ratio and random forest is proposed. First, a filter method is used to remove redundant genes, yielding an initial subset of features with strong classification attributes. Then the random forest algorithm is used to classify the feature subsets. The experimental results show that the proposed algorithm can select subsets of feature genes quickly and efficiently, while reducing time complexity and improving classification accuracy compared with other algorithms.

(2) The traditional self-organizing map (SOM) algorithm uses Euclidean distance, which has difficulty describing the similarity between genes, such as their positive and negative functions. To obtain a more robust SOM algorithm and to better screen the optimal gene subset, this thesis proposes a new SOM classification algorithm based on neighborhood mutual information, combined with the particle swarm optimization (PSO) algorithm to filter the optimal feature subset. The algorithm uses neighborhood mutual information to assess the correlation between genes and assigns each gene a corresponding similarity. To carry out the feature mapping more quickly, the weights of the winning neurons are obtained. Because PSO converges quickly, it is employed to filter the optimal feature subset. In the procedure, the improved SOM algorithm first preprocesses the original gene expression profile data: the correlation between genes is evaluated by neighborhood mutual information, the corresponding similarity is assigned, and the neuron weights are obtained. Finally, the relevant feature subsets are evaluated by PSO to determine the optimal feature subset. Simulation results show that, compared with other related methods, the proposed method uses fewer features, achieves higher classification accuracy, and performs well on multi-class problems.
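The filter stage described in item (1) can be illustrated with a small sketch. The abstract does not state the exact scoring formula, so this assumes the commonly used two-class signal-to-noise ratio, |mean1 - mean2| / (std1 + std2), computed per gene. The function names `snr_scores` and `top_k_genes`, the use of population standard deviation, and the toy data are all illustrative assumptions, not the thesis's implementation. The random forest stage is omitted here.

```python
from statistics import mean, pstdev


def snr_scores(samples, labels):
    """Score each gene by |mean_pos - mean_neg| / (std_pos + std_neg).

    samples: list of rows, one row of expression values per sample.
    labels:  list of 0/1 class labels, one per sample (assumed binary).
    """
    pos = [row for row, y in zip(samples, labels) if y == 1]
    neg = [row for row, y in zip(samples, labels) if y == 0]
    scores = []
    for g in range(len(samples[0])):
        a = [row[g] for row in pos]
        b = [row[g] for row in neg]
        diff = abs(mean(a) - mean(b))
        spread = pstdev(a) + pstdev(b)  # population std is an assumption
        if spread > 0:
            scores.append(diff / spread)
        else:
            # Gene is constant within both classes: perfectly separating
            # if the class means differ, uninformative otherwise.
            scores.append(float("inf") if diff > 0 else 0.0)
    return scores


def top_k_genes(samples, labels, k):
    """Return the indices of the k highest-scoring genes, best first."""
    scores = snr_scores(samples, labels)
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return ranked[:k]


# Toy data (hypothetical): 4 samples, 3 genes; gene 0 separates the classes.
toy_samples = [
    [5.0, 1.0, 2.0],
    [6.0, 1.2, 2.1],
    [1.0, 1.1, 2.0],
    [2.0, 0.9, 2.1],
]
toy_labels = [1, 1, 0, 0]
selected = top_k_genes(toy_samples, toy_labels, 2)
```

In a pipeline like the one the abstract outlines, the columns in `selected` would then be passed to a classifier such as a random forest; which implementation and settings the thesis used is not stated in the abstract.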
Keywords/Search Tags: Gene selection, Signal-to-noise ratio, Random forest, Self-organizing map, Particle swarm optimization