Research on Hybrid Significant Gene Selection Based on Heuristic Clustering

Posted on: 2011-09-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y Lu
GTID: 2120360308469509
Subject: Computer Science and Technology
Abstract/Summary:
DNA microarray technology is currently one of the main supporting technologies of genome informatics. It provides basic, essential genome-level information for cancer research. However, gene chip data combine small sample sizes with very high dimensionality and a large amount of noise, which makes them difficult to process. How to choose algorithms that avoid the curse of dimensionality while still identifying the most significant genes has become an active research topic in gene expression data processing and analysis. To this end, this thesis studies informative gene selection and presents two hybrid significant gene selection methods based on heuristic clustering.

1. A significant gene selection method based on the minimum spanning tree. Traditional clustering methods are suited to spherical (globular) data, whereas minimum spanning tree clustering handles data with complex cluster boundaries well. Various distance measures are used in minimum spanning tree clustering of gene expression data to obtain feature gene sets dynamically, and a support vector machine is used as the classifier for prediction. A method is then presented to further remove redundant genes from the feature gene set. The experimental results show that the method produces competitive classification results.

2. A significant gene selection method based on two-step clustering. Gene chip data are high-dimensional and nonlinear. Gsim expresses the degree of similarity among high-dimensional data more accurately, and a manifold distance based on a similarity metric can reveal the intrinsic links among genes. This thesis therefore combines Gsim and the manifold distance in a significant gene selection method based on two-step clustering. The method addresses the loss of resolution that distance measures suffer on high-dimensional, nonlinear gene data, and it also improves the generalization ability of the prediction method.
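The minimum spanning tree clustering step of the first method can be illustrated with a small sketch. This is not the thesis's implementation: the Euclidean distance, the rule of cutting the k-1 longest tree edges, the choice of a medoid as each cluster's representative gene, and all function names and the toy data are assumptions made for illustration. The support vector machine classification and the redundancy-removal step are not shown.

```python
from itertools import combinations
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two expression profiles (an assumed measure)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class UnionFind:
    """Minimal disjoint-set structure used by Kruskal's algorithm."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def mst_edges(profiles, dist=euclidean):
    """Kruskal's algorithm on the complete gene graph; returns (weight, i, j) edges."""
    n = len(profiles)
    candidates = sorted(
        (dist(profiles[i], profiles[j]), i, j) for i, j in combinations(range(n), 2)
    )
    uf = UnionFind(n)
    return [(w, i, j) for w, i, j in candidates if uf.union(i, j)]


def mst_clusters(profiles, k, dist=euclidean):
    """Cut the k-1 longest MST edges and return the resulting k gene clusters."""
    n = len(profiles)
    kept = sorted(mst_edges(profiles, dist))[: n - k]  # keep the n-k shortest edges
    uf = UnionFind(n)
    for _, i, j in kept:
        uf.union(i, j)
    groups = {}
    for gene in range(n):
        groups.setdefault(uf.find(gene), []).append(gene)
    return sorted(groups.values())


def representatives(profiles, clusters, dist=euclidean):
    """Pick one gene per cluster: the medoid (smallest total distance to its cluster)."""
    return [
        min(members, key=lambda g: sum(dist(profiles[g], profiles[h]) for h in members))
        for members in clusters
    ]


# Toy data (hypothetical): rows are genes, columns are samples.
profiles = [
    [1.0, 1.1, 0.9],
    [1.1, 1.0, 1.0],
    [5.0, 5.2, 4.9],
    [5.1, 5.0, 5.0],
    [9.0, 0.0, 9.0],
]
clusters = mst_clusters(profiles, k=3)
feature_genes = representatives(profiles, clusters)
print(clusters, feature_genes)
```

Because the clusters come from cutting tree edges rather than from distances to a centroid, this style of clustering does not presuppose globular cluster shapes. The `dist` argument can be swapped for another measure, such as a correlation-based distance, to mirror the abstract's use of various distance measures.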
Keywords/Search Tags: gene expression profiles, feature selection, clustering, minimal spanning tree, support vector machine