
Research On Feature Gene Selection Algorithm Based On Partition Clustering

Posted on: 2011-07-14 | Degree: Master | Type: Thesis
Country: China | Candidate: X Li | Full Text: PDF
GTID: 2120360308469131 | Subject: Computer Science and Technology
Abstract/Summary:
Classification of gene expression data is an important research field in bioinformatics. A gene chip can detect the expression of thousands of genes in a single experiment, which makes it highly significant in practice for tumor classification and diagnosis. However, gene chip data are high-throughput, high-dimensional, nonlinear, noisy and unevenly distributed, which makes them difficult to process. Identifying a small set of feature genes with strong classification capability and minimal redundancy in a gene expression profile is difficult. Such a set nonetheless plays a key role in cancer diagnosis and in research on pathogenic mechanisms. In this thesis, the proposed gene selection methods are validated on a leukemia data set. The main research work is as follows:

1. A feature gene selection method based on geodesic distance is proposed. Because gene expression data are nonlinear and noisy, the ordinary Euclidean distance cannot adequately measure the similarity between genes. Geodesic distance, a manifold distance, better reflects the intrinsic relationships among genes. Based on the geodesic distance matrix, we improve the k-medoids method to select feature gene subsets. An SVM classifier is then used to evaluate the classification accuracy of these subsets. The experimental results show that the feature gene selection method based on geodesic distance performs better than the one based on Euclidean distance.

2. A feature gene selection method based on Locally Linear Embedding (LLE) is proposed. Because gene expression data are nonlinear and noisy, traditional clustering methods cannot separate the genes into distinct clusters. LLE is therefore used to map the gene vector space into a low-dimensional space. This mapping reduces the dimensionality of the gene vectors and also reveals the intrinsic relationships between genes.
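Method 1 combines a manifold distance with medoid-based clustering. The sketch below is a minimal illustration under stated assumptions and is not the thesis's implementation. Each gene is treated as a point whose coordinates are its expression values across samples. Geodesic distance is approximated in the Isomap style, as shortest-path length through a k-nearest-neighbour graph with Euclidean edge weights, so `k` must be large enough for that graph to be connected. Plain alternating k-medoids, rather than the thesis's improved variant (which the abstract does not describe), returns the medoid genes as the selected subset. The names `knn_graph`, `geodesic_distances` and `k_medoids` are hypothetical, and the SVM evaluation step is omitted.

```python
import heapq
import math


def knn_graph(points, k):
    """Undirected k-nearest-neighbour graph; edge weight = Euclidean distance."""
    n = len(points)
    graph = [{} for _ in range(n)]
    for i in range(n):
        ranked = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in ranked[:k]:
            graph[i][j] = d
            graph[j][i] = d  # symmetrise so the graph is undirected
    return graph


def geodesic_distances(points, k):
    """Approximate geodesic distances as shortest paths through the kNN graph."""
    graph = knn_graph(points, k)
    n = len(points)
    matrix = []
    for src in range(n):  # Dijkstra from every node
        dist = [math.inf] * n
        dist[src] = 0.0
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v, w in graph[u].items():
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
        matrix.append(dist)
    return matrix


def k_medoids(dist, k, max_iter=100):
    """Plain alternating k-medoids on a precomputed distance matrix; returns medoid indices."""
    n = len(dist)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of points")
    medoids = list(range(k))  # deterministic start: the first k points
    for _ in range(max_iter):
        # Assignment step; on equal distances a medoid always stays in its own cluster.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = min(medoids, key=lambda m: (dist[i][m], m != i))
            clusters[nearest].append(i)
        # Update step: the member with the smallest total distance becomes the medoid.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)
```

Consider a U-shaped chain of seven points, `[(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0)]`, with `k=1`. The geodesic distance between the two ends is 6.0, while their Euclidean distance is only 2.0. This is the kind of gap that motivates replacing Euclidean distance. Clustering that chain into two groups with `k_medoids` returns the medoids `[1, 4]`.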
Finally, experiments are carried out on leukemia gene expression profiles. Comparison with results reported in other papers shows that the proposed method is feasible and effective.
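The LLE mapping in method 2 can be sketched with the standard algorithm of Roweis and Saul, again treating each gene as one row of the expression matrix. This is an illustrative NumPy sketch under assumptions, not the thesis's code. The names `reconstruction_weights` and `lle` are hypothetical, and the regularisation constant `reg` is an arbitrary choice. The abstract does not say which clustering step follows the embedding; the rows of the returned matrix could, for example, be passed to a k-medoids routine.

```python
import numpy as np


def reconstruction_weights(X, n_neighbors, reg=1e-3):
    """Weights that best rebuild each row of X from its nearest neighbours (rows sum to one)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances via |a|^2 + |b|^2 - 2ab (O(n^2) memory).
    norms = (X ** 2).sum(axis=1)
    sq = norms[:, None] + norms[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(sq, np.inf)  # a point is never its own neighbour
    neighbors = np.argsort(sq, axis=1)[:, :n_neighbors]

    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]          # neighbours centred on the current gene
        C = Z @ Z.T                         # local Gram matrix, n_neighbors x n_neighbors
        # Regularisation keeps C invertible when n_neighbors exceeds the data dimension.
        C += np.eye(n_neighbors) * reg * max(np.trace(C), 1e-12)
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, neighbors[i]] = w / w.sum()    # enforce the sum-to-one constraint
    return W


def lle(X, n_neighbors, n_components, reg=1e-3):
    """Embed the rows of X into n_components dimensions with locally linear embedding."""
    W = reconstruction_weights(X, n_neighbors, reg)
    I_minus_W = np.eye(W.shape[0]) - W
    M = I_minus_W.T @ I_minus_W
    # eigh returns eigenvalues in ascending order. In the non-degenerate case the
    # bottom eigenvector is the constant vector (eigenvalue ~0), so it is skipped.
    _, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, 1:n_components + 1]
```

Each row of `W` rebuilds a gene from its neighbours with weights that sum to one. As a result, the constant vector lies in the null space of `M = (I - W)^T (I - W)`. The next `n_components` eigenvectors give the low-dimensional coordinates.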
Keywords/Search Tags: Gene chip, Feature selection, Clustering, Geodesic distance, Locally linear embedding