
Gene Expression Clustering Analysis Method

Posted on: 2002-07-11 | Degree: Master | Type: Thesis
Country: China | Candidate: Y M Liu
GTID: 2204360032955245 | Subject: Medical statistics
Abstract/Summary:
Motivation: Microarrays are one of the latest breakthroughs in experimental molecular biology. They allow the expression of tens of thousands of genes to be monitored in parallel and are already producing huge amounts of valuable data. Analyzing and handling such data is becoming one of the major bottlenecks in using the technology. Gene expression data analysis is still in its infancy, and many difficulties remain to be overcome, notably (1) selecting a clustering algorithm and suitable parameters for a given clustering problem, and (2) validating the clustering result. Many clustering algorithms have already been applied to gene expression data, and new ones are proposed continually, so selecting a suitable algorithm for a specific problem is not easy. Current clustering algorithms do a good job when run on appropriate data with appropriate parameters. One of the major challenges in applying an algorithm to a specific problem therefore lies not in performing the clustering itself, but in choosing the algorithm's parameters. Moreover, current approaches often rely on non-intuitive parameters that may be difficult for even an informed user to select. This thesis aims to address these two problems.

Method: Fuzzy c-means is one of the most widely used algorithms for clustering gene expression data, but the number of clusters c must be chosen manually. (1) We constructed a discriminant, discriminant-PFS, to identify the parameter c automatically. (2) For the problem of validating clustering results, we developed a systematic framework that uses both internal and external information to assess the results of clustering algorithms. We call it the Entropy validating method.

Result: (1) We tested the PFS Fuzzy clustering method on simulated data sets of various dimensions, and it produced satisfactory results. We then applied it to a real gene expression data set, the Human fibroblasts to Serum data set.
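The abstract does not define the discriminant-PFS criterion, so the following is not the thesis's method. It is a minimal sketch, assuming standard fuzzy c-means with fuzzifier m = 2, of the general idea of choosing c automatically: fit the model for each candidate c and score each partition with a cluster-validity index. The Xie-Beni index is swapped in here as a stand-in for discriminant-PFS. The function names (`fuzzy_c_means`, `xie_beni`, `choose_c`), the candidate range 2 to 8, the number of restarts, and the simulated data are all hypothetical.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means. Returns (centers, U, J) with U of shape (c, n)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each column (one sample) sums to 1.
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Centers are membership-weighted means of the samples.
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Euclidean distances from every center to every sample, shape (c, n).
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against a sample sitting exactly on a center
        # u_ik = d_ik^(-2/(m-1)) / sum_j d_jk^(-2/(m-1))
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    # Recompute centers from the final memberships and evaluate the FCM objective J_m.
    Um = U ** m
    centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
    d2 = ((X[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2)
    J = float((Um * d2).sum())
    return centers, U, J


def xie_beni(X, centers, U, m=2.0):
    """Xie-Beni validity index (compactness / separation); smaller is better."""
    d2 = ((X[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2)
    compactness = ((U ** m) * d2).sum()
    sep = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sep, np.inf)  # ignore each center's distance to itself
    return float(compactness / (X.shape[0] * sep.min()))


def choose_c(X, c_values=range(2, 9), m=2.0, n_init=5):
    """Fit FCM for each candidate c and keep the c with the smallest Xie-Beni index."""
    scores = {}
    for c in c_values:
        # Keep the restart with the lowest FCM objective to reduce sensitivity to initialization.
        fits = [fuzzy_c_means(X, c, m=m, seed=s) for s in range(n_init)]
        centers, U, _ = min(fits, key=lambda fit: fit[2])
        scores[c] = xie_beni(X, centers, U, m=m)
    best_c = min(scores, key=scores.get)
    return best_c, scores


# Hypothetical simulated data: three well-separated Gaussian groups in 5 dimensions.
rng = np.random.default_rng(1)
true_centers = np.array([[0.0, 0.0, 0.0, 0.0, 0.0],
                         [8.0, 0.0, 0.0, 0.0, 0.0],
                         [0.0, 8.0, 0.0, 0.0, 0.0]])
X = np.vstack([mu + rng.normal(scale=0.5, size=(50, 5)) for mu in true_centers])
best_c, scores = choose_c(X)
print("selected c:", best_c)
```

Xie-Beni divides the membership-weighted within-cluster scatter by the smallest squared distance between any two centers. It therefore penalizes both merging distinct groups, which inflates the scatter, and splitting one group, which puts two centers close together. That makes it a reasonable illustrative criterion for picking c, though not necessarily the one the thesis uses.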
The functional categories of the genes in the Human fibroblasts to Serum data set are already known. By comparing the clustering result with these functional categories, we found that the categories and the clusters produced by the PFS Fuzzy algorithm are highly related. This comparison validates the PFS Fuzzy clustering method. (2) We successfully applied the Entropy validating method to six clustering algorithms (SOM, fuzzy clustering, K-means clustering, and three hierarchical clustering methods) on two gene expression data sets. We found it to be a powerful and convenient way to choose a clustering method.

Conclusion: (1) PFS Fuzzy clustering is a highly predictive clustering method and is suitable for clustering gene expression data sets. (2) The Entropy validating method uses both the internal and the external information of clusters to validate clustering algorithms. According to the Entropy validation, SOM and fuzzy clustering are the methods best suited to clustering gene expression data sets.
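The abstract does not give the formula behind the Entropy validating method. As a hedged illustration of the external side only, the following sketch scores a partition by the conditional entropy H(category | cluster) of known functional categories within clusters. A score of 0 bits means every cluster contains genes from a single category. Larger values mean categories are mixed across clusters. This is a generic entropy measure, not the thesis's method. The function name `conditional_entropy`, the algorithm names, and the toy labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(clusters, categories):
    """H(category | cluster) in bits; 0 means every cluster holds genes from a single category."""
    if len(clusters) != len(categories):
        raise ValueError("clusters and categories must have the same length")
    n = len(clusters)
    members = defaultdict(list)
    for cluster, category in zip(clusters, categories):
        members[cluster].append(category)
    h = 0.0
    for cats in members.values():
        size = len(cats)
        # Entropy of the category distribution inside this cluster.
        h_k = -sum((k / size) * math.log2(k / size) for k in Counter(cats).values())
        # Weight each cluster by its share of the genes.
        h += (size / n) * h_k
    return h


# Hypothetical example: eight genes with known functional categories and the
# cluster labels that two hypothetical algorithms assigned to them.
categories = ["cell cycle"] * 4 + ["signaling"] * 4
partitions = {
    "algorithm A": [0, 0, 0, 0, 1, 1, 1, 1],
    "algorithm B": [0, 0, 1, 1, 0, 0, 1, 1],
}
scores = {name: conditional_entropy(labels, categories) for name, labels in partitions.items()}
best = min(scores, key=scores.get)
for name, h in scores.items():
    print(f"{name}: H(category | cluster) = {h:.3f} bits")
print("lowest entropy:", best)  # prints "lowest entropy: algorithm A"
```

H(category | cluster) can be driven to 0 simply by putting every gene in its own cluster. An external entropy score on its own therefore favors over-splitting. This is one reason to pair it with internal information about the clusters, as the Entropy validating method is described as doing.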
Keywords: Gene chip, Gene expression data, Clustering algorithm, Fuzzy clustering, SOM, Validating, Entropy