Studies on Filtering Differentially Expressed Genes Based on an Extended CF-tree Clustering Algorithm

Posted on: 2008-10-22  Degree: Master  Type: Thesis
Country: China  Candidate: F Xiong  Full Text: PDF
GTID: 2120360215985879  Subject: Computer technology
Abstract/Summary:
With the completion of the Human Genome Project, biological research has entered the post-genome era. Scientists now focus on exploring genome structure and function from biological data. Serial analysis of gene expression (SAGE), DNA microarray and gene chip technologies have made it possible to monitor the expression levels of thousands of genes simultaneously during biological processes, and SAGE has become a very important branch of bioinformatics research. How to use the analysis techniques of computer science to analyze these millions of data points and to extract useful, instructive knowledge from biological experiments is attracting more and more attention in bioinformatics.
Cluster analysis is a commonly used method for analyzing gene expression data. Genes with similar expression patterns are clustered together, and genes in the same cluster tend to have closely related biological functions.
Through extensive analysis and research, we find that the two BIRCH clustering algorithms based on the CF-tree each have a shortcoming: one uses the same threshold to form every cluster, and the other cannot find irregularly shaped clusters. This thesis presents a multi-representative-point algorithm based on the feature tree. The algorithm follows the idea of BIRCH and adds the advantages of CURE, so it can compress massive clustering data and can also capture the complex shapes of clusters. Using this data structure together with random sampling, we propose a clustering algorithm suited to large-scale data handling. The algorithm meets the clustering requirements above and can process massive data rapidly and effectively.
We also analyze the improved algorithm both quantitatively and qualitatively. In addition, the thesis introduces an extended software system based on our CF-tree clustering and applies it to a gastric cancer SAGE database. With this fast and effective tool, differentially expressed genes of gastric cancer were identified, which will guide our further molecular biology research. If validated by molecular biology experiments, these differentially expressed genes may be used as molecular targets for gastric carcinoma. Some novel gastric carcinoma-associated genes may also be cloned from these differentially expressed ESTs through further bioinformatics analysis and molecular biology experiments.
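The abstract does not give the details of the extended CF-tree, so the following is only a minimal sketch of the two standard building blocks it names, not the thesis's actual method. It assumes the usual BIRCH clustering feature (N, LS, SS) with its additivity property, and a CURE-style choice of well-scattered representative points that are shrunk toward the centroid. The names `CF`, `representatives`, `k` and `alpha` are hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class CF:
    """BIRCH clustering feature: point count, linear sum, sum of squared norms."""
    n: int
    ls: list   # per-dimension linear sum of the member points
    ss: float  # sum of squared Euclidean norms of the member points

    @classmethod
    def from_point(cls, p):
        return cls(1, list(p), sum(x * x for x in p))

    def merge(self, other):
        # CF additivity: summaries of two disjoint point sets add component-wise,
        # so a tree node can absorb data without storing the raw points.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius(self):
        # Root-mean-square distance of members from the centroid:
        # sqrt(SS/N - ||LS/N||^2), clamped at 0 against rounding error.
        c2 = sum(x * x for x in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))


def representatives(points, k, alpha):
    """CURE-style representatives (assumed variant): pick up to k well-scattered
    points by a farthest-point heuristic, then shrink each toward the centroid
    by the fraction alpha (0 = no shrink, 1 = collapse onto the centroid)."""
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    # Start with the point farthest from the centroid.
    chosen = [max(points, key=lambda p: math.dist(p, centroid))]
    while len(chosen) < min(k, len(points)):
        # Next: the point whose nearest already-chosen representative is farthest.
        chosen.append(max(points,
                          key=lambda p: min(math.dist(p, c) for c in chosen)))
    return [[x + alpha * (c - x) for x, c in zip(p, centroid)] for p in chosen]


if __name__ == "__main__":
    cf = CF.from_point((0.0, 0.0)).merge(CF.from_point((2.0, 0.0)))
    print(cf.n, cf.centroid(), cf.radius())
    print(representatives([(0.0, 0.0), (2.0, 0.0), (1.0, 0.0)], k=2, alpha=0.5))
```

A single centroid-plus-radius summary describes only roughly spherical clusters, which is why keeping several shrunk representative points per cluster is one plausible way to follow elongated or irregular shapes while still working on compressed summaries.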
Keywords/Search Tags: bioinformatics, serial analysis of gene expression, clustering algorithm, extended CF-tree