Use Of Genetic Algorithms To Improve Hash Tree-based Association Rule Mining Of Co-regulated Genes

Posted on: 2008-02-13 | Degree: Master | Type: Thesis
Country: China | Candidate: F J Han
GTID: 2190360215450158 | Subject: Biomedical engineering
Abstract/Summary:
Biotechnology advanced dramatically in the 20th century. With the completion of the Human Genome Project, scientists now have access to large amounts of gene data, which has opened a broad field of research. Researchers in computing, mathematics and biology are all working to decode the mysteries of bioinformatics, and the construction of gene networks in particular attracts many of them.
It has been shown that all cell types in an organism share the same genome. It is variation in gene expression, not changes in the genes themselves, that gives cells their different functions. Gene expression is a process that includes transcription, translation and processing. The regulation of gene expression refers to any direct factor that affects the expression of genes and the speed of transcription and translation. Co-regulated genes are genes that have the same functions, and they are the groundwork for constructing the gene network; in computational analysis, however, genes that share the same transcription factors are treated as co-regulated genes instead. Recent research has concentrated on mining co-expressed genes rather than co-regulated genes, but related work shows that the two differ in many respects and cannot directly replace each other.
In this thesis, we studied gene microarray data and proposed a new method for mining co-regulated genes based on their properties and biological mechanisms. The main work concentrated on the following areas: first, simulating the positive and negative co-regulated gene clusters (PNCGC) algorithm to mine co-regulated genes; second, improving association rule mining so that it finds both negative and positive rules, and using a hash tree to store and search the frequent itemsets to improve time and space efficiency; third, combining genetic algorithms with association rules to generate rules from the frequent itemsets, and separately obtaining the rules that have at least one item on the left-hand side.
We applied this method to yeast and Arabidopsis thaliana data sets and successfully generated a large number of valuable results. Comparison with the SGD and TAIR databases, which were used to look up transcription factors (TFs) and transcription factor binding sites (TFBSs), shows that most genes in a rule are co-regulated by the same TFs and have similar TFBSs in their upstream regions. It is therefore reasonable to conclude that the genes in these rules are indeed co-regulated. Simulation results show that the method has distinct advantages in mining co-regulated genes and is a worthwhile exploration.
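The idea of mining both positive and negative association rules over frequent itemsets can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the toy transactions, the gene names `g1` to `g4`, and all function names are hypothetical, the frequent itemsets are found by brute-force enumeration, and a plain dictionary keyed by `frozenset` stands in for the hash tree. The genetic-algorithm step is not shown.

```python
from itertools import combinations

# Hypothetical toy data (an assumption, not from the thesis): each transaction
# is the set of genes treated as "expressed" in one microarray experiment.
transactions = [
    {"g1", "g2", "g3"},
    {"g1", "g2"},
    {"g1", "g4"},
    {"g2", "g3"},
    {"g1", "g2", "g4"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return count(itemset, transactions) / len(transactions)

def positive_confidence(lhs, rhs, transactions):
    """Confidence of the positive rule lhs -> rhs."""
    n_lhs = count(lhs, transactions)
    if n_lhs == 0:
        return 0.0
    return count(set(lhs) | set(rhs), transactions) / n_lhs

def negative_confidence(lhs, rhs, transactions):
    """Confidence of the negative rule lhs -> NOT rhs,
    i.e. the share of lhs transactions that do not contain all of rhs."""
    n_lhs = count(lhs, transactions)
    if n_lhs == 0:
        return 0.0
    return (n_lhs - count(set(lhs) | set(rhs), transactions)) / n_lhs

def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force search for frequent itemsets up to max_size items.
    The returned dict is a simple stand-in for a hash tree: it maps each
    frequent itemset to its support and allows constant-time lookup."""
    items = sorted(set().union(*transactions))
    table = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            s = support(combo, transactions)
            if s >= min_support:
                table[frozenset(combo)] = s
    return table

if __name__ == "__main__":
    table = frequent_itemsets(transactions, min_support=0.3)
    for itemset, s in sorted(table.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), s)
    print("conf(g1 -> g2)     =", positive_confidence({"g1"}, {"g2"}, transactions))
    print("conf(g3 -> not g4) =", negative_confidence({"g3"}, {"g4"}, transactions))
```

In this toy data, a rule such as `g1 -> g2` would suggest a candidate co-regulated pair, while `g3 -> not g4` is the kind of negative rule that positive-only mining would miss. Whether such candidates are truly co-regulated would still need to be checked against TF and TFBS annotations, as described above.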
Keywords/Search Tags: co-regulated genes, association rules, hash-tree, genetic algorithms, data mining