Tissue Specific Pattern Discovery For Promoter Sequence Of Human Gene

Posted on: 2013-01-03
Degree: Master
Type: Thesis
Country: China
Candidate: F F Zhao
Full Text: PDF
GTID: 2210330362960726
Subject: Computer Science and Technology
Abstract/Summary:
The identification and analysis of tissue-specific (TS) genes and their regulatory activity play an important role in understanding the mechanisms of an organism, in disease diagnosis, and in drug design. However, uncovering the mechanisms that regulate tissue-specific gene expression with computational and data mining methods remains a challenging problem. The sequence characteristics of a gene's promoter region are closely related to the general and tissue-specific properties of cells. In this paper, we design and develop algorithms for discovering SSR (Simple Sequence Repeat) patterns and statistically significant patterns, which can be used to study the relationship between patterns in the promoters of human genes and the tissue-specific expression of those genes.

An SSR is a tandemly repeated sequence in DNA. This paper designs an SSR pattern discovery method that first gives SSR a formal definition, then uses a heuristic algorithm to discover SSR patterns in gene promoter regions that are closely related to tissue specificity, and finally computes and analyzes the positional frequency of these SSRs. We performed the analysis on the promoter sequences (-1000 bp to +499 bp) of 4,552 human tissue-specific genes across 82 tissues and of 924 housekeeping genes. We obtained SSR patterns that are closely associated with tissue specificity in the 82 human tissues, and we use kidney and testis as examples to show part of the experimental results.

Statistically significant patterns are segments of nucleotide sequence that occur frequently in the non-coding regions of genes. To discover these patterns, we design a method that consists of three phases: motif searching, motif merging, and motif validation. The motif searching phase integrates three algorithms: MEME, AlignACE, and Gibbs Sampling.
In the second phase, we propose a motif merging method, based on the background distribution of nucleotides, that reduces redundancy among the motifs from the first phase. Lastly, the motif validation phase verifies the statistical significance of the discovered motifs using a Bayesian hypothesis test. We applied the statistically significant pattern discovery method to the same input data; the experiment obtained 1,618 patterns from tissue-specific genes and 3 patterns from housekeeping genes. Among the discovered patterns, some are previously known, while others are new patterns with strong statistical significance in specific tissues that need further verification. By analyzing the distribution of motifs across different promoter regions, this paper found that the density of SSR patterns is significantly greater than the density in other regions of the promoter, while the statistically significant patterns have their maximum frequency in the proximal promoter region. From the experimental results, this paper infers that the two kinds of patterns differ in their regulatory positions and modes of regulation.

The study of how SSR patterns and statistically significant patterns relate to tissue specificity provides support for understanding the internal regulatory mechanism of tissue specificity from the structural characteristics of promoter sequences.
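
As an illustration of the SSR idea described above (a short unit repeated in tandem, with hit positions tallied along the promoter), the following Python sketch scans a sequence with a regular expression. It is not the heuristic algorithm of the thesis, whose formal SSR definition is not given in this abstract. The function names `find_ssrs` and `position_histogram`, the unit length range of 1 to 6 bp, the minimum of 3 copies, the bin size, and the convention that sequence index 1000 corresponds to the transcription start site are all assumptions made for the example.

```python
import re
from collections import Counter


def find_ssrs(seq, min_unit=1, max_unit=6, min_repeats=3):
    """Return sorted (start, unit, copies) tuples for tandem repeats in seq.

    The thresholds are illustrative assumptions, not values from the thesis.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_unit, max_unit + 1):
        # A unit of length k followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"); those are reported at the shorter length.
            if any(unit == unit[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


def position_histogram(promoters, upstream=1000, bin_size=500):
    """Count SSR start positions per bin, relative to an assumed TSS index.

    Assumes index `upstream` of each sequence is the transcription start
    site, so index 0 maps to position -upstream.
    """
    counts = Counter()
    for seq in promoters:
        for start, _unit, _copies in find_ssrs(seq):
            rel = start - upstream
            counts[(rel // bin_size) * bin_size] += 1
    return dict(counts)
```

For a sequence such as `GGCACACACATTTG`, `find_ssrs` reports a `CA` repeat with four copies starting at index 2 and a `T` run of three starting at index 10. Comparing the histograms of tissue-specific and housekeeping promoters would be one simple way to look at positional frequency, though the thesis may compute it differently.
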
Keywords/Search Tags: Tissue Specific Gene, Promoter Region, Pattern Discovery, Pattern Merge, Bayesian Hypothesis Test, SSR Pattern