Discovery of transcription factor binding sites using computational statistics

Posted on: 2003-06-16  Degree: Ph.D.  Type: Dissertation
University: Stanford University  Candidate: Liu, Xiaole  Full Text: PDF
GTID: 1460390011484234  Subject: Biology
Abstract/Summary:
The rapid development of sequencing technology and high-throughput technologies such as serial analysis of gene expression, DNA microarrays, and chromatin immunoprecipitation followed by microarray experiments (ChIP-array) allows biologists to study gene expression and transcriptional regulation at the genome level. Given a set of DNA sequences clustered by gene expression profiling or enriched in ChIP-array, analysis is needed to find the common sequence motifs responsible for transcriptional regulation. Motivated by such needs, I designed and implemented two computational statistics algorithms, BioProspector and MDscan.

BioProspector adopts an existing Gibbs sampling algorithm and adds many features to improve its sensitivity, specificity, and flexibility. It can be used to find motifs in the upstream sequences of genes clustered by expression profiling, and it can find motifs with two conserved blocks separated by a variable-length gap. Once a correct motif matrix is defined, more sites in the genome can be identified using a companion program, MatrixScan. BioProspector was systematically validated on the annotated M. xanthus genome to search for σ54 binding sites, on the DBTBS database of B. subtilis promoters to search for transcription factor (TF) binding motifs, and on the upstream sequences of a yeast expression mega-cluster to find the TF motifs responsible for expression co-regulation.

MDscan is a novel and fast algorithm designed and implemented to find protein-DNA interaction motifs from ChIP-array experiments. By searching for TF motifs first in the sequences with more potential TF sites, and by combining the advantages of the word enumeration and matrix update approaches, MDscan successfully identified the correct motif in all published ChIP-array experiments performed on yeast. It also shows speed and accuracy advantages over several established motif-finding algorithms in simulation studies.
MDscan is also useful for identifying combinations of TF motifs responsible for inducing or repressing gene expression from a single microarray experiment, without clustering, by combining motif search with microarray measurements.

Although finding transcription factor binding motifs and binding sites is still a challenging problem, the success of BioProspector and MDscan gives us confidence that computational statistics algorithms like them will help solve this problem and help biologists understand transcriptional regulation and genetic networks.
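
The idea of finding more sites once a motif matrix is defined can be illustrated with a generic position weight matrix scan. The sketch below is not the MatrixScan program described in the abstract; the function names, the uniform background, the pseudocount, the example sites, and the score threshold are all assumptions chosen for illustration.

```python
import math

ALPHABET = "ACGT"

def build_log_odds(sites, background=None, pseudocount=1.0):
    """Build a log2-odds matrix from aligned, equal-length sites (uppercase ACGT only)."""
    # Assumption: a uniform background unless the caller supplies one.
    background = background or {base: 0.25 for base in ALPHABET}
    width = len(sites[0])
    matrix = []
    for pos in range(width):
        # The pseudocount keeps unseen bases from producing log(0).
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append({
            base: math.log2((counts[base] / total) / background[base])
            for base in ALPHABET
        })
    return matrix

def scan(sequence, matrix, threshold):
    """Return (start, window, score) for every window scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits

# Hypothetical aligned sites and a hypothetical sequence to scan.
example_sites = ["TGACTC", "TGACTC", "TGAGTC", "TGACTA"]
example_matrix = build_log_odds(example_sites)
example_hits = scan("AAATGACTCAAA", example_matrix, threshold=5.0)
```

In practice the threshold would be chosen from the score distribution on background sequence rather than fixed by hand, and both DNA strands would normally be scanned.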
Keywords/Search Tags:Transcription, Binding sites, Gene expression, Computational, TF motifs, Microarray