
Transcription Factor Binding Site Identification And Analysis

Posted on: 2011-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: G Dan
Full Text: PDF
GTID: 2190330332983538
Subject: Bioinformatics
Abstract/Summary:
With the rapid development of next-generation genome sequencing technology, many areas of bioinformatics have also advanced rapidly, including transcription factor (TF) regulation, microRNA, epigenetics, de novo assembly, and metagenomics. This progress calls for efficient tools to support further research. In this thesis, we developed practical software, namely smart-SEED and GSP, and discussed p-value calculation in the transcription factor binding site (TFBS) motif recognition problem. We applied several mathematical models and algorithms, including the hidden Markov model, Bayesian estimation, and the Expectation-Maximization algorithm.

The study of transcription factor binding sites is important for understanding protein-DNA interaction, and much work has focused on exact motif recognition algorithms. In our research, we developed a motif recognition algorithm based on an embedded hidden Markov model and, for the first time, introduced an efficient exact tuple-counting algorithm for calculating first and second moments. Both can be used to find promoter motif elements in Arabidopsis. Furthermore, we developed a position-specific weight matrix for TFBS identification, based on the assumption that, because of the spatial relationship between the binding complex and the transcription start site, the distribution of distances is position specific. Further analysis indicates that TF genes in Arabidopsis are subject to more complex modification than non-TF genes.

With the development of large-scale sequencing technology, high-throughput sequencing data are now easy to obtain. We first developed a software tool, called GSP, based on Expectation-Maximization and Bayesian estimation, which estimates genome size under different sequencing coverages. The software has a concise mathematical formulation and performs well. We first designed the algorithm under a model without sequencing errors, then extended it to the case where reads contain errors. We tested it on a variety of data sets, from prokaryotic to eukaryotic genomes.
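
As an illustration of the weight-matrix idea mentioned in the abstract, the following is a minimal sketch of a standard log-odds position weight matrix, built from aligned example sites and used to scan a sequence. It is not the thesis's own position-specific model, which additionally uses distance to the transcription start site. The function names, the example sites, the uniform background, and the pseudocount of 1 are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=None):
    """Build a log2-odds weight matrix from equal-length aligned sites.

    Returns a list with one dict per motif position, mapping base -> score.
    The uniform background and the pseudocount are illustrative assumptions.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")

    pwm = []
    for pos in range(width):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {b: math.log2((counts[b] / total) / background[b]) for b in BASES}
        )
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds scores for one window of motif width."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (position, score) of the highest-scoring window in sequence."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    best_pos, best_score = 0, float("-inf")
    for i in range(len(sequence) - width + 1):
        s = score_window(pwm, sequence[i:i + width])
        if s > best_score:
            best_pos, best_score = i, s
    return best_pos, best_score


# Hypothetical TATA-like example sites, not data from the thesis.
example_sites = ["TATAAA", "TATATA", "TATAAT", "TTTAAA"]
example_pwm = build_pwm(example_sites)
hit_position, hit_score = best_hit(example_pwm, "GGCGTATAAAGGC")
```

Windows that resemble the training sites receive positive scores, while windows made of bases never seen in the sites receive negative scores. A real analysis would use a background estimated from the genome and a score threshold tied to a p-value.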
Keywords/Search Tags:Identification