Transcription Factor Binding Site Identification And Analysis

Posted on:2011-11-30

Degree:Master

Type:Thesis

Country:China

Candidate:G Dan

Full Text:PDF

GTID:2190330332983538

Subject:Bioinformatics

Abstract/Summary:


With the rapid development of next-generation genome sequencing technology, many areas of bioinformatics have also advanced rapidly, including transcription factor (TF) regulation, microRNA, epigenetics, de novo assembly, and metagenomics. This rapid progress calls for efficient tools for further research. In this thesis, we developed several practical software tools, such as smart-SEED and GSP, and discussed p-value calculation for the transcription factor binding site (TFBS) motif recognition problem. We applied mathematical models and algorithms including the hidden Markov model, Bayesian estimation, and the expectation-maximization (EM) algorithm.

The study of transcription factor binding sites is central to understanding protein-DNA interactions, and many researchers have focused on exact motif recognition algorithms. In our research, we developed a motif recognition algorithm based on an embedded hidden Markov model, and first introduced an efficient algorithm for exact tuple counting that computes the first and second moments. Both can be used to find promoter motif elements in Arabidopsis. Furthermore, we developed a position-specific weight matrix for TFBS identification, based on the assumption that, because of the spatial relationship between the binding complex and the transcription start site, the distance distribution of binding sites is position-specific. Further analysis indicates that in Arabidopsis, TF genes undergo more complex modification than non-TF genes.

With the development of large-scale sequencing technology, high-throughput sequencing data can now be obtained easily. We first developed a software tool called GSP, based on expectation-maximization and Bayesian estimation, which estimates genome size at different sequencing coverages. The software has a clear mathematical formulation and performs well. The algorithm was initially designed under a model without sequencing errors and then extended to data containing errors.
We tested it on a variety of data sets, from prokaryotic to eukaryotic genomes.
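The abstract does not give enough detail to reproduce GSP's EM and Bayesian model, so the sketch below shows only a widely used k-mer-depth baseline for the same problem, under stated assumptions: count every k-mer in the reads, discard low-multiplicity k-mers as likely sequencing errors, and divide the total number of k-mer occurrences by the peak k-mer depth. The function names `kmer_histogram` and `estimate_genome_size`, the choice of k, and the `min_depth` cutoff are hypothetical illustrations, not part of GSP.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Count every k-mer in the reads, then return {multiplicity: number of distinct k-mers}."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Baseline estimate: total k-mer occurrences divided by the peak k-mer depth.

    Hypothetical helper. K-mers seen fewer than min_depth times are discarded
    as likely sequencing errors (a crude cutoff, not an explicit error model).
    """
    kept = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    total_kmers = sum(depth * n for depth, n in kept.items())
    # Peak depth: the multiplicity shared by the most distinct k-mers.
    peak_depth = max(kept, key=kept.get)
    return total_kmers / peak_depth


if __name__ == "__main__":
    genome = "ATGCGTACCTGAAGTCTTGCAGGATCCAAT"  # 30 bp toy sequence
    # Three error-free reads spanning the whole sequence: every 5-mer position is seen 3 times.
    hist = kmer_histogram([genome] * 3, k=5)
    print(estimate_genome_size(hist))  # 26.0, i.e. 30 - 5 + 1 k-mer positions
```

On the toy input the estimate is the number of k-mer positions, G - k + 1, which approaches G when k is much smaller than the genome. Dividing by the peak rather than the mean depth keeps the estimate from being dragged by error k-mers at low depth and repeat k-mers at high depth; the fixed `min_depth` cutoff is only a crude stand-in for the explicit error-containing model the abstract describes.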

Keywords/Search Tags:

Identification


Related items

1	Research And Application Of Multivariable System Identification And Modeling Technique
2	Iterative Identification Method For Two-input Output Error Bilinear-in-parameter System
3	Recursive Identification Methods For Multivariable Equation-error Systems Based On Decomposition
4	Study On The Identification Of Morinda Officinalis How And Its Six Kinds Of Close Relatives Plants
5	Lithology Identification Methods Of Glutinite
6	Digital Insect Identification Based On Support Vector Machine
7	Automatic Recognition Of Typical Natural Elements From GF-2 Remote Sensing Imagery
8	Research On Theory And Method Of System Identification Based On Low-dimensional Structural Constraints
9	High Accuracy Identification Methods For Industrial MPC Systems
10	Study On Real-time Extraction Of Modal Features Based On Stochastic Subspace Identification