
Research And Implementation On The Prediction Of Transcription Factor Binding Site Based On Gibbs Sampling Algorithm

Posted on: 2013-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: B L Li
GTID: 2248330371985367
Subject: Software engineering
Abstract/Summary:
Bioinformatics is a relatively new interdisciplinary field, and within biology, gene regulatory networks have become a major focus. Transcriptional regulation is the most important part of a gene regulatory network, and changes in the structure of gene regulatory networks underlie biological phenomena such as cell differentiation and tumor formation. Cell differentiation is achieved mainly through changes in gene expression rather than changes in gene sequence. Gene expression comprises all the processes, including transcription and translation, by which a structural gene is expressed in vivo, and any factor that switches transcription and translation on or off, or directly affects their rate, is collectively referred to as regulation of gene expression. The key to the efficiency of transcription initiation is a class of DNA-binding proteins known as transcription factors. Because transcription factors are difficult to study directly, their DNA binding sites, which are comparatively easy to process by computer, are used to obtain information about the regulation of gene transcription.

Transcription factor binding sites in DNA tend to be conserved, and these conserved binding sites are known as regulatory elements. Regulatory elements are usually identified using a simplified model of transcriptional regulation that ignores long-range effects. At this level, identifying regulatory elements reduces to finding a common motif in the upstream promoter regions of a group of genes known to be co-regulated. An effective way to predict transcription factor binding sites is therefore to identify motifs that are statistically over-represented in the promoter regions of these genes.
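To make "conserved binding sites" concrete: once a set of sites has been aligned, their conservation is commonly summarized column by column as a position-specific frequency matrix. The sketch below is a minimal illustration, assuming equal-length, ungapped, uppercase DNA sites and a pseudocount of 0.5 (both our own choices); the function names and the example sites are hypothetical and are not taken from this thesis.

```python
from collections import Counter

BASES = "ACGT"


def position_frequency_matrix(sites, pseudocount=0.5):
    """Column-wise base frequencies for equal-length, ungapped DNA sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    # Each column's counts plus pseudocounts are normalized by the same total.
    total = len(sites) + pseudocount * len(BASES)
    pfm = []
    for j in range(width):
        counts = Counter(site[j] for site in sites)
        pfm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return pfm


def consensus(pfm):
    """Most frequent base in each column."""
    return "".join(max(col, key=col.get) for col in pfm)


# Hypothetical aligned binding sites, for illustration only.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pfm = position_frequency_matrix(sites)
print(consensus(pfm))  # TGACTCA
```

Highly conserved columns (here the first column is always T) end up with one dominant base, while variable columns spread their probability mass; the pseudocount keeps unseen bases from receiving probability zero.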
Such methods combine two sub-models: a motif model and a background model. Two of the more important approaches use a position-specific frequency matrix to model and predict the common motifs in the promoter regions of co-regulated genes: one is the expectation-maximization (EM) algorithm, and the other is the Gibbs sampling algorithm from Markov chain Monte Carlo theory.

A Markov process is a class of stochastic process, and the Markov chain is a typical Markov process with a wide range of applications in the natural sciences, engineering, and public utilities. Monte Carlo methods, also known as random sampling or statistical testing methods, are computer-based stochastic simulation methods that can handle nonlinear and non-normal problems. They are widely used to solve problems in science, engineering, economics, and finance because they can realistically simulate actual physical processes, which fits the way practical problems are solved and often leads to successful conclusions.

Markov chain Monte Carlo (MCMC) is a dynamic Monte Carlo method. Since its advent, it has shown great advantages for stochastic simulation in many areas of computing and is widely used in Bayesian inference and machine learning.

This thesis focuses on MCMC-based algorithms for predicting transcription factor binding sites and on the application of closely related software. The detailed content includes a study of MCMC theory and of the concrete theory and application of one of its algorithms, Gibbs sampling. Gibbs sampling is a special MCMC method, and at present the Gibbs sampling algorithm and several improved variants are widely used to identify regulatory elements.
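The motif-model/background-model formulation and the Gibbs sampling step can be illustrated with a minimal "site sampler" sketch. It assumes exactly one ungapped site of known width `w` per sequence, uppercase ACGT input, a fixed zero-order background model taken from overall base composition, and pseudocounts of 0.5 (motif) and 1.0 (background); it also sweeps every sequence in turn on each iteration rather than choosing one sequence at random. All function names are hypothetical, and this is not the code of BioProspector or any other package.

```python
import math
import random

BASES = "ACGT"


def background_model(seqs, pseudocount=1.0):
    """Zero-order background model: overall base composition (uppercase ACGT only)."""
    counts = {b: pseudocount for b in BASES}
    for s in seqs:
        for ch in s:
            counts[ch] += 1
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def motif_model(seqs, starts, w, skip=None, pseudocount=0.5):
    """Position-specific frequency matrix from the current sites, leaving out `skip`."""
    cols = [{b: pseudocount for b in BASES} for _ in range(w)]
    for i, (s, a) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        for j in range(w):
            cols[j][s[a + j]] += 1
    for col in cols:
        total = sum(col.values())
        for b in BASES:
            col[b] /= total
    return cols


def site_log_ratio(site, pfm, bg):
    """log P(site | motif model) - log P(site | background model)."""
    return sum(math.log(pfm[j][ch] / bg[ch]) for j, ch in enumerate(site))


def gibbs_site_sampler(seqs, w, iterations=500, seed=0):
    """Sample one width-w site per sequence; return the best alignment seen and its score."""
    rng = random.Random(seed)
    bg = background_model(seqs)
    starts = [rng.randrange(len(s) - w + 1) for s in seqs]
    best_starts, best_score = list(starts), -math.inf
    for _ in range(iterations):
        for i, s in enumerate(seqs):
            # Predictive update: motif model built from every sequence except i.
            pfm = motif_model(seqs, starts, w, skip=i)
            # Sampling step: weight each window in sequence i by Q/P and draw one.
            weights = [math.exp(site_log_ratio(s[a:a + w], pfm, bg))
                       for a in range(len(s) - w + 1)]
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
        # Score the full alignment and remember the best one seen so far.
        pfm = motif_model(seqs, starts, w)
        score = sum(site_log_ratio(s[a:a + w], pfm, bg)
                    for s, a in zip(seqs, starts))
        if score > best_score:
            best_starts, best_score = list(starts), score
    return best_starts, best_score


# Hypothetical toy input: the site "TGACTCA" planted in short random sequences.
toy_rng = random.Random(42)
toy = ["".join(toy_rng.choice(BASES) for _ in range(10)) + "TGACTCA"
       + "".join(toy_rng.choice(BASES) for _ in range(10)) for _ in range(6)]
found, score = gibbs_site_sampler(toy, w=7)
print([s[a:a + 7] for s, a in zip(toy, found)], round(score, 2))
```

Because new site positions are drawn at random in proportion to the motif-to-background ratio instead of always taking the best-scoring window, the chain can leave poor alignments that would trap a purely greedy search; in practice one would typically run several seeds and keep the highest-scoring alignment.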
On the Linux operating system, the transcription factor binding site prediction toolkit BEST was deployed and analyzed using Qt, a cross-platform graphical user interface development framework for C++. BEST is a well-regarded international package for transcription factor binding site prediction that integrates several commonly used data-mining programs; this thesis focuses on BioProspector, a representative data-mining subroutine integrated in BEST. The mathematical model underlying the MCMC Gibbs sampling algorithm is the key issue in implementing these algorithms on a computer, so the thesis concentrates on analyzing and studying a specific mathematical model of Gibbs sampling. Applying such models serves as a starting point for exploring MCMC in theory and practice and lays a solid theoretical and practical foundation for creating other software in the future.

MCMC theory has greatly transformed statistical methods by making computations that were previously considered impossible feasible. Many bioinformatics problems can now be solved with MCMC. Many researchers abroad have studied MCMC within the Bayesian framework and proposed ideas based on it, but only a few of these ideas have been implemented as usable software, such as BioProspector. By comparison, research in this field in China is still in its infancy, and concrete applications of MCMC have received even less attention. In the domestic bioinformatics field, the research and application of MCMC remain largely unexplored, and this is precisely the significance of this research.
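Regarding the BioProspector model discussed above: the background model need not be zero-order, and BioProspector is described by its authors as supporting zero- to third-order Markov background models. The sketch below shows, under our own simplifying assumptions, how a k-th order Markov background model could be estimated from sequences and used to compute the background log-probability of a candidate site. The assumptions are a pseudocount of 1.0, uppercase ACGT input, a uniform fallback for contexts never seen in training, and base composition for the first `order` bases; the function names and example sequences are hypothetical, and this is not BioProspector's code.

```python
import math
from collections import defaultdict

BASES = "ACGT"


def train_markov_background(seqs, order=2, pseudocount=1.0):
    """Estimate a k-th order Markov background model from uppercase ACGT sequences.

    Returns (base_freq, transitions), where transitions[context][base] is
    P(base | preceding `order` bases). The pseudocount is an assumption of this sketch.
    """
    base_counts = {b: pseudocount for b in BASES}
    context_counts = defaultdict(lambda: {b: pseudocount for b in BASES})
    for s in seqs:
        for ch in s:
            base_counts[ch] += 1
        for i in range(order, len(s)):
            context_counts[s[i - order:i]][s[i]] += 1
    total = sum(base_counts.values())
    base_freq = {b: c / total for b, c in base_counts.items()}
    transitions = {}
    for ctx, col in context_counts.items():
        ctx_total = sum(col.values())
        transitions[ctx] = {b: c / ctx_total for b, c in col.items()}
    return base_freq, transitions


def background_log_prob(seq, base_freq, transitions, order):
    """log P(seq | background).

    Simplification: the first `order` bases are scored with base composition
    instead of lower-order Markov models.
    """
    uniform = {b: 1.0 / len(BASES) for b in BASES}
    logp = sum(math.log(base_freq[ch]) for ch in seq[:order])
    for i in range(order, len(seq)):
        # Contexts never seen in training fall back to a uniform distribution.
        probs = transitions.get(seq[i - order:i], uniform)
        logp += math.log(probs[seq[i]])
    return logp


# Hypothetical background sequences, for illustration only.
background = ["ATATATGCATATATCGATAT", "TATAATATGCGCATATTATA"]
freq, trans = train_markov_background(background, order=2)
print(background_log_prob("TATATA", freq, trans, order=2))
print(background_log_prob("GCGCGC", freq, trans, order=2))
```

A higher-order background captures short-range dependencies in promoter sequence (such as AT-rich repeats), so a candidate motif is judged against what the local sequence context already makes likely rather than against base composition alone; with `order=0` the model reduces to plain base composition.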
Keywords/Search Tags: Random process, Markov chain Monte Carlo, Gibbs sampling, Mathematical model, Palindrome