Combinatorics And Folding Of Canonical RNA Pseudoknot Structures

Posted on: 2011-01-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G Ma
GTID: 1100330332972774
Subject: Applied Mathematics
Abstract/Summary:
RNA plays an important role in protein synthesis. It includes transfer RNA (tRNA), which carries and transfers activated amino acids; messenger RNA (mRNA), which is the template for protein synthesis; and ribosomal RNA (rRNA), the main component of the ribosome, where cellular protein synthesis takes place. RNA molecules become active only once they fold into a three-dimensional structure, so the study of RNA structures is very important.
In 1978 Michael Waterman pioneered the combinatorics and prediction of RNA secondary structures. On the one hand, an RNA molecule is described by its primary sequence, a linear string composed of the nucleotides A, G, U and C. On the other hand, RNA, like DNA, folds into tertiary structures. An increasing number of experimental findings, as well as results from comparative sequence analysis, imply that there exist additional, cross-serial types of interactions between RNA nucleotides [46]. These cross-serial interactions between RNA nucleotides are called pseudoknots.
In Chapter 1, we give some background. First, we introduce RNA secondary structures: their definition, their representation and their recursion. The recursion is very important because many results on RNA secondary structures are based on it. Second, we introduce RNA pseudoknot structures. We give the definition of κ-noncrossing and then list a number of combinatorial results on κ-noncrossing RNA structures. Finally, we introduce some basics of folding RNA secondary structures. The goal of a folding algorithm is to compute the minimum free energy structure of a given sequence.
In Chapter 2, we give the generating function of κ-noncrossing σ-canonical RNA pseudoknot structures with minimum arc-length ≥ 4, where σ ≥ 3. Using the generating function and singularity analysis, we obtain the exponential growth rates of <κ,4,σ>-structures, where σ ≥ 3.
To obtain the generating function of <κ,4,σ>-structures, we introduce a special core, which serves as a bridge between <κ,4,σ>-structures and κ-noncrossing matchings. Because all the generating functions are D-finite, we can use singularity analysis to obtain the exponential growth rates of <κ,4,σ>-structures.
In Chapter 3, we study the statistical properties of κ-noncrossing RNA structures with minimum arc-length λ ≥ 4 and minimum stack-length τ ≥ 3. We prove a central limit theorem for the distribution of the number of arcs in <κ,4,τ>-structures, where τ ≥ 3. First, we establish a connection between the bivariate generating function of <κ,4,τ>-structures and the generating function of κ-noncrossing matchings. Second, we study the singularities of a specific parameterization of the bivariate generating function and show that they control the limit distribution. Finally, we prove the central limit theorem for the limit distribution of the random variables X_n, where P(X_n = h) = T_{κ,τ}(n, h) / T_{κ,τ}(n).
In Chapter 4, we provide a generalization of the algorithm cross [16]. Given a sequence, our algorithm generates the minimum free energy (mfe) structure among all <κ,4,σ>-structures. The algorithm has three phases. In phase I, we generate all the irreducible shadows. In phase II, for every irreducible shadow, we build a skeleta-tree. In phase III, for every skeleton generated in phase II, we inductively "fill" the remaining intervals of the skeleton with specific substructures. All routines employed in phase III follow the dynamic programming (DP) paradigm. In phase II of our implementation, the insertion of new stacks has to obey certain rules so that every structure is generated exactly once; see Section 4.4 for details. We also observe that every <κ,4,σ>-structure has a unique loop decomposition, which is essential for phase III of our implementation; see Section 4.5 for details.
Keywords/Search Tags: RNA secondary structure, pseudoknot, k-noncrossing, generating function, D-finite, singularity analysis, central limit theorem, bivariate generating function, normal distribution, folding, minimum free energy, dynamic programming, skeleta-tree