Font Size: a A A

Combinatorial Investigation Of RNA Pseudoknot Structures

Posted on:2011-08-25Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y JinFull Text:PDF
GTID:1100330332472473Subject:Applied Mathematics
Abstract/Summary:PDF Full Text Request
This thesis mainly focuses on studying the combinatorics of helical structures of RNA sequences with the aim to further improve the prediction algorithms. We use the diagram representation for RNA structures and propose a novel approach to categorize RNA pseudoknots structures according to the maximal number of mutually crossing arcs. To be precise, we considerκ-noncrossing structures, i.e. structures that do not contain a k-set of mutually intersecting arcs, which subsumes RNA secondary structures by settingκ=2. Intuitively, a higher number of pairwise crossing arcs represents higher structural complexity.Based on the this classification of RNA structures, of particular interests in the RNAs with locally small number of crossings we start the combinatorial study ofκ-noncrossing RNAs by estimating the size ofκ-noncrossing RNA space and analyzing the distribution of RNAs with specific properties. In sharp con-trast to RNA secondary structures, the pseudoknots structures are not genuinely reducible with the crossing base pairs, for which it is a vain attempt to interpret the pseudoknot structures into tree configurations. Chen et al. proved the bijec-tion betweenκ-noncrossing diagrams and the random walks in (k-1)-dimension Weyl chamber. It is natural to extend this bijection into the class ofκ-noncrossing RNA structures with arc-length, stack-size restrictions.However, we notice that the translations of arc-length, stack-size into the ran-dom walks with forbidden paired moves within (κ-1)-dimensional Weyl cham-ber turn to be increasingly complicated for longer arc-length or larger stack-size and the inherent asymmetric property makes the enumeration of random walks in Weyl chambers by reflection principle no longer possible for the class ofκ-noncrossing RNA structures. We instead choose to relate the family ofκ-noncrossing RNA structures with theκ-noncrossing matchings via functional identity and integrate the arc-length, stack-size conditions into bivariate gener-ating functions, which allow us to initiate the singularity analysis and further study the limit distribution ofκ-noncrossing structures. The singularity analysis on the bivariate generating function ofκ-noncrossing RNA structures is not a trivial task. We indeed need to prove the analytic continuation of the bivari- ate generating functions in a simply connected region containing the dominant singularity and find its singular expansion to apply the transfer theorem within the regions enclosed by Hankel contour, which indicates the error bounds for its singular expansion can be shifted into the error bounds from the estimation of its coefficients. Finally, we correlate our combinatorial investigation with the pseudoknot minimal free energy folding algorithm cross to manifest the intrinsic irreducibility of pseudoknot RNA irrespective of the energy parameters.The combinatorial study of RNA secondary structures showed us the space of RNA secondary structures grows with rate n-3/2·1.84n, which allowed the dynamic programming routine to search the whole space. The combinatorial investigation of pseudoknot structures aims to scale down the whole pseudoknot space to certain subclass whose growth rate are quite close to secondary structures and consequently we expect the polynomial algorithm may exist for certain type of pseudoknot structures and that is exactly how the folding algorithm cross arises from. Furthermore, the prediction algorithm cross and the underlying combinatorial investigations are complete to each other. Case in point, the folding algorithm cross produces RNA structures with highly irreducibility, by which we consider the irreducible RNAs and reveal that the average number of irreducible RNAs is 1, purely from structure side.The thesis is organized as follows:In Chapter 1 we introduce basic bio-background of RNA molecules and discuss the advantages of our novel classifica-tion on the maximal number of crossing base pairings. In Chapter 2 we focus on asymptotically enumeratingκ-noncrossing RNA structures with arc-length con-straints by applying the singularity analysis on the supercritical paradigm for the 3-noncrossing RNAs with minimal arc-length and our main results show that the size of 3-noncrossing RNA structures of length n with minimal arc-length 2 grows with rate~n-5·(5+(?)/2)n. Furthermore, we apply Levy-Cramer Theorem to prove that the limit distributions for the number of arcs of 3-noncrossing struc-tures are highly concentrated towards its mean as k increases, i.e., the maximal number of allowed arcs, via transfer theorem on bivariate generating function. Our framework is not limited to the 3-noncrossing RNAs but for the family of arbitraryκ-noncrossing RNA structures. The next step, in Chapter 3, we study the k-noncrossing structures with stack-size restraint to further scale down the size of the subclass of pseudoknots structures, in which case we interpret the k-noncrossing RNA structures as the inflation of core structures arid integrate the effects of stack-size on the minimal arc-length into different core structures. Finally, in Chapter 4 we investigate the irreducibility of RNA pseudoknot struc-tures by extending the basic study on mean, variance into any moments and the whole limit distribution. This approach also solves the limit distribution for t-Dyck paths class and exceeds previous work by simplifying the computation for any higher moments.
Keywords/Search Tags:κ-noncrossing RNA structures, Weyl chamber, bivariate generating function, singularity analysis, analytic continuation, singular expansion, Hankel contour, transfer theorem, Lévy-Cramér Theorem, limit distribution
PDF Full Text Request
Related items