Font Size: a A A

Computing RNA coding spaces and efficient combinatorial library construction

Posted on:2002-11-05Degree:Ph.DType:Thesis
University:State University of New York at Stony BrookCandidate:Cohen, BarryFull Text:PDF
GTID:2460390011993968Subject:Computer Science
Abstract/Summary:PDF Full Text Request
The main classes of biomolecules---nucleic acids and proteins---are linear chains, or sequences, whose elements are drawn from small sets of chemical building blocks. They are accurately modelled as strings over a finite alphabet. The imposition of constraints on the composition of these sequences, as frequently occurs both in natural evolution and laboratory manipulation, gives rise to combinatorial and optimization problems which can fruitfully be addressed by computational means.; This dissertation examines three such problems: (1) Chapter 1 examines a problem in combinatorial chemistry, which is a technique for the systematic synthesis and screening of large libraries of small organic molecules. In drug development, such libraries are designed to various standards, such as effectively sampling a given space of compounds. It is then desirable to be able to efficiently synthesize the library. We propose a new algorithmicly-intensive optimized split-synthesis technology for fabricating combinatorial chemistry libraries and evaluate its potential through extensive simulation. (2) Chapter 2 develops evidence that there is natural selection on a physical property, the stability, of messenger RNA (mRNA) sequences in microbes. We fix the target protein, which constrains the coding sequence, and the probability of use of each codon. Stability is defined as the strength of pairwise bonds (secondary structure) between sequence elements. The technique is a statistical comparison between computer simulations of natural and synthetic mRNAs. It shows that among a large sample of such sequences (more than 27,000 sequences from 36 genomic structures in 34 species), a disproportionate number of natural sequences have very high or very low stability. (3) Chapter 3 examines the algorithmic problem of designing very stable or very unstable mRNA sequences which code for a target protein. We derive a dynamic programming solution to the most-stable-sequence problem (MSSP) which is polynomial, and asymptotically no more complex than secondary structure prediction. We show that the least-stable-sequence problem (LSSP) in NP-complete, and develop and two heuristics for the construction of such sequences. In both cases, the solutions are implemented and tested on real-world data.
Keywords/Search Tags:Sequences, Combinatorial
PDF Full Text Request
Related items