
Analysis and design of genomic sequences

Posted on: 2008-03-14
Degree: Ph.D.
Type: Thesis
University: State University of New York at Stony Brook
Candidate: Papamichail, Dimitris
Full Text: PDF
GTID: 2443390005462500
Subject: Biology
Abstract/Summary:
Genomic sequences contain the genetic information for the development and functioning of living organisms. Sequence variability can be used both to determine organism identity and as a tool to alter function.

Although microorganisms dominate the biosphere, most have not been identified or studied. In this dissertation, we present an oligonucleotide (k-mer) classification method based on conditional probabilities, which performs substantially better than other known methods and can be used to identify bacterial species, even from mixed populations, using modest amounts of sample sequence [96].

We also address the problem of population analysis, which leads to the determination of the diversity and function of members of microbial communities [72]. We develop homology-based tools for robust phylotype determination that enhance associations between closely related sequences, and a methodology for more accurate richness estimation using different clustering criteria [95].

The emerging field of synthetic biology is broadly defined as the intersection of biology and engineering that focuses on the modification or creation of novel biological systems that have no counterpart in nature. Working with the group that achieved the first genome-level synthesis of a virus, we have designed, synthesized, and evaluated new variants of poliovirus to serve as vaccines. Specifically, we sought weakened but viable strains that could be used for preparations of a killed poliovirus vaccine. Our designs result in a virus with roughly 100-fold lower specific infectivity than the wild-type virus. We detail the theory behind gene design in the context of optimizing a DNA sequence for particular desired properties while simultaneously coding for a given amino acid sequence [87].

We have also explored the problem of designing the provably shortest genomic sequence that encodes a given set of genes by exploiting alternate reading frames. We have developed an algorithm for designing the shortest DNA sequence that simultaneously encodes two given amino acid sequences. We have shown that the coding sequences of naturally occurring pairs of overlapping genes approach maximum compression, and we have investigated the impact of alternate coding matrices on overlapping sequence design [129].
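The gene-design constraint mentioned in the abstract (changing a DNA sequence to obtain some desired property while still coding for the same amino acid sequence) can be illustrated with a minimal sketch. It assumes the standard genetic code and uses a purely illustrative objective, the per-codon G/C count; the names `recode`, `translate`, and `gc_count` are hypothetical and are not taken from the thesis, whose actual design criteria are not stated in the abstract.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group codons by the amino acid they encode (synonymous codons).
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna):
    """Translate a DNA string codon by codon, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, usable, 3))


def gc_count(codon):
    """Illustrative objective (an assumption): number of G or C bases in a codon."""
    return sum(base in "GC" for base in codon)


def recode(protein, score=gc_count):
    """For each residue, pick the synonymous codon with the lowest score.

    This is a greedy per-codon choice; it cannot capture properties that
    depend on neighbouring codons.
    """
    return "".join(min(SYNONYMS[aa], key=score) for aa in protein)


original = "ATGAAGGTGCTGGCC"              # encodes M K V L A
redesigned = recode(translate(original))  # same protein, different DNA
```

Here `translate(redesigned)` equals `translate(original)`, so the encoded protein is unchanged while the nucleotide sequence differs. Any other scoring function can be passed as `score`, provided it rates one codon at a time.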
Keywords/Search Tags: Sequence