Improvement of ab initio methods of gene prediction in genomic and metagenomic sequences

Posted on: 2011-02-27
Degree: Ph.D
Type: Dissertation
University: Georgia Institute of Technology
Candidate: Zhu, Wenhan
Full Text: PDF
GTID: 1440390002465031
Subject: Biology
Abstract/Summary:
A metagenome obtained by shotgun sequencing of a microbial community is a heterogeneous mixture of rather short sequences. The vast majority of microbial species in a given community (99%) are likely to be non-cultivable. Many protein-coding regions in a new metagenome are likely to code for barely detectable homologs of already known proteins. Therefore, an ab initio method that accurately identifies the new genes is a vitally important tool for metagenomic sequence analysis. The standard tools for ab initio prokaryotic gene prediction, such as EasyGene, GeneMarkS or Glimmer, were not designed to work with short sequence fragments from unknown genomes. However, a heuristic model method for finding genes in short prokaryotic sequences of anonymous origin was proposed in 1999, prior to the advent of metagenomics.

The idea was to bypass traditional ways of parameter estimation, such as supervised training on a set of validated genes or unsupervised training on an anonymous sequence assumed to contain a large enough number of genes. Instead, it was proposed to use dependencies between the codon frequencies and the genome nucleotide composition. In this way, the codon frequencies, which are critical for the model parameterization, could be derived from the frequencies of nucleotides observed in the short sequence.

With hundreds of new prokaryotic genomes available, it is now possible to enhance the original approach and to utilize direct polynomial and logistic approximations of oligonucleotide frequencies. This method could further be applied to initialize the algorithms for iterative parameter estimation in prokaryotic as well as eukaryotic gene finders.

The research of this dissertation contributed to the following publications:
(1) Zhu W., Lomsadze A. and Borodovsky M. (2010). Ab initio Gene Identification in Metagenomic Sequences. Accepted, Nucleic Acids Research.
(2) Martin J., Zhu W., Bergman N. and Borodovsky M. (2009). Assessment of Gene Annotation Accuracy by Inferring Transcripts from RNA-Seq. BIBM 2009: 54--59.
(3) Martin J., Zhu W., Passalacqua K., Bergman N. and Borodovsky M. (2010). Bacillus anthracis genome organization in light of whole transcriptome sequencing. BMC Bioinformatics 2010, 11(Suppl 3):S10.
(4) Zhu W., Lomsadze A. and Borodovsky M. GeneMarkS Plus: Improving gene annotation in complete prokaryotic genomes. In Preparation.
(5) Bakkeren G., Zhu W., Antonov I. and Borodovsky M. Gene prediction in Puccinia triticina based on EST data. In Preparation.
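
The heuristic idea summarized in the abstract, deriving codon frequencies from the nucleotide composition of a short sequence, can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: it assumes a plain linear dependence on GC content (the abstract mentions polynomial and logistic approximations), it models only the two lysine codons, and the training values in `TOY_TRAINING` are invented for illustration rather than measured from real genomes. The names `gc_content`, `fit_line` and `predict_frequencies` are hypothetical helpers.

```python
def gc_content(seq):
    """Fraction of G and C among the A/C/G/T letters of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T letters")
    return sum(b in "GC" for b in bases) / len(bases)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_frequencies(models, gc):
    """Evaluate each fitted line at `gc`, clip at a small positive
    floor, and renormalize so the frequencies sum to 1."""
    raw = {codon: max(a + b * gc, 1e-6) for codon, (a, b) in models.items()}
    total = sum(raw.values())
    return {codon: value / total for codon, value in raw.items()}


# Invented toy data (NOT real measurements): genome GC content paired with
# the relative usage of the two lysine codons in that hypothetical genome.
TOY_TRAINING = {
    "AAA": [(0.3, 0.8), (0.5, 0.5), (0.7, 0.2)],
    "AAG": [(0.3, 0.2), (0.5, 0.5), (0.7, 0.8)],
}

# One fitted line per codon: frequency as a function of GC content.
models = {
    codon: fit_line([gc for gc, _ in points], [f for _, f in points])
    for codon, points in TOY_TRAINING.items()
}

# Estimate codon usage for a short fragment of unknown origin using only
# its observed nucleotide composition.
fragment = "ATGCGCGCAT"
estimated = predict_frequencies(models, gc_content(fragment))
print(estimated)
```

The point of the design is that no gene annotation of the fragment is required: the only input at prediction time is its nucleotide composition, and the per-codon models are fitted once on genomes that are already known.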
Keywords/Search Tags:Gene, Ab initio, Sequence, Borodovsky, Zhu, Method, Metagenomic, Short