
Two New Computational Methods For The Detection Of Lateral Gene Transfer Events

Posted on: 2009-09-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H L Liu
GTID: 1100360242994316
Subject: Crop Science

Abstract/Summary:
Although most of the genes within an organism arise by lineal descent and thus share the same theoretical phylogeny as the other genes in that organism, a significant fraction do not, and these can be termed "alien genes". Recognizing genes acquired by horizontal transfer is necessary to reconstruct the evolutionary events that shape organisms, and it is useful for understanding the functional capabilities that specific organisms possess today.

Starting from two different foundations, we have developed two novel strategies (CGS and MGC) to predict gene sequences that may fall into this category, and we have applied them to the genomes of several cyanobacteria. The main features of the proposed strategies and the important results obtained with them are summarized as follows:

1. For CGS (Core Gene Similarity), the algorithm rests on the observation that oligonucleotide frequencies vary markedly from genome to genome. Others have used methods such as W8, based on oligonucleotide contrasts or the similar concept of codon usage, in efforts to identify genes that are out of place in a genome, but these attempts have suffered from noise introduced by considering all oligonucleotides of a given class, most of which are not informative. To get around this problem, we have identified sets of oligonucleotides that are markedly underrepresented in genes likely to be native to the cyanobacterial lineage: those that have orthologs in 13 diverse cyanobacterial genomes. A series of simulations was conducted to test the efficacy of the proposed strategy. In the simulations, when the significance level is below 10%, the algorithm is always better than CB at picking out artificial foreign genes in a simulated transfer. It is better than W8 when the GC fraction of the genome is quite distant from that of C. elegans (the source of the seeded genes), and better than C+G when the GC fraction of the genome is quite close to that of C. elegans. When the GC fraction of the foreign genes varies, the performance of W8 and C+G fluctuates greatly, whereas CGS remains stable and robust. Using Synechococcus WH8102 (S8102), we assessed the validity of the predictions by constructing phylogenetic trees of the predicted proteins and similar proteins in other organisms. In the phylogenetic analysis of S8102, CGS with the G-test performs markedly better than both W8 and CB, and slightly better than C+G. Software for the data analysis was written in BioLisp.

2. For MGC (Multiple Genomes Comparison), the algorithm rests on the hypothesis that the colinearity of bacterial chromosomal genes is well conserved among strains of the same species. Horizontal gene transfer events lead to the integration of alien genomic islands (GIs) into these conserved genomic backbones. However, with a simple pairwise alignment of two related genomes it is difficult to tell gene gain from gene loss, because chromosomal rearrangement or loss can produce the same pattern. MGC avoids, as far as possible, false predictions that arise from gene loss, and thereby improves the precision of prediction. When the algorithm was applied to Prochlorococcus marinus MED4 (PMED4), the results confirmed earlier reports that tRNA genes serve as integration hotspots for alien genes and that many genomic islands are flanked by direct repeat sequences. GIs are integrated into the host genome through recombination. Through a thorough search for homologous genes in a virology database, we found homologs of 42 alien genes of PMED4 in phages; most of these phage sequences come from metagenomes rather than from complete genome sequences. In 22 phylogenetic trees, the phage and virus homologs of 7 alien genes might have come from cyanobacteria. The presence of these genes in phages suggests that they play a functional role there and that the phages gain some benefit from them.

3. Of the two proposed methods, CGS uses sequence characteristics to predict alien genes, whereas MGC uses the position of genes on the chromosome. Both methods should be useful in evolutionary biology and genome studies for predicting alien genes in various organisms.
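The idea behind CGS can be illustrated with a small Python sketch. It is not the dissertation's BioLisp software, and the abstract does not give the exact statistic, so the details here are assumptions made for illustration: expected k-mer counts come from a simple independent-base model, the cutoff `max_ratio` is arbitrary, and each gene is scored with a two-category G-test (avoided k-mers versus all others) against the 5% chi-square critical value for one degree of freedom. The function names `rare_kmers`, `g_statistic` and `looks_alien` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log, prod


def kmer_counts(seq, k):
    """Count overlapping k-mers in a DNA string."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def rare_kmers(core_genes, k=4, max_ratio=0.5):
    """Find k-mers that are underrepresented in the core (native) genes.

    Assumption: the expected count of a k-mer is derived from single-base
    frequencies, treating positions as independent.
    Returns the set of rare k-mers and the smoothed fraction of core
    k-mer windows that they occupy.
    """
    counts, bases = Counter(), Counter()
    for gene in core_genes:
        counts.update(kmer_counts(gene, k))
        bases.update(gene)
    windows = sum(counts.values())
    n_bases = sum(bases[b] for b in "ACGT")
    freq = {b: bases[b] / n_bases for b in "ACGT"}
    rare = set()
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        expected = windows * prod(freq[b] for b in kmer)
        if expected > 0 and counts[kmer] / expected <= max_ratio:
            rare.add(kmer)
    rare_total = sum(counts[m] for m in rare)
    # Add-one smoothing keeps the rate strictly between 0 and 1.
    p_rare = (rare_total + 1) / (windows + 2)
    return rare, p_rare


def g_statistic(gene, rare, p_rare, k=4):
    """Two-category G statistic: rare k-mers vs. all other k-mers."""
    counts = kmer_counts(gene, k)
    n = sum(counts.values())
    observed = sum(c for m, c in counts.items() if m in rare)
    expected = n * p_rare
    g = 0.0
    for o, e in ((observed, expected), (n - observed, n - expected)):
        if o > 0:  # a zero count contributes nothing to G
            g += 2 * o * log(o / e)
    return g, observed > expected


def looks_alien(gene, rare, p_rare, k=4, critical=3.841):
    """Flag a gene that carries significantly more avoided k-mers
    than the core genes do (chi-square, 1 d.f., 5% level)."""
    g, enriched = g_statistic(gene, rare, p_rare, k)
    return enriched and g > critical
```

In this sketch a gene is flagged only when it is enriched in the avoided oligonucleotides, which mirrors the abstract's point that restricting attention to an informative subset of oligonucleotides reduces noise. The core gene set itself (genes with orthologs in all compared cyanobacterial genomes) would have to be supplied from a separate ortholog analysis.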
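The positional reasoning behind MGC can be sketched in the same hedged way. The abstract does not spell out the procedure, so the rule used below is an assumption: a gene in the target genome counts as backbone if it has an ortholog in at least one of the compared strains, and a candidate island is a run of consecutive genes with no ortholog in any of them. The function `candidate_islands` and its parameters are hypothetical, and the chromosome is treated as linear for simplicity.

```python
def candidate_islands(gene_order, orthologs, min_len=2, max_present=0):
    """Return runs of consecutive genes that lack orthologs in related strains.

    gene_order  -- gene identifiers in chromosomal order (target genome)
    orthologs   -- dict: gene id -> set of related strains with an ortholog
    min_len     -- shortest run reported as a candidate island
    max_present -- a gene still counts as "absent" if it has orthologs in
                   at most this many strains (0 = absent from all of them)
    """
    islands, run = [], []
    for gene in gene_order:
        if len(orthologs.get(gene, ())) <= max_present:
            run.append(gene)          # extend the current candidate run
        else:
            if len(run) >= min_len:   # backbone gene closes the run
                islands.append(run)
            run = []
    if len(run) >= min_len:           # run that reaches the end of the list
        islands.append(run)
    return islands


# Toy example: g3 is missing from strain B only, which a pairwise
# comparison against B alone could mistake for a gained gene.
order = ["g1", "g2", "x1", "x2", "x3", "g3", "y1", "g4"]
orthologs = {
    "g1": {"A", "B"}, "g2": {"A", "B"},
    "g3": {"A"}, "g4": {"A", "B"},
}
islands = candidate_islands(order, orthologs)
```

Because `g3` is still found in strain A, it is treated as backbone rather than as an acquired gene, which is the kind of loss-related false prediction that comparing several genomes is meant to avoid. Features reported in the abstract, such as nearby tRNA genes or flanking direct repeats, are not checked here.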
Keywords/Search Tags: Lateral gene transfer, horizontal gene transfer, cyanobacteria, ortholog, paralog, CGS (Core Gene Similarity), MGC (Multiple Genomes Comparison)