
Eukaryotic ortholog groups and comparative genomics: Applications to apicomplexan parasites

Posted on: 2005-05-24
Degree: Ph.D
Type: Dissertation
University: University of Pennsylvania
Candidate: Li, Li
Full Text: PDF
GTID: 1450390008490576
Subject: Biology
Abstract/Summary:
As sequencing efforts have progressed across multiple taxa, comparative genomic approaches have emerged as a strategy for functional and evolutionary studies. The concepts of orthology and paralogy have recently been applied to functional characterization and classification at the scale of whole-genome comparisons. Methods successfully exploited for prokaryotic genome analysis have proved difficult to apply to eukaryotes, however, because larger genomes may contain multiple paralogous genes and multi-domain proteins, and sequence information is often incomplete. The phylum Apicomplexa contains numerous protozoan parasites of medical and veterinary significance, including Plasmodium (the causative agent of malaria) and Toxoplasma (an important opportunistic pathogen in AIDS). Extensive genome and/or EST sequence information is available for many members of this group.

The major focus of this dissertation is to develop computational approaches that exploit available sequence data for cross-species comparisons and data integration, and to apply them to sequence data from apicomplexan species, facilitating functional characterization, phylogenetic analyses, identification of drug targets, etc. To facilitate EST data analysis, a comparative database, ApiESTDB (http://www.cbi1.upenn.edu/paradbs-servlet/), was generated comprising EST assemblies from each species, along with automated annotations and information on EST sources. For whole-genome comparisons, OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs. This method performs similarly to the INPARANOID algorithm (J. Molec. Biol. 314:1041–52; 2001) when applied to two genomes, but can be extended to cluster orthologs from multiple species. OrthoMCL clusters are consistent with groups identified by EGO (Genome Res. 12:493–502; 2002), but improved recognition of inparalogs permits overlapping EGO groups representing the same gene to be merged. Comparison with previously assigned EC annotations suggests a high degree of reliability, implying utility for automated eukaryotic genome annotation. OrthoMCL has been applied to the proteome datasets from several publicly available genomes, including two Plasmodium species. The results have been incorporated into the Plasmodium genome database, PlasmoDB, to identify genes that were incompletely annotated in the first-pass annotation of the parasite genomes, as well as putative therapeutic targets that have a restricted phylogenetic distribution.
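
To illustrate the kind of Markov clustering the abstract refers to, the following is a minimal sketch of the generic Markov Cluster (MCL) procedure on a toy similarity graph. It is not the OrthoMCL implementation: the function names, the inflation value of 2.0, the added self-loops, and the 0/1 edge weights are all assumptions made for illustration, and the abstract does not say how OrthoMCL derives its edge weights from sequence similarity.

```python
"""Toy Markov Cluster (MCL) sketch on a tiny, made-up similarity graph."""


def normalize_columns(matrix):
    """Scale each column in place so it sums to 1 (column-stochastic)."""
    n = len(matrix)
    for j in range(n):
        total = sum(matrix[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                matrix[i][j] /= total
    return matrix


def expand(matrix):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(matrix)
    return [
        [sum(matrix[i][k] * matrix[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def inflate(matrix, power):
    """Inflation step: raise entries to `power`, then renormalize columns."""
    return normalize_columns([[x ** power for x in row] for row in matrix])


def mcl(weights, inflation=2.0, max_iter=100, tol=1e-9, eps=1e-6):
    """Cluster nodes of a weighted graph given as a square adjacency matrix.

    All parameter defaults are illustrative assumptions, not OrthoMCL settings.
    """
    n = len(weights)
    m = [[float(w) for w in row] for row in weights]
    for i in range(n):
        m[i][i] = max(m[i][i], 1.0)  # add self-loops (a common MCL convention)
    normalize_columns(m)

    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:  # stop once the matrix no longer changes
            break

    # Each row with nonzero entries lists the members of one cluster;
    # identical rows are collapsed by the set. Clusters may overlap in general.
    clusters = {tuple(j for j in range(n) if m[i][j] > eps) for i in range(n)}
    return sorted(list(c) for c in clusters if c)


# Hypothetical graph: proteins 0-2 are mutually similar, as are proteins 3-4.
weights = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(mcl(weights))  # [[0, 1, 2], [3, 4]]
```

In a setting like the one described above, the nodes would be proteins from several genomes and the edge weights would come from pairwise sequence similarity, so that each resulting cluster is a candidate group of orthologs and recent paralogs.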
Keywords/Search Tags:Comparative, EST, Genome, Eukaryotic, Multiple