
Identification And Re-annotation Of New Gene During Evolution

Posted on: 2015-09-21
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Chen
Full Text: PDF
GTID: 2180330434466059
Subject: Bioinformatics
Abstract/Summary:
Since the first new gene, jingwei, was discovered in 1993, how lineage- or species-specific new genes originate during evolution has attracted increasing interest. Several mechanisms of new gene origination have been characterized, including DNA-level duplication, RNA-level duplication or retrotransposition, de novo origination, and lateral gene transfer.

To study these mechanisms, new genes must first be identified. By definition, new genes emerged recently in evolution and therefore show a limited phylogenetic distribution. Methods have accordingly been developed to date the emergence of a gene by analyzing the distribution of its orthologs along a phylogenetic tree. Following this strategy, we implemented two methods to annotate gene ages. The first is the traditional method, based on phylogenetic analysis of a single gene. We used it to infer when two essential plant genes originated, since they may be involved in the adaptation from aquatic to terrestrial environments. The second method builds on the first and uses whole-genome syntenic alignments to perform genome-wide age dating. Based on the gene annotation provided by the Ensembl database, we dated all annotated genes from six species, including human and mouse. In human and mouse, based on the gene annotations provided by the RefSeq, Ensembl, and UCSC databases, we further dated noncoding gene models, including long noncoding RNAs.

After dating the genes, we noted that the number of lineage- or species-specific genes differs considerably across Ensembl versions. This contrast indicates that the Ensembl annotation is not stable or reliable for recently evolved new genes. One problem is that Ensembl cannot differentiate newly duplicated protein-coding genes from pseudogenes, both of which are often generated by recent DNA- or RNA-level duplication.
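The idea of dating a gene from the phylogenetic distribution of its orthologs can be illustrated with a minimal Python sketch. It is not the thesis pipeline: the outgroup list, the clade labels, and the function name `date_gene` are illustrative assumptions, and the sketch takes ortholog presence as given rather than deriving it from syntenic alignments. It applies a simple parsimony rule: a gene is assigned to the branch shared with the most distant outgroup in which an ortholog is found.

```python
# Hypothetical human-centred outgroup series, ordered from nearest to farthest.
# Each entry pairs an outgroup species with the clade it shares with human.
# The species choice and labels are illustrative, not taken from the thesis.
OUTGROUPS = [
    ("macaque", "Catarrhini"),
    ("mouse", "Euarchontoglires"),
    ("dog", "Boreoeutheria"),
    ("chicken", "Amniota"),
    ("zebrafish", "Euteleostomi"),
]


def date_gene(ortholog_species):
    """Return the oldest branch on which the gene must already have existed.

    ortholog_species: set of outgroup species in which an ortholog was found.
    A gene with no ortholog in any outgroup is called human-specific.
    """
    age = "human-specific"
    # Because OUTGROUPS runs from nearest to farthest, the last match is the
    # most distant outgroup carrying an ortholog. Absences in nearer outgroups
    # are treated as secondary losses (or annotation gaps).
    for species, clade in OUTGROUPS:
        if species in ortholog_species:
            age = clade
    return age


if __name__ == "__main__":
    print(date_gene({"macaque", "mouse"}))
    print(date_gene(set()))
```

Under this rule, a gene found in macaque and mouse but in no more distant outgroup is dated to the branch shared with mouse, while a gene with no detectable ortholog is treated as a candidate new gene in the focal species.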
To address this difficulty in distinguishing new protein-coding genes from pseudogenes in human, we used peptide data from multiple mass spectrometry databases, such as PeptideAtlas, ProteomicsDB, and Proteome Map, to test whether Ensembl pseudogenes are truly untranslated. We found 116 pseudogenes covered by at least one uniquely mapping peptide, more than 60 of which are primate-specific genes. This result shows that an appreciable proportion of new protein-coding genes are misannotated as pseudogenes in mainstream annotation resources such as Ensembl.

Beyond mechanistic studies of new gene origination, the significance of new genes in phenotypic evolution is gaining recognition, given their functional versatility. We are actively developing an online database (http://gentree.ioz.ac.cn/), which will present all data generated by our efforts in new gene identification and reannotation. In the long run, this database may grow into a community resource for both evolutionary and functional studies and further help expand the new gene field.
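The peptide-based check on pseudogenes described above rests on the notion of a uniquely mapping peptide. The following Python sketch shows one way to express that criterion, under stated assumptions: sequences are plain strings, a peptide "maps" to a sequence when it is an exact substring, and the identifiers and the function name `uniquely_supported` are hypothetical. A real analysis would need to handle further issues, such as the identical mass of leucine and isoleucine, which this sketch ignores.

```python
def uniquely_supported(peptides, proteins, pseudogene_ids):
    """Find pseudogenes covered by at least one uniquely mapping peptide.

    peptides: iterable of peptide sequences (strings).
    proteins: dict mapping gene ID -> translated sequence (string), covering
        both annotated coding genes and pseudogene translations.
    pseudogene_ids: set of gene IDs that are annotated as pseudogenes.

    Returns a dict mapping pseudogene ID -> list of its unique peptides.
    """
    supported = {}
    for peptide in peptides:
        # Collect every sequence that contains this peptide exactly.
        hits = [gene_id for gene_id, seq in proteins.items() if peptide in seq]
        # Keep the peptide only if it maps to one sequence and that
        # sequence is an annotated pseudogene.
        if len(hits) == 1 and hits[0] in pseudogene_ids:
            supported.setdefault(hits[0], []).append(peptide)
    return supported


if __name__ == "__main__":
    # Toy data: the two sequences share a prefix but differ near the end.
    proteins = {"GENE_A": "MKTAYIAKQR", "PSEUDO_B": "MKTAYLAKQW"}
    peptides = ["MKTAY", "LAKQW", "IAKQR"]
    print(uniquely_supported(peptides, proteins, {"PSEUDO_B"}))
```

In the toy data, the shared peptide is discarded because it maps to both sequences, so only the peptide found solely in the pseudogene translation counts as evidence that the pseudogene is translated.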
Keywords/Search Tags: new gene, age annotation, gene tree, pseudogene, long noncoding RNA, coding gene