| [PURPOSE] Long non-coding RNAs (lncRNAs) are genes whose transcripts are longer than 200 nt but do not encode any protein. Non-coding elements located in intergenic regions, including lncRNAs, were long considered junk sequences occupying a large part of the genome. As research has deepened, it has been found that 80% of the human genome can be transcribed, so these non-coding regions can readily give rise to lncRNAs. More and more lncRNAs have been discovered and shown to be functional. However, our understanding of these genes remains rudimentary because they evolve rapidly, have poorly conserved sequences, are highly species-specific, and in some cases have no obvious phenotype. Apart from a few lncRNAs such as XIST, which is involved in X-chromosome inactivation, the functions of most lncRNAs remain to be elucidated.
[METHOD] New genes can quickly build their own networks, exert important functions, and generate novel phenotypes, and lncRNAs may acquire their functions in similar ways. However, unlike protein-coding genes, lncRNAs are subject to weak selection and strong genetic drift. Under such conditions, lncRNAs acquire non-directional, redundant functions, act in specific environments, and are subsequently purged or retained by selection. Therefore, comparing the similarities and differences between the evolution of protein-coding genes and that of lncRNAs should be an effective way to study lncRNA function.
[RESULT] However, no comprehensive resource is currently available for cross-species gene age inference of either protein-coding genes or lncRNAs. Here, we systematically dated 9,102,113 protein-coding genes from 565 species in the Ensembl and Ensembl Genomes databases (82 bacteria, 57 protists, 134 fungi, 58 plants, 56 metazoa, and 178 vertebrates) using a protein-family-based pipeline with the Wagner parsimony algorithm. We also integrated data from the TimeTree database to map gene ages from their clades of origin to specific times in millions of years. Similarly, we collected 1,862 RNA-seq samples from 18 Euteleostomi species, generated in our laboratory or retrieved from public databases. Using BLASTN and OrthoMCL, we performed homology analysis, inferred gene families, and then mapped them to TimeTree. We also evaluated factors that might influence lncRNA age inference, including the sequencing method and the homology inference method. We studied the functional similarities and differences between lncRNAs and protein-coding genes by correlating gene age with various functional characteristics. By comparing the ages and characteristics of lncRNAs and protein-coding genes, we found that: 1) lncRNA sequences have become more complex during evolution; 2) old lncRNAs and protein-coding genes differ in their expression patterns; 3) older lncRNAs show more complex transcriptional regulation; 4) chromatin-state analysis indicates that lncRNAs might have unique origins; and 5) human-specific lncRNAs with paralogs are more enriched for functional features.
[CONCLUSION] These results suggest that lncRNAs follow a distinctive evolutionary path: as their functional features become enriched, lncRNAs become more conserved, are subject to stronger positive selection, and alter the surrounding chromatin states. This study therefore argues that lncRNAs should be classified neither as degradation products of protein-coding genes, nor as intermediates in the de novo birth of protein-coding genes, nor as appendages of protein-coding genes, but as a class of genes with important functions. All the protein-coding gene age data are cataloged in GenOrigin, a user-friendly new database of gene age estimates, where users can browse gene age estimates by species, age, and gene ontology (website: http://genorigin.chenzxlab.cn/). GenOrigin provides gene age estimates, annotations, gene ontology terms, orthologs, and paralogs, together with detailed gene presence/absence views on a species tree with an evolutionary timescale that show how each age was inferred, to help researchers explore gene functions. |
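The abstract dates each protein family with Wagner parsimony on a species tree and then maps the clade of origin onto TimeTree divergence times. The sketch below illustrates that idea under stated assumptions; it is not the thesis pipeline. It reconstructs ancestral copy numbers of one family by Sankoff-style dynamic programming with the Wagner edge cost |child − parent| (gains optionally up-weighted by a hypothetical `gain_penalty`), then walks up from a focal species to the oldest contiguous ancestor that still carries the family. The helper names (`Node`, `wagner_parsimony`, `origin_interval`), the tree, the copy numbers, and the node times are all hypothetical placeholders, not Ensembl or TimeTree data.

```python
import math


class Node:
    """Rooted species-tree node; `mya` is the clade's divergence time (0.0 for extant species)."""

    def __init__(self, name, mya=0.0, children=()):
        self.name = name
        self.mya = mya
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def wagner_parsimony(root, counts, gain_penalty=1.0):
    """Ancestral family sizes under Wagner parsimony (edge cost |child - parent|,
    gains optionally weighted by gain_penalty), solved by Sankoff-style dynamic programming."""
    states = range(max(counts.values()) + 1)

    def edge_cost(parent_state, child_state):
        diff = child_state - parent_state
        return gain_penalty * diff if diff > 0 else -diff

    cost = {}   # node name -> minimal subtree cost for each candidate state
    state = {}  # node name -> reconstructed copy number

    def up(node):
        if not node.children:  # leaf: only the observed count is allowed
            observed = counts.get(node.name, 0)
            cost[node.name] = [0.0 if s == observed else math.inf for s in states]
            return
        for child in node.children:
            up(child)
        cost[node.name] = [
            sum(min(edge_cost(s, t) + cost[c.name][t] for t in states) for c in node.children)
            for s in states
        ]

    def down(node, s):
        state[node.name] = s
        for c in node.children:
            # min() keeps the first optimum, so ties resolve to fewer copies (later gains)
            best = min(states, key=lambda t: edge_cost(s, t) + cost[c.name][t])
            down(c, best)

    up(root)
    # Assumption: the root also pays for its copies, as a gain from 0 on an implicit edge above it.
    down(root, min(states, key=lambda s: gain_penalty * s + cost[root.name][s]))
    return state


def origin_interval(leaf, state):
    """(origin clade, older bound, younger bound) in MYA: the gain sits on the branch
    leading into the oldest contiguous ancestor of `leaf` that still carries the family."""
    if state[leaf.name] == 0:
        raise ValueError(f"{leaf.name} does not carry this family")
    node = leaf
    while node.parent is not None and state[node.parent.name] > 0:
        node = node.parent
    older = node.parent.mya if node.parent is not None else None
    return node.name, older, node.mya


# Hypothetical tree (((A,B)Y,C)X,D)R; node times are placeholders, NOT TimeTree values.
A, B, C, D = Node("A"), Node("B"), Node("C"), Node("D")
Y = Node("Y", 100.0, [A, B])
X = Node("X", 300.0, [Y, C])
R = Node("R", 400.0, [X, D])

counts = {"A": 2, "B": 1, "C": 1, "D": 0}  # hypothetical copy numbers of one family
state = wagner_parsimony(R, counts)
print(state)                      # {'R': 0, 'X': 1, 'Y': 1, 'A': 2, 'B': 1, 'C': 1, 'D': 0}
print(origin_interval(A, state))  # ('X', 400.0, 300.0)
```

On this toy input the family is placed on the branch leading to clade X, so its age falls between 400 and 300 million years on the placeholder timescale. Resolving ties toward fewer copies pushes gains as late as possible; other tie-breaking conventions can assign somewhat older ages.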
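For lncRNAs, the abstract infers families with BLASTN followed by OrthoMCL, whose core clustering step is Markov clustering (MCL) of a sequence-similarity graph. The pure-Python sketch below shows only that MCL step and does not reimplement OrthoMCL's ortholog/in-paralog pair selection or score normalization. Several choices are assumptions: edge weights of −log10(E) from hypothetical BLASTN hits, a hypothetical E-value cutoff of 1e-5, and self-loops equal to each node's strongest edge (a common heuristic). The helper names and gene IDs are illustrative only.

```python
import math


def similarity_matrix(n, hits, evalue_cutoff=1e-5):
    """Symmetric edge weights -log10(E) for (i, j, E) hits that pass a hypothetical E-value cutoff."""
    w = [[0.0] * n for _ in range(n)]
    for i, j, evalue in hits:
        if i != j and evalue <= evalue_cutoff:
            score = -math.log10(evalue)
            w[i][j] = w[j][i] = max(w[i][j], score)
    return w


def _normalize_columns(m):
    """Make every column of a square matrix sum to 1 (all-zero columns are left unchanged)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] = m[i][j] / total
    return out


def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl_clusters(weights, inflation=2.0, max_iter=100, tol=1e-9, eps=1e-6):
    """Markov clustering: alternate expansion (M @ M) and inflation (elementwise power + renormalize)."""
    n = len(weights)
    # Self-loop = the node's strongest edge (1.0 for isolated nodes), a common heuristic.
    loops = [max(row) if max(row) > 0 else 1.0 for row in weights]
    m = _normalize_columns(
        [[weights[i][j] + (loops[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        inflated = [[x ** inflation for x in row] for row in _matmul(m, m)]
        new = _normalize_columns(inflated)
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Rows with a non-zero diagonal are attractors; each attractor's row lists one cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > eps) for i in range(n) if m[i][i] > eps}
    return sorted(sorted(c) for c in clusters)


genes = ["sp1_lncA", "sp2_lncA", "sp3_lncA", "sp1_lncB", "sp2_lncB", "sp1_lncC"]  # hypothetical IDs
hits = [
    (0, 1, 1e-30), (0, 2, 1e-30), (1, 2, 1e-30),  # one conserved family across three species
    (3, 4, 1e-50),                                # a second, two-species family
    (2, 3, 1e-3),                                 # weak hit, dropped by the cutoff
]
families = mcl_clusters(similarity_matrix(len(genes), hits))
print(families)  # [[0, 1, 2], [3, 4], [5]]
```

The gene without any passing hit ends up as a singleton family, which is the kind of species-restricted family that would be dated to the youngest age class once families are mapped onto the species tree.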