Whole-genome Reannotation Based On 15 Mouse Tissue Transcriptomes

Posted on: 2016-07-19  Degree: Doctor  Type: Dissertation
Country: China  Candidate: J Y Ceng
GTID: 1220330488475737  Subject: Genomics
Abstract/Summary:
With the rapid development of second-generation sequencing, massively parallel cDNA sequencing (RNA-seq) has become the most powerful tool for transcriptome research. Because of its single-base accuracy and high throughput, RNA-seq can detect more transcripts, especially those expressed at low levels or only in a few tissues. The mouse (Mus musculus) is one of the most widely used model organisms in biological research. It is closely related to humans genetically and shares a large proportion of its genes and genetic material with them. We therefore set out to reannotate the mouse genome using 15 important mouse tissues. This involved revising the original annotation, identifying non-coding genes, and redefining housekeeping genes (HK genes) and tissue-specific genes (TS genes).
Non-coding RNAs (ncRNAs) are transcribed from DNA but do not encode any protein product, and different classes of ncRNAs differ in gene length and biological function. Recently, large numbers of ncRNAs, especially long intergenic ncRNAs (lincRNAs), have been identified and characterized as important regulators of diverse biological processes. In this study, we built a pipeline of software tools for ncRNA identification and analysis, and revised the annotation of 8,040 genes from NCBI, Ensembl and UCSC. In total, we obtained 16,249 ncRNA genes from ultra-deep RNA-seq data of 15 mouse tissues, 2,024 of which are intronic lncRNAs (ilncRNAs). We annotated all identified ncRNAs by diverse properties. Compared with protein-coding genes, ncRNAs are generally shorter, have fewer exons, are expressed at lower levels and are more strikingly tissue-specific.
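The distinction between intronic lncRNAs and other ncRNAs depends on where each locus sits relative to protein-coding genes. The pipeline's actual rules are not described in this abstract, so the following is only a minimal Python sketch built on stated assumptions. Loci are hypothetical `Locus` records with half-open exon intervals, and strand is ignored. An ncRNA counts as intronic only when its whole span falls inside one intron of a protein-coding host gene without touching any coding exon.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Locus:
    """Hypothetical gene record; exons are sorted, half-open [start, end) intervals."""
    name: str
    chrom: str
    exons: list[tuple[int, int]]

    @property
    def start(self) -> int:
        return self.exons[0][0]

    @property
    def end(self) -> int:
        return self.exons[-1][1]

    def introns(self) -> list[tuple[int, int]]:
        # Introns are the gaps between consecutive exons.
        return [(prev_end, next_start)
                for (_, prev_end), (next_start, _) in zip(self.exons, self.exons[1:])]


def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify_ncrna(nc: Locus, coding: list[Locus]) -> tuple[str, str | None]:
    """Return ('intronic', host), ('overlapping', gene) or ('intergenic', None)."""
    span = (nc.start, nc.end)
    # Protein-coding genes whose body (first to last exon) overlaps the ncRNA.
    hits = [pc for pc in coding
            if pc.chrom == nc.chrom and overlaps(span, (pc.start, pc.end))]
    if not hits:
        return "intergenic", None
    touches_exon = any(overlaps(span, ex) for pc in hits for ex in pc.exons)
    if not touches_exon:
        for pc in hits:
            # The whole ncRNA span must fit inside a single intron of the host.
            if any(i_start <= nc.start and nc.end <= i_end for i_start, i_end in pc.introns()):
                return "intronic", pc.name
    return "overlapping", hits[0].name


host = Locus("Host", "chr1", [(100, 200), (500, 600)])  # hypothetical host gene
print(classify_ncrna(Locus("nc1", "chr1", [(250, 300), (350, 400)]), [host]))
```

Real pipelines usually add further filters, such as transcript length, exon count and coding-potential scores, before this positional step.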
Moreover, these ncRNAs are significantly enriched for transcriptional initiation and elongation signals, including histone modifications (H3K4me3, H3K27me3 and H3K36me3), RNAPII binding sites and CAGE tags. We also found that ilncRNAs are co-expressed with both their host genes and their neighboring protein-coding genes. Gene Set Enrichment Analysis (GSEA) revealed several sets of ilncRNAs associated with diverse functional categories, such as structural constituent of ribosome, negative regulation of translation, immune effector process and tissue development. The functions of some important lincRNAs still need to be validated experimentally in follow-up studies.
HK genes are ubiquitously and stably expressed in almost all tissues and cell types regardless of developmental stage, physiological condition or external stimuli. They are considered essential for maintaining fundamental cellular functions. For this reason, several highly and constantly expressed HK genes (e.g., Gapdh, Actb and Ubc) are frequently used as internal controls in experiments. TS genes, by contrast, are expressed in a single specific tissue and can therefore serve as drug targets and disease markers (e.g., LRRC4 and TNNC1). Here, we used Cuffdiff to quantify the expression of 23,374 annotated genes from UCSC RefGene and identified 8,408 HK genes and 2,581 TS genes. Of these HK genes, 6,778 are also defined as HK genes in human. We then used a coefficient of variation (CV) model to assess the expression stability of all HK genes, and only 143 HK genes proved stably expressed in all tissues. Hierarchical clustering of tissues based on the Jensen-Shannon divergence (JSD) showed that tissues tend to cluster according to the organ system and germ layer they belong to. This indicates a relationship between each tissue's gene expression pattern and its physiological function.
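Co-expression between an ilncRNA and its host or neighboring gene can be scored as a correlation across the tissue expression profiles. The abstract does not name the statistic. The sketch below assumes Pearson correlation on log2(FPKM + 1) values, and the function names and example values are hypothetical.

```python
from __future__ import annotations

from math import log2, sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of the same length (>= 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


def coexpression(fpkm_a: list[float], fpkm_b: list[float]) -> float:
    """Correlate two genes' tissue profiles after a log2(FPKM + 1) transform."""
    return pearson([log2(v + 1) for v in fpkm_a], [log2(v + 1) for v in fpkm_b])


# Hypothetical ilncRNA and host-gene FPKM values in four tissues.
print(coexpression([3.0, 7.0, 15.0, 1.0], [1.0, 3.0, 7.0, 0.0]))  # 1.0
```

The log transform keeps a few very highly expressed tissues from dominating the correlation. A rank-based (Spearman) coefficient would be an equally reasonable choice.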
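The HK/TS assignment and the CV-based stability filter can be illustrated with a gene-by-tissue expression matrix. This abstract does not give the dissertation's thresholds. The cut-offs below (`EXPRESSED_FPKM`, `MAX_CV`) are therefore hypothetical, as are the rules "expressed in every tissue" for HK and "expressed in exactly one tissue" for TS. The sketch only shows how such a classification combines with the coefficient of variation (standard deviation divided by mean).

```python
from __future__ import annotations

from statistics import mean, pstdev

EXPRESSED_FPKM = 1.0  # hypothetical "expressed" cut-off
MAX_CV = 0.3          # hypothetical stability cut-off


def coefficient_of_variation(values: list[float]) -> float:
    """Population standard deviation divided by the mean."""
    m = mean(values)
    return pstdev(values) / m if m > 0 else float("inf")


def classify(expr: dict[str, list[float]]) -> dict[str, str]:
    """Label each gene 'stable_HK', 'HK', 'TS' or 'other' from per-tissue FPKM."""
    labels = {}
    for gene, values in expr.items():
        n_expressed = sum(v >= EXPRESSED_FPKM for v in values)
        if n_expressed == len(values):
            # Expressed everywhere: HK; additionally stable if the CV is small.
            stable = coefficient_of_variation(values) < MAX_CV
            labels[gene] = "stable_HK" if stable else "HK"
        elif n_expressed == 1:
            labels[gene] = "TS"
        else:
            labels[gene] = "other"
    return labels


# Hypothetical FPKM values in four tissues (the study used 15).
example = {
    "GeneA": [50.0, 52.0, 48.0, 50.0],
    "GeneB": [1.0, 100.0, 5.0, 30.0],
    "GeneC": [0.0, 0.0, 80.0, 0.2],
    "GeneD": [0.0, 5.0, 5.0, 0.0],
}
print(classify(example))
```

Published TS definitions often use a specificity score (for example one based on JSD) rather than a strict single-tissue rule, so the `TS` branch is the part most likely to differ from the dissertation.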
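Tissue clustering on the Jensen-Shannon divergence treats each tissue's expression profile as a probability distribution over genes. The abstract does not state the linkage method or distance transform. This plain-Python sketch assumes the Jensen-Shannon distance (square root of the base-2 divergence) and average linkage (UPGMA), and uses toy, hypothetical profiles.

```python
from __future__ import annotations

import math
from itertools import combinations


def normalize(profile: list[float]) -> list[float]:
    """Scale an expression profile so it sums to 1 (a distribution over genes)."""
    total = sum(profile)
    return [v / total for v in profile]


def kl(p: list[float], q: list[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def js_distance(u: list[float], v: list[float]) -> float:
    """Square root of the base-2 Jensen-Shannon divergence (a metric in [0, 1])."""
    p, q = normalize(u), normalize(v)
    m = [(a + b) / 2 for a, b in zip(p, q)]
    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(jsd, 0.0))  # clamp tiny negative rounding error


def average_linkage(profiles: dict[str, list[float]]):
    """UPGMA clustering; returns merges as (cluster_a, cluster_b, distance)."""
    names = list(profiles)
    d = {}
    for a, b in combinations(names, 2):
        d[a, b] = d[b, a] = js_distance(profiles[a], profiles[b])

    def linkage(c1: tuple[str, ...], c2: tuple[str, ...]) -> float:
        # Average of all pairwise tissue distances between the two clusters.
        return sum(d[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], linkage(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Toy, hypothetical FPKM profiles over four genes.
profiles = {
    "Cerebrum": [100.0, 90.0, 1.0, 1.0],
    "Cerebellum": [95.0, 100.0, 2.0, 1.0],
    "Liver": [1.0, 2.0, 100.0, 80.0],
    "Kidney": [2.0, 1.0, 90.0, 100.0],
}
for a, b, dist in average_linkage(profiles):
    print(a, "+", b, f"{dist:.3f}")
```

With these toy profiles, the two brain regions and the two visceral organs merge first and the two groups join last. This mirrors, on a small scale, the system- and germ-layer-level grouping described above.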
Functional and pathway enrichment analysis with DAVID and Ingenuity Pathway Analysis (IPA) showed that HK genes are significantly enriched in 292 pathways covering almost all fundamental cellular processes. TS genes, in contrast, are mainly related to the biological processes of their corresponding tissue, such as spermatogenesis (testis-specific genes) and synaptic transmission (cerebrum-specific genes), which is highly consistent with previous studies. Moreover, we used real-time RT-PCR to test 18 candidate HK genes and obtained a novel list of highly stable internal control genes: Grcc10, Ddb1, Ywhae, Eif4h and Gpatch3. Any combination of three of these genes can be used for calibration in further experiments.
In summary, this study provides novel lists of mouse non-coding genes, HK genes and TS genes. These lists enrich the annotation of the mouse genome and offer new opportunities for further genetic and evolutionary research.
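Pathway enrichment of this kind rests on an over-representation test. DAVID reports a modified Fisher exact p-value (the EASE score), and IPA uses a right-tailed Fisher's exact test. The sketch below uses the plain one-sided hypergeometric tail, the unmodified form of that test. It does not reproduce either tool's exact scoring or its multiple-testing correction, and the argument names are illustrative.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: background genes, K: background genes in the pathway,
    n: genes in the query list (e.g. HK genes), k: query genes in the pathway.
    """
    upper = min(n, K)
    if k > upper:
        return 0.0
    # Count gene subsets with at least k pathway members, divided by all subsets.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: 3 of 10 query genes fall in a 20-gene pathway
# drawn from a 1,000-gene background (about 0.2 expected by chance).
print(enrichment_p(3, 10, 20, 1000))
```

Python integers are arbitrary-precision, so `comb` stays exact even for genome-sized backgrounds such as the 23,374 RefGene genes. Only the final division is done in floating point.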
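Using several reference genes together usually means normalizing to their geometric mean, as in the geNorm approach. The abstract does not specify the calculation. The following minimal ΔΔCt sketch therefore assumes roughly 100% amplification efficiency for every gene. Under that assumption, averaging Ct values across the three reference genes is equivalent to taking the geometric mean of their 2^-Ct quantities. The Ct values are invented for illustration.

```python
from __future__ import annotations

from statistics import fmean


def delta_ct(target_ct: float, reference_cts: list[float]) -> float:
    """Target Ct minus the mean Ct of the reference genes.

    Under ~100% efficiency, 2 ** -mean(Ct) equals the geometric mean of
    2 ** -Ct over the reference genes, so this is geometric-mean normalization.
    """
    return target_ct - fmean(reference_cts)


def fold_change(target_treated: float, refs_treated: list[float],
                target_control: float, refs_control: list[float]) -> float:
    """Relative expression (treated vs. control) by the 2 ** -ddCt method."""
    ddct = delta_ct(target_treated, refs_treated) - delta_ct(target_control, refs_control)
    return 2.0 ** -ddct


# Hypothetical Ct values for three reference genes (e.g. Grcc10, Ddb1, Ywhae).
refs_control = [20.0, 21.0, 22.0]
refs_treated = [20.0, 21.0, 22.0]
print(fold_change(23.0, refs_treated, 25.0, refs_control))  # 4.0
```

If the measured efficiencies differ noticeably from 100%, an efficiency-corrected model (e.g. the Pfaffl method) would replace the fixed base of 2.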
Keywords/Search Tags: RNA-seq, ncRNAs, HK genes, TS genes, real-time RT-PCR