
Identification of Rice Subspecies-Specific Protein Coding Genes and LncRNA Based on Large Scale RNA-seq Dataset

Posted on: 2018-06-09  Degree: Master  Type: Thesis
Country: China  Candidate: Y An  Full Text: PDF
GTID: 2323330515987543  Subject: Bioinformatics
Abstract/Summary:
Rice is a staple food crop and an important research model for Poaceae plants. RNA-seq can capture nearly all transcripts of a specific period and specific tissues, which makes it possible to search for novel transcripts and novel genes. The aim of this study was to find novel transcripts and novel genes that do not exist in the Nipponbare, MH63 and ZS97 genomes, which will help to complete the rice annotation information. Following the 'assembly-then-align' strategy, de novo assembly of 960 Asian cultivated rice samples from January 2014 to April 2016 yielded 28,352 Oryza sativa indica and 50,648 japonica transcripts. Of these, 6,939 indica and 11,875 japonica novel transcripts could not be aligned to the Nipponbare, MH63 or ZS97 reference genomes. The redundancy between indica and japonica transcripts was evaluated by aligning the two sets of transcripts to each other. Only 5.1% of indica and 3% of japonica transcripts had a reciprocal coordinate overlap of more than 60%, suggesting that most novel transcripts are subspecies-specific. By integrating ORF prediction with BLAST search results against a non-redundant protein database, we identified 3,794 indica and 6,181 japonica protein-coding genes among the novel transcripts. Functional annotation of these novel genes showed that 3,380 (89.1%) indica and 5,314 (86%) japonica genes were annotated in the Pfam database; 1,648 (43.4%) indica and 2,494 (40.3%) japonica genes were assigned to 23 protein classifications in the KOG database; 2,489 (65.6%) indica and 3,795 (61.4%) japonica protein-coding genes had Gene Ontology (GO) annotations; and 715 (18.8%) indica and 1,046 (17.0%) japonica genes were annotated in the Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis. Further analysis of the orthogroups of indica and japonica novel genes revealed that 2,057 (54.2%) indica and 4,404 (71.3%) japonica genes were subspecies-specific. Furthermore, by combining coding-potential prediction with alignment to known protein sequences, we predicted 16,600 long non-coding RNAs, which provides a foundation for further analysis of rice long non-coding RNAs.
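The redundancy check in the abstract rests on a reciprocal coordinate overlap of more than 60%. The abstract does not give the exact definition used in the thesis, so the following is only a minimal sketch of one common reading: two aligned intervals count as redundant when the shared region covers more than 60% of each interval. The function names, the half-open coordinate convention, and the example coordinates are all assumptions made for illustration and are not taken from the thesis.

```python
def overlap_length(a_start, a_end, b_start, b_end):
    """Number of shared positions between two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def reciprocal_overlap(a, b):
    """Return the fraction of interval a and of interval b covered by their overlap.

    a and b are (start, end) tuples in the same coordinate system
    (hypothetical half-open coordinates, assumed for this sketch).
    """
    len_a = a[1] - a[0]
    len_b = b[1] - b[0]
    if len_a <= 0 or len_b <= 0:
        raise ValueError("intervals must have positive length")
    shared = overlap_length(a[0], a[1], b[0], b[1])
    return shared / len_a, shared / len_b


def is_redundant(a, b, threshold=0.6):
    """True when the overlap exceeds the threshold for BOTH intervals.

    The 0.6 default mirrors the 60% cutoff quoted in the abstract; requiring
    both fractions to pass is an assumption about what 'reciprocal' means here.
    """
    frac_a, frac_b = reciprocal_overlap(a, b)
    return frac_a > threshold and frac_b > threshold


if __name__ == "__main__":
    # Illustrative coordinates only, not data from the thesis.
    print(is_redundant((0, 100), (20, 110)))   # large mutual overlap
    print(is_redundant((0, 100), (0, 1000)))   # short interval nested in a long one
```

Requiring both fractions to pass keeps a short transcript nested inside a much longer one from being counted as redundant, since the overlap covers only a small part of the longer interval.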
Keywords/Search Tags:rice, transcriptome, novel transcript, novel coding genes, lncRNA