Transcriptome Profiling Of Whole Organs And Preliminary Study On The Characterizations Of Whole Genome Of The Tea Plant (Camellia Sinensis) By WGS Technology

Posted on: 2016-12-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Yang
GTID: 1363330482482245
Subject: Tea
Abstract/Summary:
Tea is one of the three most popular traditional beverages worldwide. The ancestors of cultivated tea plants (Camellia sinensis) are native to Southwest China. At present, tea plants are cultivated in more than fifty countries, and 2 billion people (one-third of the world's population) drink tea every day. Besides its important economic value and worldwide influence, tea contains abundant secondary metabolites, such as catechins, caffeine, theanine and volatile oils, which not only play a crucial role in tea quality and flavour but are also the essential material basis for promoting human health. However, C. sinensis has a large genome and high heterozygosity. In addition, the tea plant is difficult to culture in vitro and to transform, which greatly hinders research on the genetic engineering of functional genes. To date, the lack of genomic information and genome-wide transcription profiles has severely restricted biological and molecular genetic studies of the tea plant, especially studies of the biosynthesis of tea-specific secondary metabolites and of their genetic regulation mechanisms. This study was designed to construct the first C. sinensis draft genome and a transcriptome profile of all tissues, and to develop genome-wide SNPs on a large scale from C. sinensis and several of its wild relatives in section Thea of the genus Camellia. The C. sinensis draft genome, the transcriptome dataset, and the derived SSR and SNP resources can serve as an important public information platform for studies of the biological characteristics, origin and evolution, functional genomics and molecular breeding of C. sinensis. They will greatly promote not only research on secondary metabolism in C. sinensis but also the development of tea production as a whole. The main results of this study are briefly described as follows:

1. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds and identified SSRs

Using high-throughput Illumina RNA-seq, the world's first transcriptome profile of C. sinensis was constructed. Poly(A)+ RNA from all tissues of C. sinensis cv. Longjing43 was sequenced at an unprecedented depth on the Illumina platform. Approximately 2.59 gigabase pairs (Gb) of reads were obtained, trimmed and assembled into 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp. Comparison with C. sinensis ESTs showed that this transcriptome dataset is both highly reliable and highly comprehensive in coverage. Sequence similarity searches against seven public databases found 55,088 unigenes that could be annotated and assigned gene ontology terms or putative metabolic pathways. Targeted searches of these annotations identified the majority of genes associated with several primary metabolic pathways and with natural product pathways important to tea quality, such as the flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes in these secondary pathways were discovered. Thirteen unigenes related to theanine and flavonoid synthesis were validated, and their expression patterns in different organs of the tea plant were analyzed by quantitative real-time PCR (qRT-PCR). In addition, 12,242 SSRs were detected in the unigenes, covering all repeat types from mononucleotide to hexanucleotide. Dinucleotide repeats were the main type, accounting for 63.78% of all SSRs. The potential of these transcriptomic SSRs for further use and research was assessed.

2. Genome-wide SNPs discovered using RAD sequencing provide high-resolution species boundary and phylogenetic information for Camellia sinensis and its wild relatives

Using high-throughput, genome-wide restriction site-associated DNA sequencing (RAD-Seq), reduced-representation genome sequencing of 18 tea accessions, including cultivated accessions of C. sinensis and wild accessions of four wild relatives/varieties, was performed on the Illumina platform. After data filtering, the effective tag sequences averaged 2.94 ± 0.67 Gb. A total of 15,444 bi-allelic SNPs were rapidly and cost-effectively generated from the 18 accessions by clustering the tag sequences and genotyping the nucleotide loci. Based on these genomic SNPs, phylogenetic, principal component and population structure analyses classified all accessions into six clusters corresponding to six Camellia species/varieties, indicating that the SNPs are suitable for high-resolution identification of the tested species/varieties and for studies of their genetic relationships. Specifically, novel molecular evidence identified C. taliensis var. bangwei as a transitional tea plant, possibly generated by interspecific hybridization between C. taliensis and C. sinensis var. assamica. Cultivated accessions exhibited greater heterozygosity than wild accessions, except for C. taliensis var. bangwei. A total of 1,521 genic SNPs were identified among the 15,444 genomic SNPs. Among them, 1,058 unigenes were annotated with homologous Arabidopsis proteins, and 24 unigenes were identified as related to secondary metabolic processes.

3. Whole-genome sequencing and characterization of the tea plant (Camellia sinensis) draft genome, and preliminary evolutionary analysis

Before whole-genome sequencing of the tea plant, genome surveys of three cultivated tea clones, C. sinensis cv. Anhui1 (AH1), C. sinensis cv. Tieguanyin (TGY) and C. sinensis cv. Shuchazao (SCZ), and of one wild tea plant of C. taliensis (DXS) were carried out using Illumina sequencing technology. A total of 105.3, 110.8, 205.1 and 159.8 Gb of effective data were obtained for AH1, TGY, SCZ and DXS, respectively. Based on 17-mer analysis, the genome sizes of the four tea plants ranged from 3 to 3.3 Gb. Their heterozygosities were between 1.5% and 2.2%, in the order AH1 > TGY > SCZ > DXS, and their GC contents were between 38.50% and 39.92%. Preliminary assemblies of the four genomes all had contig N50 lengths below 700 bp and scaffold N50 lengths below 3 kb. Prediction of genomic SSRs (gSSRs) from the preliminary assemblies of AH1 and TGY retrieved 563,680 and 545,520 gSSRs, respectively, and batch primer design generated 262,807 and 257,564 primer pairs for AH1 and TGY, respectively.

C. sinensis cv. Shuchazao (SCZ), which had relatively low heterozygosity among the tested cultivated teas, was selected as the material for whole-genome sequencing of the tea plant. The genome was sequenced using a whole-genome shotgun strategy and high-throughput sequencing technology. Short-insert libraries of 170-800 bp and long-insert libraries of 2-40 kb were constructed and sequenced as paired-end reads at ultra-high depth on the Illumina HiSeq 2000 platform. A total of 109 sequencing lanes were used, producing approximately 1,393 Gb of high-quality clean data. In addition, the transcriptomes of 8 SCZ tissues were sequenced, yielding 94.3 Gb of clean data. De novo assembly generated initial contigs totaling 2.45 Gb and 92,207 final scaffolds totaling 2.98 Gb, with contig and scaffold N50 lengths of 33.4 kb and 347.1 kb, respectively. Sequence comparison with C. sinensis ESTs from GenBank showed that about 89.25% of the ESTs were covered by the genome assembly. Three randomly selected BACs, each sequenced individually by Sanger sequencing, were covered by the final assembly at 97.7%, 100% and 94.8%, respectively. The RAD-Seq tag sequences of C. sinensis cv. Shuchazao were also used to assess the quality of the assembled genomic regions flanking restriction enzyme sites, and showed good agreement (96.79%) with the genome assembly.

Approximately 56.06% of the C. sinensis assembly was annotated as repeats. Transposable elements (TEs) accounted for 52.69% of the final assembly (94.0% of all repeats), and long terminal repeat (LTR) elements were identified as the most abundant TEs. A total of 48,682 protein-coding genes were predicted, with an average gene length of 4,054 bp and a mean of 3.3 exons per gene, and 39,680 of them were annotated using the GO, KEGG, SwissProt and InterPro databases. In addition, 652 tRNAs, 3,450 rRNAs, 723 miRNAs and 474 snRNAs were identified. In C. sinensis, 33,922 genes were clustered into 17,701 gene families. Furthermore, 3,415 unique gene families were identified as tea-specific. A total of 7,171 gene families were shared by the tea plant, kiwifruit, Amborella and Eucalyptus. A phylogenetic tree reconstructed from the C. sinensis genome and 11 other sequenced genomes placed the tea plant in the Asterids, most closely related to Actinidiaceae, and the divergence of the tea plant from kiwifruit was estimated at approximately 67.5 million years ago. Based on the tea plant draft genome, transcriptome and BAC library, the full-length genomic DNA sequence of LAR (leucoanthocyanidin reductase), one of the key genes responsible for catechin biosynthesis, was identified and obtained, and its gene structure was analyzed.
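The N50 values reported in the abstract (506 bp for the unigenes; 33.4 kb and 347.1 kb for the final contigs and scaffolds) and the GC contents of the surveyed genomes rest on standard assembly statistics. The sketch below shows how these are conventionally computed. It assumes sequence lengths and sequences are already in memory, and it does not reproduce whichever statistics tool the dissertation used, which is not stated.

```python
from typing import Iterable, Sequence


def n50(lengths: Iterable[int]) -> int:
    """N50: the largest length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("need at least one non-empty sequence")
    running = 0
    for length in ordered:
        running += length
        # First (longest-first) length at which the cumulative sum reaches half the total.
        if 2 * running >= total:
            return length
    return ordered[-1]  # not reached: running == total after the last item


def gc_content(seqs: Sequence[str]) -> float:
    """Percentage of G+C among unambiguous A/C/G/T bases; N and other codes are ignored."""
    gc = at = 0
    for seq in seqs:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("no A/C/G/T bases found")
    return 100.0 * gc / (gc + at)
```

For lengths 100, 200, 300, 400 and 500 bp, the two longest sequences (900 bp) are the first to reach half of the 1,500 bp total, so the N50 is 400 bp. Passing contig lengths yields the contig N50, and passing scaffold lengths yields the scaffold N50.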
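The 12,242 transcriptomic SSRs (mononucleotide to hexanucleotide) and the genomic SSRs were found by scanning sequences for tandem motif repeats. The dissertation does not name its detection tool or thresholds. The sketch below is a simplified perfect-repeat scanner that assumes MISA-style minimum copy numbers (10 for mononucleotides, 6 for dinucleotides, 5 for tri- to hexanucleotides). The `MIN_REPEATS` table and all function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Assumed MISA-style minimum copy numbers per motif length
# (1 = mononucleotide ... 6 = hexanucleotide); not taken from the dissertation.
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
Hit = Tuple[int, str, int]  # (start, motif, copy_number)


def is_primitive(motif: str) -> bool:
    """False if the motif is a repeat of a shorter unit (e.g. 'ATAT' = 'AT' x 2)."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq: str) -> List[Hit]:
    """Return (start, motif, copy_number) for perfect SSRs, sorted by start.

    Simplified: one regex pass per motif length; compound or imperfect
    repeats are not merged, and overlaps between passes are not resolved.
    """
    seq = seq.upper()
    hits: List[Hit] = []
    for unit, min_copies in MIN_REPEATS.items():
        # ([ACGT]{unit}) captures a motif; \1{n,} demands n or more further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # e.g. skip 'AA'; that run is covered by the 'A' pass
                hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)


def unit_size_shares(hits: List[Hit]) -> Dict[int, float]:
    """Percentage of SSRs by motif length (e.g. the share of dinucleotide repeats)."""
    counts = Counter(len(motif) for _, motif, _ in hits)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {unit: 100.0 * n / total for unit, n in counts.items()}
```

For the toy sequence `"GG" + "AT" * 8 + "C" + "A" * 12 + "G"`, `find_ssrs` reports `(2, "AT", 8)` and `(19, "A", 12)`, and `unit_size_shares` gives 50% each for mono- and dinucleotide repeats. A share such as the 63.78% of dinucleotide repeats reported above is tallied the same way. Dedicated tools such as MISA also handle compound SSRs, which this sketch does not.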
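The genome sizes of 3 to 3.3 Gb were estimated by 17-mer analysis. A common form of this estimate divides the total number of k-mers by the depth of the main peak in the k-mer frequency histogram. The sketch below assumes that formula. It also assumes a user-chosen cutoff (`min_depth`, hypothetical) that discards low-depth error k-mers, and it treats the highest peak above that cutoff as the homozygous peak. In highly heterozygous genomes such as tea (1.5% to 2.2%), a half-depth heterozygous peak can appear and must be distinguished from the homozygous one, which dedicated tools such as GenomeScope model explicitly. Real surveys count k-mers with tools such as Jellyfish rather than an in-memory counter.

```python
from collections import Counter
from typing import Dict, Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_ACGT = set("ACGT")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_histogram(reads: Iterable[str], k: int = 17) -> Dict[int, int]:
    """Toy counter: map depth -> number of distinct canonical k-mers seen at that depth.

    k-mers containing non-ACGT characters (e.g. N) are skipped.
    """
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i : i + k]
            if set(kmer) <= _ACGT:
                counts[canonical(kmer)] += 1
    return dict(Counter(counts.values()))


def estimate_genome_size(hist: Dict[int, int], min_depth: int = 5) -> float:
    """Genome size ~ total k-mers / peak depth, ignoring depths below min_depth."""
    usable = {d: n for d, n in hist.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Assumption: the most frequent depth above the cutoff is the homozygous peak.
    peak_depth = max(usable, key=lambda d: usable[d])
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth
```

For the synthetic histogram `{1: 1000, 2: 200, 20: 50, 25: 100, 30: 50}`, the error k-mers at depths 1 and 2 are dropped. The peak is at depth 25, and 5,000 k-mers / 25 gives an estimated genome size of 200 bases. The same arithmetic, applied to a real 17-mer histogram, yields estimates in gigabases.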
Keywords/Search Tags: tea plant (Camellia sinensis), draft genome, transcriptome of all organs, secondary metabolism, genome characterization, genomic SNPs, wild relatives