
The Whole Genome Sequencing And Analyzing Of Ginkgo Biloba

Posted on: 2019-09-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H L Liu
GTID: 1363330590450085
Subject: Tree genetics and breeding
Abstract/Summary:
Seed plants comprise angiosperms and gymnosperms, which diverged from each other around 300 Mya. The success of the angiosperms has been attributed in part to innovations associated with gene or whole-genome duplications, whereas the relationships among gymnosperms, which are evolving at a slower rate, remain ambiguous. Cycads, conifers, Ginkgo and Gnetales are the four extant gymnosperm groups. Ginkgo biloba is the sole living member of the Ginkgoalean clade, which dates back to Jurassic times. The maidenhair tree originated in China and has been widely exported internationally because of appreciation of its beauty, its high tolerance of environmental contamination and its complete resistance to all serious pests and diseases. Whole-genome sequencing is an essential platform for studying species divergence, gene function, regulation of gene expression and so on. The completion of four gymnosperm genome drafts (Picea abies, Pinus taeda, Gnetum montanum, G. biloba) has improved knowledge of the content, organization and evolution of the conifer genomes. However, gymnosperm genomes are extremely large, and these drafts were completed using the first two generations of sequencing technologies, with low assembly quality; none of them reached chromosome scale. In this study, we sequenced the whole genome of G. biloba by combining several newly developed technologies. The main results are as follows:

1. Whole genome sequencing and assembly: (1) The genome of G. biloba was sequenced on the PacBio RSII platform using single-molecule real-time (SMRT) DNA sequencing technology with a single diploid endosperm. The sequencing depth reached up to 50-fold, with an average read length longer than 6 kb. The total length of the assembled genome sequence was about 9.87 Gb, with a contig N50 of 1.58 Mb; (2) A high-density genetic map was constructed with 94 individuals from a half-sib family that were sequenced by reduced-representation sequencing (SLAF-seq). The average sequencing depth was 11.2× and 204,361 polymorphic SLAF tags were developed. The high-density genetic map contained 12,263 markers along 12 linkage groups. The total genetic length was 1671.77 cM and the average distance was 0.89 cM; (3) Finally, by combining Hi-C technology with the high-quality genetic map, the assembly was anchored into scaffolds along the 12 chromosomes, with a scaffold N50 of 754.5 Mb.

2. Genome annotation and analysis: (1) Gene annotation combined three approaches: de novo prediction, homology-based prediction and RNA-seq. A total of 27,832 genes were predicted, with an average gene length of 31,745 bp, an average CDS length of 1,227 bp and an average intron length of 7,748 bp. 96% of the predicted genes (25,185) were functionally annotated against the protein database. Repeat sequences made up 80.24% of the genome, and LTRs accounted for 53%. In addition, the G. biloba genome contained 1,463 tRNAs, 192 rRNAs, 98 snoRNAs, 128 snRNAs and 990 miRNAs; (2) The genome contained 3,674,092 LTRs, of which 80,854 were complete and distributed uniformly along the chromosomes. The insertion time of the LTRs was estimated to be 8-10 Mya. A total of 7,426 genes contained LTRs, including 4,939 complete LTRs, which may have a positive effect on gene expression; (3) 43 MADS-box genes were identified in the G. biloba genome, of which 2 were type I and all the others were type II. According to the expression data, GbiMADS005, GbiMADS018, GbiMADS024, GbiMADS026 and GbiMADS039 were extremely highly expressed during early growth; (4) Whole-genome duplication (WGD) was assessed using the 4DTv values of paralogs; the two peaks at 0.08 and 0.3 suggest that G. biloba has experienced two duplication events. The Ks values calculated from syntenic paralogs in G. biloba also showed two peaks, at 0.8 and 1.45.

3. Assessment of genome assembly and annotation: (1) The Illumina HiSeq 2500 platform was used to sequence haploid leaves, yielding 430 Gb of raw data. The G. biloba genome size was estimated at 9.1 Gb with a k-mer size of 21; (2) 10× Illumina sequencing data from diploid leaves were obtained to analyze the heterozygosity of the G. biloba genome. The average heterozygosity was 0.24, and heterozygosity in non-coding regions was significantly higher than in coding regions; (3) Transcriptomes of seven developmental stages of leaves, three stages of seed, endosperm, root, long shoot, short shoot, and female and male cones of G. biloba were sequenced on the Illumina HiSeq 2500 platform, with about 5 Gb of reads each. The average mapping ratio was 89.29%; (4) Full-length transcriptomes of shoots and leaves were sequenced on the PacBio RSII platform with insert sizes of 1-2 kb, 2-3 kb and 3-6 kb. The final transcript assembly was 558.31 Mb. 21,276 of the predicted genes (76.44%) and 78.3% of the 18,777 super introns (> 5 kb) were supported by the long transcripts.

4. Analysis of the methylation level of G. biloba and the evolution of its effective population size: (1) Whole-genome bisulfite sequencing (WGBS) on the Illumina platform was used to reveal the genome-wide methylation level. G. biloba was highly methylated, with CpG at 88.3%, CHG at 81.70% and CHH at 3.30%; (2) Additional diploid samples from Guangdong, Zhejiang and Hunan were resequenced on the Illumina HiSeq 2500 platform, and the effective population size of G. biloba was estimated to have decreased sharply within the last 10 million years and then to have remained steady.
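
The abstract summarizes assembly contiguity with a contig N50 of 1.58 Mb and a scaffold N50 of 754.5 Mb. As a reminder of what that statistic measures, the following is a minimal sketch of the usual N50 definition: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the assembly. The function name `n50` and the toy lengths are illustrative assumptions, not data or code from the dissertation.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in bp (hypothetical values, not from the dissertation).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70: 80 + 70 = 150 is half of the 300 bp total
```

Because N50 depends only on the length distribution, anchoring contigs into chromosome-scale scaffolds raises it sharply without adding any sequence, which is consistent with the jump from the contig to the scaffold value reported above.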
Keywords/Search Tags: Ginkgo biloba, third-generation sequencing technologies, whole-genome assembly, chromosome anchoring, gene family