De Novo Assembly And Annotation Of Watercress (Nasturtium officinale R.Br.) Genome And Comparative Evolutionary Analysis

Posted on: 2020-08-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Yan
GTID: 1483306314489494
Subject: Vegetable science
Abstract/Summary:
Watercress (Nasturtium officinale R.Br.), native to Europe, is a nutrient-dense leafy vegetable of the genus Nasturtium in the Brassicaceae family. To date, genomic information on watercress is very limited, which seriously hinders its molecular breeding and utilization. To obtain this information, we performed de novo whole-genome sequencing of watercress and assembled the nuclear, chloroplast and mitochondrial genomes separately. Annotation of the three genomes provides comprehensive information on the watercress genomes. Comparative evolutionary analyses with related species were conducted based on the genome assemblies and annotations. The results are as follows:

1. Chromosome counting in root tip cells at mitotic metaphase confirms that the chromosome number of the watercress used for genome sequencing is 2n=34. FISH with 45S rDNA and 5S rDNA probes was performed separately on mitotic metaphase chromosomes. The results suggest that both the 45S rDNA and 5S rDNA probes display two pairs of signals, one pair strong and one pair weak. K-mer analysis of Illumina sequencing data estimates the genome size of watercress at 345.97 Mb. A total of 48.7 Gb of clean PacBio single-molecule long-read data was generated, corresponding to ~141× coverage of the watercress genome. Preliminary assembly and polishing based on the PacBio data resulted in 1,068 contigs with an N50 length of 2.35 Mb. To ensure the accuracy of the assembly, we removed contaminant sequences and further corrected the assembly with Illumina sequencing reads. Using a ~263-fold-coverage BioNano optical map to assist the assembly, we generated a final assembly of the watercress genome totaling 337.51 Mb, with a contig N50 length of 3.26 Mb and a scaffold N50 length of 5.85 Mb. BUSCO evaluation of assembly completeness reveals that 97.9% of the BUSCOs were detected as complete in the watercress genome assembly.

2. Genome annotation shows that repeat sequences account for 49.15% of the watercress genome, with long terminal repeats (LTRs) being the most abundant (42.59%). A total of 38,945 protein-coding genes are predicted in the watercress genome, of which 97.2% are functionally annotated. In addition, 2,043 rRNAs, 558 snRNAs, 320 miRNAs and 884 tRNAs were predicted. Gene family clustering identified 16,004 gene families containing 32,248 genes, of which 315 gene families are unique to watercress. Compared with other Brassicaceae species, genes of both the expanded and the watercress-specific gene families are most enriched in auxin transport. Phylogenetic relationships and divergence times among Brassicaceae species based on single-copy genes suggest that watercress diverged from C. hirsuta about 9.2 (5.5-14.7) million years ago, and that the lineage containing watercress diverged from the lineage containing A. thaliana about 16.1 (11.7-22.1) million years ago. Relative to the model plant A. thaliana, the watercress genome has experienced a recent whole-genome duplication.

3. We de novo assembled the complete chloroplast (cp) genome of watercress from combined PacBio and Illumina sequencing data. The cp genome is 155,106 bp in length and exhibits a typical quadripartite structure comprising a pair of inverted repeats (IRA and IRB), a large single copy (LSC) region and a small single copy (SSC) region. The genome contains 113 unique genes, including 79 protein-coding genes, 30 tRNAs and 4 rRNAs, with 20 duplicated in the IRs. Compared with the watercress cp genome previously deposited in GenBank, 21 single nucleotide polymorphisms (SNPs) and 27 indels are identified, mainly located in noncoding sequences. A total of 49 repeat structures and 71 simple sequence repeats (SSRs) are detected. Moreover, 45 RNA editing sites were predicted in 16 genes, all of them C-to-U transitions. Analysis of Ka/Ks ratios in Cardamineae suggests positive selection on the ycf2 gene in watercress, which might reflect specific adaptations of watercress to its particular living environment. Phylogenetic analyses based on complete cp genomes and on common protein-coding genes both show that the genus Nasturtium is sister to Cardamine in the tribe Cardamineae.

4. We sequenced the mitochondrial genome of watercress using combined PacBio and Illumina sequencing platforms. The assembled genome is 287,019 bp in length and contains 33 protein-coding genes, 26 tRNAs and 3 rRNAs. Comparative analyses among Brassicaceae species suggest that all studied Brassicaceae mitochondrial genomes contain large repeats (>1 kb) and show similar GC content and protein-coding gene organization. However, genome size, ORF number and the total length of plastid-derived sequences vary among some of the mitochondrial genomes. Analysis of repeat composition by length among Brassicaceae mitochondrial genomes suggests that smaller genomes generally have more small repeats (<100 bp). Sequences transferred from the cp genome to the mitochondrial genome frequently originate from the IRs of the cp genome in Brassicaceae species, especially the long transferred sequences. Genes coding for ATP synthase subunits (e.g. atp6 and atp8) and ribosomal proteins (e.g. rps3 and rps4) in the Brassicaceae mitochondrial genomes appear to have higher nucleotide substitution rates.
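
Two of the assembly statistics quoted in the abstract, sequencing coverage and N50, follow from simple arithmetic. The sketch below is illustrative only and is not the pipeline used in the dissertation: the function names `coverage` and `n50` are hypothetical, decimal units (1 Gb = 10^9 bp, 1 Mb = 10^6 bp) are assumed, and the contig lengths in the usage lines are made-up toy values rather than data from the watercress assembly.

```python
from typing import Iterable


def coverage(total_bases: float, genome_size: float) -> float:
    """Sequencing depth: total sequenced bases divided by genome size (both in bp)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def n50(lengths: Iterable[int]) -> int:
    """N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must not be empty")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for positive lengths; kept for safety


# Figures from the abstract: 48.7 Gb of PacBio data, 345.97 Mb estimated genome size.
# Decimal units are an assumption here.
pacbio_depth = coverage(48.7e9, 345.97e6)
print(f"PacBio coverage: {pacbio_depth:.1f}x")

# Toy contig lengths (hypothetical, not from the watercress assembly).
toy_contigs = [2, 3, 4, 5, 6]
print("Toy N50:", n50(toy_contigs))
```

Under these assumptions the coverage calculation rounds to the ~141× reported for the PacBio data. The same `n50` function would apply to contig or scaffold lengths alike; only the input list changes.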
Keywords/Search Tags: Watercress, Whole-genome sequencing and assembly, Genome annotation, Chloroplast and mitochondria, Evolutionary analysis