Cucumber (Cucumis sativus L.), an important vegetable crop grown worldwide, is a model system for studies of plant vascular biology and sex determination. Population genomic data have allowed previous studies to systematically characterize genetic variations within the cucumber species, including SNPs and InDels. However, these studies also found that reliance on a single reference genome impedes the comprehensive discovery of structural variations (SVs) and functional genes. The growing availability and falling cost of third-generation sequencing now make it possible to perform genome-wide deep sequencing, analyze genetic variation within a species, and reveal biodiversity that a single reference genome cannot capture. In this study, we carried out de novo assembly and annotation of 12 representative cucumber accessions using PacBio long-read sequencing; characterized genetic variations among the 4 cucumber groups; provided new insights into genome evolution within the cucumber species; and ultimately constructed the cucumber pan-genome, revealing the genetic content of the species in terms of nucleotide sequences and protein-coding genes. The main results are as follows:

1. PacBio sequencing yielded an improved reference genome for the cucumber line "9930", with remarkable gains in contiguity and completeness: a contig N50 of up to 8.9 Mb, a 234-fold increase over the previous version; an additional 29.0 Mb of assembled genomic sequence, most of which consists of repetitive elements; and significantly improved quality of the assembled LTR-RTs. Chromosome-level genome assemblies and annotations were also generated by PacBio sequencing for 11 cucumber lines representing the 4 cucumber groups found worldwide: genome sizes ranged from 234.6 to 251.1 Mb, contig N50 sizes from 1.7 to 5.3 Mb, and the number of annotated protein-coding genes from 24,583 to 26,033. These assemblies provide vital resources for the cucumber research community and for plant comparative genomics.

2. The correspondence of coding genes among the 12 cucumber genomes was determined, and a more complete dataset of genetic variants, together with their impact on functional genes, was constructed. A total of 2.9 million SNPs, 1.4 million InDels and 56,214 SVs (including insertions, deletions, inversions and translocations) were identified. SVs involving the replacement of large segments were also resolved. In total, 2,626 SVs affected the CDS of coding genes, and 2,598 SVs might be associated with cucumber domestication. Several previously reported SVs that might control agronomically important traits were detected in the variant dataset of this study, and we also identified candidate variants that merit further validation and analysis.

3. Genome repeat annotation showed that the LTR-RT content of the 12 accessions is diverse in terms of superfamily (Copia, Gypsy and Unknown), and that the number of family members also varies widely. Several lineage-specific LTR-RT families, as well as families that underwent copy number gain or loss in particular lines, were identified and might play functional roles in the genome. Using Hi-C data, we compared the chromatin conformation of wild and cultivated cucumbers and detected six previously reported chromosome-level rearrangements and one newly identified large-scale inversion, and then determined the presence or absence of each of these large SVs in the 12 cucumber lines. Combining these results with the species-wide phylogeny built from single-copy orthologous genes and with genomic information from melon, we speculate that these large-scale chromosome rearrangements arose stepwise during cucumber domestication.

4. A total of 809 genes present in the 11 newly assembled cucumber lines but absent from the 9930 reference genome were identified; among them, genes harboring protein domains related to chromatin organization and gene regulation were significantly enriched. These presence/absence variation (PAV) genes and the pan-genome comprising 26,822 non-redundant coding genes will facilitate genome-guided breeding in cucumber.

Overall, in this study we built 12 high-quality, chromosome-level assemblies of representative cucumber accessions, together with the cucumber pan-genome, providing vital resources for plant comparative genomic studies. The characterization of SNPs, InDels and SVs will yield candidate targets for cucumber biology, genetic studies and molecular breeding. We also deepened the understanding of genome evolution within the cucumber species, laying a foundation for further phylogenetic studies of species in the genus Cucumis and even the family Cucurbitaceae. The cucumber pan-genome will serve as an important complement to the single reference genome.
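
Assembly contiguity in this abstract is reported as contig N50 (8.9 Mb for 9930; 1.7 to 5.3 Mb for the other lines). As a minimal sketch of how such a statistic is commonly computed, assuming the usual definition (the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length), the following Python function may help. The function name `contig_n50` and the toy contig lengths are illustrative assumptions and are not taken from the study.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (hypothetical helper)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(lengths, reverse=True):
        running += length
        # N50 is the first contig length at which the running sum
        # reaches at least half of the total assembly length.
        if 2 * running >= total:
            return length


# Toy contig lengths in bp, made up purely for illustration.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(contig_n50(toy_contigs))  # 70: 80 + 70 = 150 covers half of 290
```

In practice the lengths would be read from an assembly FASTA file; a larger N50 indicates that more of the assembly sits in long contigs, which is why the 234-fold increase is cited as evidence of improved contiguity.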