| The plerocercoid larva(sparganum)of Spirometra tapeworms can parasitize in humans causing a parasitic zoonosis known as sparganosis,which mainly manifested as larval migration,leading to blindness,limb paralysis,and even death.Till now,sparganosis has been reported in many countries of the world,such as South Korea,Japan,Thailand,and other East and Southeast Asian countries,as well as Europe and Africa in recent years.The human sparganosis has been reported to be distributed in 26 of 34 provinces/autonomous regions/municipalities in China,with more than 1,300 cases.Studies on the genetic structure of Spirometra tapeworms is not only helpful for understanding the evolutionary history of their populations but also helpful for exploring the mechanisms of their adaptation to the parasitic environment,which in turn can guide the development of effective control strategies for sparganosis.However,despite their significance in medical tapeworms,our knowledge about the population genetic background of Spirometra tapeworms remains fragmentary.In this study,based on our previous accumulation,we used a large sample(a total of 368 samples collected from 76 regions in 16 provinces/autonomous regions/municipalities),big data(data from both traditional multi-molecular markers and microsatellite data using highthroughput sequencing platforms),and multiple analysis methods to systematically analyze the genetic diversity of the Spirometra population in China.Specifically,there are two parts in this study.In the first part,we performed the genetic diversity,population structure,population genetic differentiation,and phylogenetic patterns of Spirometra tapeworms in China by using four traditional molecular markers(cytb,coxl,12S and 28S rDNAD1).In the second part,we firstly constructed a novel Target SSRseq genotype analysis method using screened perfect SSRs including stable motifs and flanking sequences based on the omic data in combination with the Illumina NovaSeq high-throughput sequencing platform.Secondly,we conducted a series of genetic polymorphism analysis of the Spirometra tapeworms based on our newly constructed Target SSR-seq method.In the end,a set of core SSR combinations and core sample sets were successfully screened.The core SSRs can be used to reflect the genetic diversity of Spirometra tapeworms to the greatest extent with the least number of SSR loci,while the core sample sets used to widely represent the genetic background of Spirometra tapeworms in China.The core core sample sets significantly reduced the cost of sample collection and preservation,and provided convenience for genetic diversity analysis and genotype identification of Spirometra tapeworms.The achievements of this study will be helpful for deeply understanding the genetic variation of Spirometra tapeworms and laid a foundation for the prevention and control of sparganosis.Materials and Methods1.Samples collection and identificationFrom July 2013 to September 2020,our group investigated the natural infection of pleroceroid in 23 of the 26 endemic areas of China(excluding Taiwan,Hong Kong and Macao),and no positive hosts infected with pleroceroid were found in 7 of the endemic areas.Therefore,we finally obtained 368 samples from 12 different hosts at 76 sampling sites in 16 endemic areas.The collected geographical strains were identified molecularly based on cox1 molecular markers.In addition,the morphological characteristics of the eggs and different segments of adult of Spirometra tapeworm were observed using light and electron microscopy.2.Acquisition of target genes and multiple sequence analysisSample DNA was extracted using EasyPure? Genomic DNA Kit,and the target genes were amplified based on four molecular markers(cytb,cox1,12S,and 28S)using the sample DNA as a template.The purified PCR products were sent to the Company for sequencing.The sequencing sequence was uploaded to GenBank database.The sequencing results of four genes(cytb,cox1,12S and 28S)were first manually proofread using Chromas to remove low-quality sequences.The sequences were then spliced using the Cap3 online tool.Blast comparison was performed in the NCBI database to verify the homology of the sequences.MEGA X software was used to calculate the nucleic acid sequence components of the four genes and perform multisequence alignment.Nucleic acid substitution saturation tests were performed on the molecular sequences using DAMBE v 7.0 software.3.Screening of perfect SSR markersMicrosatellite search tool MISA was used to search for SSR loci in the following data:(1)the research of the existing transcriptome sequences(2)through the NCBI EST database search keywords sparganum and spirometra.Perfect SSR sequences with 3-nucleotide and/or 4-nucleotide patterns and flanking sequence of loci meeting primer design requirements were selected as the target sequence.Primer design using Primer3 Plus,and to evaluate and optimize the PCR primers.4.Library construction and high-throughput sequencingThe optimized panels were used for multiple PCR amplification using genomic DNA of each sample as a template.Primers with Index sequence were used to add 8bp specific label sequences compatible with Illumina platform to the end of the library by PCR amplification.After that,the amplified products of all the samples were mixed in equal quantities,and the final FastTargetTM sequencing library was recovered by gumming.The length distribution of fragments in the library was verified by Agilent 2100 BioAnalyzer.After the molar concentration of the library was accurately quantized,the Illumina NovaSeq platform was finally used for high-throughput sequencing in a 2×150bp double-ended sequencing mode to obtain sequencing data.5.Genetic diversity analysisFor traditional molecular marker data:Software DnaSP6 was used for haplotype analysis based on single gene and combined sequences,and genetic polymorphism of each geographical population was calculated;Arlequin V.3.5 software was used to calculate Fst values and assess the level of genetic differences between different geographical populations;haplotype network maps were constructed using the mean neighbor joining algorithm of Network in PopART software.For SSR-seq data:raw reads were filtered to get clean reads with FastQC;a Perl script "SSRSeq count"str_count.pl was used to process the clean reads to generate an SSR read count table;a Perl script str_type.pl and an online SSR genotyping tool-SSRSeq V1.1 were used to output SSR genotypes.Genetic information statistics including observed number of alleles(Na),effective number of alleles(Ne),Shannon’s information index(I),observed heterozygosity(Ho),and expected heterozygosity(He),were calculated by using the software POPGENE32.Polymorphic information content(PIC)was calculated by using a Perl script.6.Genetic STRUCTURE analysisFor traditional molecular marker data:STRUCTURE V2.3 software was used for STRUCTURE analysis;mismatch distribution analysis and Neutrality test were performed using Arlequin V.3.5 software to check the expansion dynamics of the population.For SSR-seq data:the software STRUCTURE V2.3 was also used to infer the population structure;the R package polysat was used to calculate the genetic distance between pairs of samples and principal coordinate analysis(PCoA).7.Phylogenetic analysisPhylogenetic trees were constructed using Neighbor-joining(NJ),maximum parsimony(MP),maximum likelihood(ML),and Bayesian inference(BI)based on haplotype datasets from combined sequence.The neighbor-joining method was performed in MEGA X,the maximum parsimony method was performed in PAUP software,the maximum likelihood analysis was performed in PhyML software,and the Bayesian analysis was performed using MrBayes software.Neighbor-joining(NJ)and ggtree in phangorn in R packet were respectively applied to construct and draw the evolutionary tree based on SSR-seq data.8.Analysis of population genetic differencesThe software Treemix was used to infer multiple population splitting and mixing events based on genome-wide allele frequency data.The results were based on the provided materials to draw the maximum likelihood tree of the population and to calibrate migration events in the phylogenetic tree.The genetic differentiation coefficient and gene flow between subpopulations were calculated by the software GenAlEx using allele frequency and genetic distance respectively.9.Screening of core SSR set and core sample setA core set of SSRs was screened out based on Perl script,that is,the minimum number of SSRs could represent the genetic diversity of Chinese isolates to the maximum extent,and the saturation curve was plotted according to their identification ability.Core Hunter software was used to screen core samples of all isolates of Spirometra tapeworms in China.Results1.Analysis of target genesAll isolates were identified as S.erinaceieuropaei based on cox1 genotyping and morphological identification.Four target genes(cytb,cox1,12S,and 28S)were successfully amplified,with lengths of 1110 bp,1566 bp,280 bp and 299 bp,respectively.These sequences were not saturated,indicating that they are suitable for phylogenetic studies.The A+T content of three mitochondrial genes was higher than that of G+C,which was consistent with the base composition characteristics of mitochondrial genes,while the A+T content of the only nuclear gene 28S(46.04%)was lower than that of G+C(53.96%).Among the four genes,cox1 had the highest genetic diversity,followed by cytb,12S and 28S.2.Genetic diversity based on combined genesA total of 256 haplotypes were identified from 368 isolates from 16 geographic populations.The haplotype diversity index(Hd)was 0.997±0.001,and the mean nucleotide difference number(K)was 52.88969.The nucleotide diversity of most populations was more than 0.1,indicating high genetic variation.The Fst values among most geographic populations were greater than 0.15,and generally an Fst greater than 0.15 indicated a high degree of genetic differentiation.By constructing the haplotype network map,it was found that the haplotypes of each geographical population were mostly concentrated,showing a distribution pattern consistent with the geographical distance.while a few haplotypes were scattered throughout the network,indicating less genetic exchange and greater differentiation among geographic populations.Each geographical population was composed of multiple haplotypes,with few shared haplotypes,reflecting the high level of genetic polymorphism in different populations.In addition,combining the results of neutral detection and mismatch analysis,it was inferred that the population of S.erinaceieuropaei in China has not yet reached the bottleneck stage and has not experienced population expansion historically.3.Genetic structureSTRUCTURE analysis based on the combined sequences showed that samples from China could be divided into two groups:Group 1 and Group 2,with Group 1 containing 164 haplotypes and Group 2 containing 92 haplotypes.There were two genotypes in samples:genotype 1 and genotype 2.And further analysis based on geographic population grouping revealed that almost every geographic population had two genotypes,but there were also some geographic populations with isolates of S.erinaceieuropaei showing a single genotype.4.Phylogenetic patternBased on the haplotype data set of combined sequence,the phylogenetic analysis of S.erinaceieuropaei isolates from different regions of China was performed using NJ,ML,BI and MP methods.Phylogenetic analysis results showed that the topological structure of the trees obtained by the four algorithms was consistent,which supported that the population of S.erinaceieuropaei in China could be divided into two main groups:Group 1 and Group2.Group 1 mainly comes from East and central China.Group2 mainly comes from South China and southwest China.5.Development of SSR molecular markersA total of 4,280 SSRs were identified by MISA analysis.Of these 4,280 SSRs,1,024(23.93%)were di-nucleotide repeats,2,720(63.55%)were tri-nucleotide repeats and 101(2.36%)were tetra-nucleotide repeats.The distribution of SSR motifs revealed the CT repeat was the most widespread di-nucleotide SSR motifs(329/1,024).The TCT repeat was the most widespread trimers among all tri-nucleotide SSR motifs(217/2,760).Based on all these identified SSRs,we finally obtained 98 SSRs after SSR filtering and perfect SSR search.After primer detection and optimization of library enrichment system,39 evenly distributed perfect SSRs were finally selected to test in Target SSR-seq.During the high-throughput sequencing of 368 S.erinaceieuropaei isolates,6 samples were failed due to the low quality of specimens.The average sequencing depth per SSR capture in 309 samples(85.4%)was more than 1000×.The clean ratio was above 0.8 for almost all of the samples(97%),indicating high quality and accuracy of sequencing results.Moreover,4 of the 39 SSRs exhibited high miss rates(>20%),probably due to unstable flanking sequences.One SSR was observed as monomorphisms in 362 varieties.Finally,a total of 34 polymorphic SSRs were obtained for genotyping 362 S.erinaceieuropaei varieties from China.6.Genetic diversity of SSR lociTarget SSR-seq captured 170 alleles of 34 target SSR loci in 362 samples.The allele number per SSR locus varied from 2 to 17 with an average of 5,and the average of effective number of alleles(Ne)was 2.4763.The Shannon’s information index(I)varied from 0.1064 to 2.5105 with an average of 0.9455.The observed heterozygosity Ho ranged from 0.0417 to 0.9611 with a mean of 0.527,and most of SSRs(25)exhibited higher Ho(>0.4).Furthermore,the genetic diversity estimated by expected heterozygosity(He)varied from 0.0409 to 0.9064(mean=0.5004),while the PIC value ranged from 0.0401 to 0.8971(mean=0.4508).Overall,the 34 target SSR loci showed various alleles and high polymorphism rates,which were proven to be suitable for varieties identification of S.erinaceieuropaei isolates.7.Genetic diversity between populations and samplesWe firstly explored the genetic differences among 16 different geographical populations using the R package of Phangorn.The NJ tree based on Nei’s unbiased genetic distance demonstrated that the 16 geographical populations were clustered into three genetic groups.The first group mainly included populations from central(HB and HeN)and eastern(AH,JS,JX and SH)China,with the exception of CQ and GZ populations,which from southwestern China.The second group contained 3 southwestern populations(GX,YN and SC),3 southern populations(GD,HN and HuN)and 1 population from eastern China(ZJ).The third group only consisted a population from southeastern China(FJ).The principal coordinate analysis(PCoA)showed a similar clustering pattern with that in the phylogenetic inference.Within the PCoA plot,the first principal coordinate accounted for 52.77%of the total variations,while the second principal coordinate accounted for 26.57%variations.Next,the genetic differences of tapeworms isolated from different hosts were explored.In detail,isolates from 8 intermediate hosts and 4 definitive hosts were investigated.The clustering analysis showed that the samples from different hosts were clustered into three genetic groups.The PCoA analysis showed 54.21%and 30.5%explained variance of principal component analysis PCoAl and PCoA2,respectively.The locations of the definitive hosts(cats and tigers)were scattered and far from each other,while the intermediate hosts(frogs and snakes)were concentrated in the middle of the coordinate axis and close to each other,indicating greater genetic varieties among the definitive hosts and less genetic varieties among intermediate hosts.Moreover,in order to investigate the detail genetic diversity of each isolate,we performed a phylogenetic analysis using all 362 samples.The unrooted NJ tree revealed three main clusters:clade 1,clade 2 and clade 3.8.Genetic structure analysisThe STRUCTURE result indicated that 362 S.erinaceieuropaei varieties were divided into two main populations(Pop1 and Pop2).In general,245 samples(67.7%)were assigned to Popl and the remaining 117 samples(32.3%)were assigned to Pop2.The STRUCTURE analysis did not divide the 16 populations according to their geographical origin,in contrast,each population showed a mixed membership to the inferred clusters.Isolates from AH(17),CQ(17),HB(17),JS(17)and SH(3),revealed one prominent genotype,whereas the remaining isolates indicated two main genotypes.Moreover,the PCoA analysis of 362 samples showed a similar clustering pattern with the STRUCTURE analysis.Within the PCoA plot,the first principal coordinate of the PCoA analysis explained 16.53%of the variation,while the second principal coordinate accounted for 11.58%of the variation.All isolates generally clustered into two main groups(popl and pop2)with partial overlap among several samples.9.Population difference analysisA large number of migration events were identified among these S.erinaceieuropaei populations.One migration was detected began from GZ population to SC population with strong migratory signal.In addition,four migrations with weak signals were detected,including one event that began from SC to JX,one that began from SC to HB,one that began from GZ to YN and one that began from GZ to JS.The Fis value varied from-0.6457 to 0.2473 with an average value of-0.1577.Fit values varied from-0.5968 to 0.5176,with an average of-0.0481,indicating that the population has a low level of inbreeding at present.The average genetic differentiation index(Fst)of S.erinaceieuropaei populations in China was 0.0947,indicating that there was a moderate degree of genetic differentiation among populations.The mean value of gene flow Nm was 2.3899,indicating that there was some gene exchange in the population of S.erinaceieuropaei in China.10.Selection of core SSR set and core sample setIn this study,23 core SSRs were selected as core SSRs set by Perl script,and the identification ability of 23 core SSRs reached 91.1%.To further explore the utility of 23 core SSRs in genetic diversity analysis,we first performed genetic diversity analysis:the number of alleles Na per locus ranged from 3 to 17,and the heterozygosity Ho was observed to vary from 0.2 to 0.96.19 SSRs exhibited high Ho.The expected heterozygosity(He)varied from 0.31 to 0.91,and the polymorphism information content(PIC)varied from 0.29 to 0.9.The PIC values of 14 SSR loci were higher than 0.5,indicating that 61%of SSR loci were highly polymorphic.Therefore,the genetic diversity of core SSR locus is high,which is suitable for genetic structure analysis and germplasm identification.Secondly,the genetic structure analysis based on 23 SSRs also divided 362 isolates into two subpopulations,which was consistent with the analysis results of 34 SSRs.PCoA analysis also significantly differentiated the two subpopulations,with PCoAl accounting for 15.37%and PCoA2 for 13.39%of the total genetic variation.In conclusion,these 23 core SSR loci can fully reflect the genetic diversity of S.erinaceieuropaei population in China,and can be used for identification of S.erinaceieuropaei population in China.In addition,the core hunter software was used to set different sampling ratios of 0.01,0.02,0.03,0.04,0.05,0.06,0.07,0.08,0.09,0.1,0.2 and 0.3 to screen core samples in each population.Finally,76 strains were selected as the core samples of Chinese S.erinaceieuropaei isolates,with the sampling ratio of 0.2 and MR of 0.43.Generally speaking,the larger MR value is,the more representative the selected samples are.Conclusion1.High haplotype diversity and many unique haplotypes were detected in Spirometra tapeworms of China.2.Although conservative evolution in each geographical population,moderate genetic differentiation among different populations was revealed.3.Two genotypes(genotype 1 and genotype 2)were confirmed,both genotype 1 and genotype 2 were crosswise distribution in most of geographic populations with genotype 1 as a prevalent genotype.4.Structure analysis verified Group 1 and Group2 two main groups in Chinese Spirometra tapeworms,and Group 1 mainly contained isolates from east and central China,while Group2 contained isolates from south and southwest China.5.A novel genotyping method suitable for genetic diversity study of Spirometra tapeworms named Target SSR-seq was constructed,and a core SSR set and core sample set were successfully screened,laying the foundation for further population genetics studies. |