Background
Escherichia coli (E. coli) is a common bacterium with a complex genome structure and complex pathogenic mechanisms. It can cause a range of diseases, including mild diarrhea, bloody diarrhea, hemolytic uremic syndrome, urinary tract infection, and even death, and is therefore a public health concern. CRISPR loci are highly polymorphic and widely distributed in bacteria and archaea, and they carry abundant genetic information about bacterial evolution. CRISPR can therefore serve as an important typing target to elucidate evolutionary history and identify highly pathogenic clones. Applying CRISPR typing to Escherichia coli will be of great importance for monitoring epidemiological changes of the pathogen and controlling related outbreaks.

Objective
1 To construct a CRISPR-based typing database for E. coli and compare the discriminatory power of CRISPR typing, MLST and serotyping.
2 To establish a CRISPR/serotype lookup table, screen for spacers specific to certain serotypes, and evaluate the potential of CRISPR typing for predicting serotypes.
3 To analyze the genetic diversity of CRISPR and explore the value of CRISPR typing for clarifying the evolution of E. coli O80.
4 To construct a CRISPR-based typing database for E. coli O26, investigate the relationship among CRISPR type, host source and geographic origin, and evaluate the utility of CRISPR typing in monitoring the global transmission of O26 strains.

Methods
1 The in_silico_PCR script was used to extract CRISPR sequences and to determine whether each CRISPR locus was CRISPR1 or CRISPR2.2. The mutation patterns of E. coli repeat sequences were analyzed with the WebLogo server. A pattern string was designed to match all analyzed repeats. The Python module "re" was used to recognize repeat sequences through the pattern string so as to extract the spacer sequences. All pipelines were implemented in Python to enable batch spacer identification and CRISPR typing. The visualization of CRISPR was
performed by the CRISPRstudio tool.
2 The adjusted Rand index, adjusted Wallace index and Simpson index for CRISPR typing, MLST and serotyping were calculated with the R software.
3 The EcOH database in the ABRicate pipeline and the web tool SerotypeFinder were used to obtain serotypes. The mlst script was used to obtain MLST types. Virulence genes were screened with the VFDB database in the ABRicate pipeline, VFanalyzer, the online tool VirulenceFinder and local BLAST.
4 Whole-genome SNP information for E. coli O80 was obtained with the online tool CSI Phylogeny 1.4. Core-genome information for E. coli O26 was obtained by combining multiple bioinformatics tools, including PHASTER, ISfinder, BEDTools, snippy and Gubbins.
5 Phylogenetic trees were constructed with the MEGA and RAxML software. The minimum spanning tree was generated with the PHYLOViZ tool using the goeBURST algorithm.

Results
1 Construction of a CRISPR-based typing database for 39,515 strains of E. coli from the Enterobase database
In this study, a total of 39,515 E. coli strains were collected from the Enterobase database, of which 35,787 had publicly available CRISPR information. Among these 35,787 strains, 6,360 and 6,067 alleles were found in CRISPR1 and CRISPR2.2, respectively. Combinations of CRISPR1 and CRISPR2.2 alleles formed a total of 10,698 CRISPR types, with a Simpson index of 0.949.

2 Application of CRISPR typing for predicting serotypes
Based on 35,046 strains with CRISPR typing, serotyping and MLST data, the adjusted Rand index between CRISPR typing and serotyping was the highest, with a value of 0.858. The adjusted Wallace index of CRISPR typing for predicting serotype was the highest, with a value of 0.991. The Simpson index for CRISPR typing was also the highest, with a value of 0.947. A lookup table comprising 8,923 CRISPR/serotype associations was established from the data set of 35,046 strains, and 99.11% of strains in
this data set could be assigned the correct serotype. Meanwhile, with sensitivity and specificity of serotype prediction reaching 85% as the cutoff, a total of 71 spacers specific to 26 serotypes were found. Subsequently, the established CRISPR/serotype lookup table was used to predict the serotypes of 268 animal-derived E. coli strains isolated in China; 83.96% of the 268 strains were assigned the correct serotype. For this dataset, the sensitivity and specificity of the specific spacers for serotype prediction were 90% and 100%, respectively. Furthermore, the serotypes of 310 human-derived E. coli strains isolated in Denmark were also predicted; 89.68% of these strains were assigned the correct serotype based on the established CRISPR/serotype lookup table, and the sensitivity and specificity of the specific spacers for serotype prediction were 100% and 98.92%, respectively.

3 Application of CRISPR typing in clarifying the evolution of E. coli O80
A total of 81 O80 strains were collected from the NCBI database. CRISPR typing showed that 80 strains of E. coli O80 formed 21 different CRISPR types, with a Simpson index of 0.807. The combined analysis of cas1, MLST and CRISPR showed that all O80 strains formed an evolutionary branch distinct from other prevalent E. coli serotypes. Based on CRISPR spacer profiles, 70 O80:H2 strains formed 12 CRISPR types. Among them, CT4212 was the most prevalent, found 26 times and distributed across 6 countries. The second most common CT was CT4211, found 23 times and distributed across 4 countries. According to the clustering of CRISPR spacers, the 70 O80:H2 strains could be grouped into 4 lineages (LⅠ, LⅡ, LⅢ and LⅣ). Divergence analysis of CRISPR spacers showed that lineage LⅠ lacked the spacer at position 2 in CRISPR2.2 compared with lineage LⅡ. Compared with the 4 O80:H26 strains and 1 O80:H19 strain, all O80:H2 strains lacked the spacer at position 5 in CRISPR1. The joint analysis of virulence gene
profiles, CRISPR typing and wgSNP typing suggested that lineage LⅡ may have evolved from LⅠ, and that O80:H2 strains may have evolved from O80:H26 and O80:H19.

4 Application of CRISPR typing in monitoring the global epidemic of E. coli O26
A total of 1,367 strains of E. coli O26:H11 were collected from the Enterobase database. The O26 strains formed a total of 172 CRISPR types, with a Simpson index of 0.861. Cluster analysis based on CRISPR showed that all O26 strains could be divided into 5 subgroups (Ⅰa, Ⅰb, Ⅱa, Ⅱb and Ⅱc). The CRISPR clustering corresponded closely with plasmid gene profiles and core-genome typing. Meanwhile, core-genome typing identified a new clone, ST29C4, whose dominant plasmid gene profile was ehxA+/katP-/espP-/etpD-. All strains of this clone belonged to CRISPR subgroup Ⅰa, and 12 of its 16 strains possessed the tsh gene, which is responsible for extraintestinal pathogenicity. The combined analysis of CRISPR typing, virulence gene profiles and core-genome typing showed that ST29C4 occupied a phylogenetically intermediate position between ST29C3 and the other lineages (ST29C1-C2 and ST21C1-C2). Frequency analysis of all CRISPR types showed that CT6 and CT4 were the dominant CRISPR types in multiple countries. However, CT239 was the most prevalent in Australia, CT213 was the second most prevalent in the UK, and CT12 was the second most prevalent in New Zealand. In addition, after 2012, CT6 increased rapidly whereas CT4 decreased. Analysis of the relationship between CRISPR type and geographic region showed that 20 CRISPR types were locally epidemic, and CT139, CT246 and CT243 were each detected continuously over a long period in a single country. Analysis of the relationship between CRISPR type and host source showed that 35 CRISPR types were each found in a single host across strains isolated in different countries or different years. Among them, CT213 and CT218 could be found in 46 and 18 human
strains, respectively.

Conclusions
1 Based on 39,515 strains of E. coli from the Enterobase database, a CRISPR-based typing database was established successfully. CRISPR typing is potentially applicable for predicting the serotype of E. coli.
2 CRISPR typing is an important indicator for clarifying the evolution of E. coli O80.
3 A new clone, ST29C4, exists among O26 strains and belongs to subgroup Ⅰa in terms of CRISPR lineage. CRISPR typing is a valuable tool for monitoring the global transmission of O26 strains.
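
The spacer identification step in the Methods (matching repeats with a pattern string through Python's "re" module, then taking the sequence between consecutive repeats as a spacer) can be illustrated with a minimal sketch. The repeat pattern below is a hypothetical toy pattern chosen for illustration, not the pattern string used in this study, and the function name extract_spacers is likewise an assumption.

```python
import re

# Hypothetical degenerate repeat pattern (NOT the study's pattern string):
# a toy 12-nt repeat in which two positions tolerate an alternative base.
REPEAT_PATTERN = re.compile(r"GGTT[TC]ATCC[CT]CG")


def extract_spacers(array_seq, repeat=REPEAT_PATTERN):
    """Return the sequences lying between consecutive repeat matches."""
    matches = list(repeat.finditer(array_seq))
    spacers = []
    # Pair each repeat with the next one; the text in between is a spacer.
    for left, right in zip(matches, matches[1:]):
        spacer = array_seq[left.end():right.start()]
        if spacer:  # skip empty gaps between directly adjacent repeats
            spacers.append(spacer)
    return spacers


# Toy CRISPR array: repeat, spacer, variant repeat, spacer, repeat.
toy_array = (
    "GGTTTATCCCCG" + "AAACCCAAA"
    + "GGTTCATCCTCG" + "TTTGGGATT"
    + "GGTTTATCCCCG"
)
toy_spacers = extract_spacers(toy_array)
```

An ordered list of spacers obtained this way could then be mapped to an allele identifier per locus, with the CRISPR1 and CRISPR2.2 alleles combined into a CRISPR type; that mapping is not shown here.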
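
The Simpson index reported for each typing scheme can be sketched as follows. This assumes the Hunter-Gaston form of Simpson's index of diversity, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), which is the form commonly used for typing methods; the abstract does not state which formula was used, and the study computed its indices in R rather than Python. The function name and the toy labels are illustrative only.

```python
from collections import Counter


def simpson_diversity(type_labels):
    """Simpson's index of diversity (Hunter-Gaston form, assumed here).

    type_labels: one type label per strain, e.g. CRISPR types.
    """
    counts = Counter(type_labels)
    n_total = sum(counts.values())
    if n_total < 2:
        raise ValueError("at least two strains are required")
    # Probability that two strains drawn without replacement share a type.
    same_type = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type / (n_total * (n_total - 1))


# Toy example with four strains and three types (not data from the study).
toy_index = simpson_diversity(["CT1", "CT1", "CT2", "CT3"])
```

A value close to 1 means that two randomly chosen strains are very likely to receive different types, which is how the index expresses discriminatory power.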