
Population Structure And Genetic Diversity Of Escherichia Coli Isolates

Posted on: 2017-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y R Wu
Full Text: PDF
GTID: 2270330488455859
Subject: Military Preventive Medicine
Abstract/Summary:
Background: One of the major habitats of Escherichia coli (E. coli) is the intestinal tract of warm-blooded animals, from which it is released into the surrounding environment through fecal deposition. In recent years, foodborne disease outbreaks, such as the 2006 North American E. coli O157:H7 incident and the 2011 German E. coli O104:H4 outbreak, have been associated with the ingestion of vegetables contaminated by E. coli. This suggests that plants could be another reservoir for E. coli. Eating raw vegetables is considered nutritious and healthy, but it poses a potential danger to human health if the vegetables are contaminated with pathogenic E. coli. At present, the population diversity of plant-associated E. coli and the mechanism of its persistence within plant hosts remain unknown. The rapid development of whole-genome sequencing technology provides an opportunity to address these issues. In this study we sequenced two collections of E. coli strains of different origins using the Illumina HiSeq 2000. One, referred to as the ‘GMB’ collection, was collected from agricultural field-grown vegetables; the other, called the ‘ECOR’ collection, came from mammals. Whole-genome population genetic analysis was performed on the GMB collection, with the ECOR collection as a control. The study aimed to address the following questions: (i) the phylogenetic distribution of plant-associated E. coli isolates; (ii) the differences between the GMB and ECOR collections in genetic diversity and evolutionary dynamics; (iii) the factors influencing E. coli persistence on plants.

Results and discussion: Genome-wide SNP identification and population diversity analysis. After double-checking against the assembled contigs and sequencing reads, 354,888 reliable genome-wide SNPs were identified among 256 strains, including 66 strains of the sequenced ECOR collection, 105 strains of the sequenced GMB collection, and 64 published complete genome sequences of E. coli and Shigella from NCBI. Based on these SNPs, we constructed a phylogenetic tree using the neighbor-joining method. Phylogenetic analysis indicated that the GMB isolates were highly polymorphic and covered most of the known E. coli phylogroups, consistent with previous MLST analysis. Based on nucleotide diversity (π), the genetic diversity of the plant isolates was even higher than that of the ECOR collection. The GMB strains were unevenly distributed among phylogroups, tending to cluster within phylogroup B1 (41/105, 39.05%). Additionally, two clades of the GMB collection showed unusual branch lengths. Based on average nucleotide identity (ANI), these two distinct clades were classified as cryptic Escherichia clades C-I and C-V.

Because the GMB strains were isolated from different locations over two years, it was possible to infer their spatiotemporal distribution in the natural environment. First, we examined the population composition at each isolation location in 2008 and 2009 and found that the composition of the E. coli population varied significantly between years (Fisher's exact test, P=0.0362). Furthermore, association analysis of the pairwise genetic distances between strains with their isolation dates and locations showed that GMB strains isolated at the same location in different years were highly heterogeneous, suggesting that the plant-colonizing E. coli population could not successfully survive to the next year.
Only a few pairs of strains isolated from different locations showed close genetic distances, suggesting limited spread of plant-colonizing E. coli between locations within the same year.

Evolutionary dynamics of the ECOR and GMB collections. Tajima’s D is a statistical test for inferring evolutionary history at the DNA level; it can be used to detect genetic drift, directional selection, demographic expansion and contraction, genetic hitchhiking, etc. We used GD software to compute D for the 66 ECOR strains and the 105 GMB isolates, obtaining 0.448242 and -0.151549, respectively. Both D values were close to zero, suggesting that the ECOR and GMB collections are evolving neutrally at the population level. To investigate potential selective pressures influencing E. coli persistence in different habitats (intestinal tract or plants), we used KaKs_Calculator to calculate the Ka/Ks ratio of each gene separately for the ECOR and GMB collections. The Ka/Ks ratio distributions of the two collections were largely consistent, but a large number of outliers were still found. Using multiple statistical methods, we identified 17 genes with signals of positive selection among these outliers. Five genes (ybdJ, hsrA, hypA, csgC and maa) were related to the ECOR collection and are associated with colonization of the host intestine and drug resistance. The other twelve loci were related to the GMB collection; among them, bssS is associated with biofilm regulation, and dkgB, asr, blr and yibD are closely related to survival under stresses such as osmotic shock, acid shock and starvation. These results indicate that although E. coli is evolving neutrally at the population level, positive selection is present at the gene level, helping different E. coli populations adapt to their niches.

Identification of genetic elements influencing E. coli persistence on plants. Gene gain and loss is an important strategy by which bacteria adapt to environmental change. Using the sequence alignment tools BLAST and SOAP-aligner, we analyzed the accessory genomes of our sequenced samples to identify genomic fragments acquired by GMB strains that are potentially beneficial for survival on plants. Combined with the phylogenetic analysis, we found that gain and loss of accessory genome is closely related to phylogroup, but no GMB-specific genome fragment was identified. Because GMB strains tend to cluster in phylogroup B1, we focused on gene gain and loss among the 41 GMB strains and 19 ECOR strains in phylogroup B1. We identified 45 large fragments, with a total length of 105.73 kb, that are present in at least 5 GMB strains but absent from all 19 ECOR strains. Annotation showed that these fragments cover four important functions: DNA damage repair, degradation of plant cell wall, phage tail assembly and arginine biosynthesis. As AraC family proteins could be involved in plant cell wall degradation and bacteria-plant interaction, they may help E. coli on the leaf surface reach the inner parts of leaves and persist and grow on plants. From the heat map of accessory genome gain and loss, we found that three GMB strains located in different clades of the phylogenetic tree shared an unusual number of accessory genome fragments. Taking their spatiotemporal distribution into account, we hypothesize that some GMB strains, although unable to leave direct descendants in the same field the following year, could pass a genomic legacy to the replacing population through currently unknown routes (such as soil or phage).

Conclusion: To our knowledge, this study provides the first genome-wide characterization of the population structure and genetic diversity of plant-associated E. coli, laying a foundation for investigating the possible survival and transmission mechanisms of E. coli in different reservoirs. The evolutionary dynamics analysis showed that different loci under positive selection are linked to the ECOR and GMB populations, providing a possible explanation for how different selective pressures have shaped E. coli populations from different habitats. Through gene gain/loss analysis, we identified accessory genome fragments enriched in the GMB strains, providing preliminary information on the mechanisms by which E. coli persists on plants. Furthermore, we observed that GMB strains have difficulty surviving continuously for years in the same local area, but they could leave genome fragments in soil and other media, which can be acquired by the replacing population. This provides opportunities for the transfer of virulence and drug-resistance genes. Therefore, strengthening monitoring of the agricultural soil environment, and finding and cutting off potential routes of transmission, would help prevent and control outbreaks of plant-associated pathogenic E. coli.
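
The two population statistics reported in the abstract, nucleotide diversity (π) and Tajima’s D, can be illustrated with a minimal sketch. This is not the software used in the thesis. It assumes a small set of equal-length, gap-free aligned sequences and applies the standard formula for Tajima’s D. The function names and the toy alignment are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import sqrt


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return total / len(pairs)


def nucleotide_diversity(seqs):
    """Per-site nucleotide diversity (pi): mean pairwise differences / length."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


def segregating_sites(seqs):
    """Number of alignment columns that contain more than one allele."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences (no gap handling)."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if n < 4 or s == 0:
        # The variance term is zero for n = 3, and D is undefined without SNPs.
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    k = mean_pairwise_differences(seqs)
    # Difference between the two estimators of theta, scaled by its std. dev.
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy alignment (not thesis data): three singleton variants.
toy = ["ATGCA", "ATGCT", "ATGGA", "TTGCA"]
print("pi =", nucleotide_diversity(toy))
print("D  =", tajimas_d(toy))
```

In this toy alignment every variant is a singleton, so D comes out negative. A D value near zero, as reported for both collections, is what neutral evolution at constant population size would be expected to produce.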
Keywords/Search Tags: Escherichia coli, plants, whole-genome sequencing, population structure, population genomics