| Isoflavones are phenolic secondary metabolites found mostly in legumes. Epidemiological studies comparing populations in Asia, where soy consumption is high, with those in Western countries suggest that soy foods may contribute to multiple health benefits. Additional research has demonstrated that soybean isoflavones help reduce certain cancers, osteoporosis, cardiovascular disease, and menopausal symptoms in animal models and some human trials. The legume-specific isoflavones also play key roles in many plant-microbe interactions. Isoflavones are phytoalexins and have antimicrobial effects. They can also serve as signal molecules that are essential for the establishment of symbiosis between soybean and soil rhizobia. Because of these biological activities, metabolic engineering of isoflavonoid biosynthesis in legume and non-legume crops has significant agronomic and nutritional impact by enhancing plant disease resistance and providing dietary isoflavones for the improvement of human health.

In soybean seeds, the three types of isoflavones, daidzein (Dai), genistein (Gen), and glycitein (Gly), occur predominantly as glucosides or malonyl-glucosides. Total isoflavones (Total) fluctuate across cropping years and planting locations, indicating a large environmental effect. Other studies have shown large variations in isoflavone concentrations and compositions among soybean genotypes as well. Together, genetic and environmental factors make breeding for isoflavone levels very difficult in soybean. Several quantitative trait loci (QTL) for individual and total isoflavone concentrations in soybean seeds have recently been discovered, demonstrating the complexity of isoflavone traits. 
However, it is not clear whether or how these isoflavone variations are affected by critical structural enzymes.

Previous studies have revealed that the genes encoding key biosynthetic enzymes such as isoflavone synthase (IFS) and flavanone 3β-hydroxylase (F3H) are important and play opposite roles in isoflavone engineering. IFS and F3H share common substrates for the formation of isoflavones and flavonoids, respectively, in the phenylpropanoid pathway. Since sequence variations are heritably stable and can have a major impact on how the organism develops and responds to the environment, variations in the genes of the key isoflavone synthesis enzymes could have a major impact on seed isoflavone levels and soybean stress tolerance.

Single nucleotide polymorphisms (SNPs), which include single base pair changes and small insertions/deletions (Indels), can serve as molecular genetic markers. SNPs are abundant and relatively stable in the genome, and have been discovered within genes underlying observed traits. Association analysis, also known as association mapping, is a population-based survey used to identify trait-marker relationships based on linkage disequilibrium (LD). Association analysis has the potential to identify a single polymorphism within a gene that is responsible for a difference in phenotype. To identify the causative SNPs in the IFS1, IFS2 and F3H genes that are associated with soybean seed isoflavone levels and stress tolerance, we carried out a set of association analyses in this study.

Before association analysis, we investigated the level and pattern of diversity along the IFS and F3H genes. Allele sequences of the IFS1, IFS2 and F3H genes were obtained from 33 soybean accessions, including 17 Glycine max and 16 Glycine soja. 
These accessions were collected from latitudes 19-49°N and longitudes 106-131°E to sample not only all six ecological regions of soybean in China but also soybeans with diverse seed isoflavone levels.

We analyzed the genetic diversity of these 33 accessions by genotyping them with 55 unlinked SSRs that provided even coverage of the soybean genome. The genetic diversity index (H) of these 33 accessions was 0.87, higher than that found in 589 soybean accessions genotyped with the same 55 SSRs (H=0.82). In addition, our total allele number was 59% of that of the 589 accessions. Since our accession number was only 5.6% of the 589 accessions, we consider that the 33 accessions we selected had sufficient genetic diversity for nucleotide diversity analysis and preliminary association analysis.

Using these 33 accessions, we analyzed the nucleotide diversity and the extent of LD, and carried out multiple neutrality tests for the IFS1, IFS2 and F3H genes. Compared to former reports on average soybean genomic nucleotide diversity, the nucleotide diversity (π) at the IFS1, IFS2 and F3H genes in this study was almost 10-fold higher (from 0.00170 to 0.00852 in coding regions and from 0.00487 to 0.00856 in noncoding regions). This dramatic difference could be explained by the inclusion of the G. soja population and the wide geographic diversity of the accessions in this study. In addition, purifying selection was found to act on the two IFS genes, while the F3H gene did not show departure from the neutrality assumption, indicating that these two kinds of genes have experienced different selective pressures during evolution. Distinct forms of selection produce specific patterns of sequence diversity. 
In the neutral theory of molecular evolution proposed by Kimura, purifying selection is assumed to be ubiquitous and to remove deleterious mutations from a population. Consistent with this, much higher nucleotide diversity (π) was found in the F3H gene (from 0.00815 to 0.00841 in coding regions and from 0.00503 to 0.00748 in non-coding regions) than in the two IFS genes (from 0.00159 to 0.00232 in coding regions and from 0.00362 to 0.00856 in non-coding regions). The legume-specific IFS genes appeared to be more conserved than the F3H gene during evolution.

Consistent with the high level of nucleotide diversity of the three genes, the extent of LD in these genes across the combined populations was short, less than 1,000 bp. The combined populations we examined were a wide geographic sample of germplasm and would have had a long time for genetic associations to decay. The power to detect associations between an SNP and quantitative traits largely depends on having sufficient density of SNP markers to ensure that some SNPs will be in LD with the molecular variant that contributes to phenotypic variation. Therefore, the three genes in our study showed sufficient SNP density and genetic resolution to serve as candidate genes for association analysis.

Prior to association analysis, population structure was estimated with the Structure software based on the 55 unlinked SSR markers. Two subpopulations, in agreement with G. max and G. soja, were confirmed as the most likely subdivision of our plant materials. These population structure estimates were used in the TASSEL software to test for associations between IFS1, IFS2, and F3H polymorphisms and mean seed isoflavone levels and the stress tolerance traits: disease rate (DR) for soybean mosaic virus Sc-3 and Sc-7, relative root elongation (RRE) for aluminum toxicity tolerance, and relative shoot dry weight (RSDW) for phosphorus deficiency tolerance. 
All polymorphisms, including singletons, were considered in the association analysis. Significant sites (P<0.05) were identified by both general linear model (GLM) analysis (not considering population structure) and logistic regression analysis (considering population structure) in the TASSEL software.

For each of the three genes, several SNPs were closely associated (P<0.05) with individual traits, indicating that IFS1, IFS2 and F3H polymorphisms were associated with seed isoflavone concentrations and soybean stress tolerance. To decrease possible false positive and false negative results caused by population structure, only those sites identified by both analysis methods and significantly associated (P<0.05) with all four traits were selected. These significant sites are summarized as follows:

1. For soybean seed isoflavone concentrations

For the IFS1 gene, out of the 116 SNPs, two in the 5' UTR (157 A/G and 696 A/G) and one in the first exon (1143 T/C), which caused a serine to proline change, were significantly associated with all four traits. For the IFS2 gene, out of the 104 SNPs, one in the first exon (1508 C/T, synonymous) and one in the second exon (2353 G/A), which caused a valine to methionine change, were significantly associated with all four traits. For the F3H gene, out of the 91 SNPs, one in the first exon (268 A/G, synonymous), one in the second exon (1310 A/G), which caused a threonine to alanine change, one in the third exon (2198 C/T), which caused an alanine to valine change, and two in the second intron (1488 T/C and 2198 T/C) were significantly associated with all four traits.

2. For soybean mosaic virus resistance

For the IFS1 gene, 14 SNPs, including a 9 bp Indel, were significantly associated with SMV Sc-3 resistance. Twelve of them were in high LD and can be considered an SNP haplotype: 'TCACAACGAOTACA'. This SNP haplotype was found in two Sc-3 resistant accessions (W_HC03 and W_HC20). 
Only two SNPs in the IFS1 gene were significantly associated with Sc-7 resistance. The site at 1,902 bp was found in three resistant accessions.

For the IFS2 gene, only one 1 bp Indel was significantly associated with Sc-3 resistance. Three SNPs, including two Indels, were significantly associated with Sc-7 resistance. All sites found in the IFS2 gene were singletons.

For the F3H gene, seven SNPs were significantly associated with Sc-3 resistance. These seven sites constituted an SNP haplotype, 'GGACAAG', and were found in 14 accessions, 13 of which (93%) were resistant to Sc-3. Only one SNP was significantly associated with Sc-7 resistance. The 'T' allele of this T/A polymorphism was found in 12 accessions, nine of which (75%) were susceptible.

3. For aluminum toxicity tolerance

Significant sites found in the IFS1 and IFS2 genes were all singletons. For the F3H gene, one SNP was significantly associated with aluminum toxicity tolerance. The 'G' allele of this G/T polymorphism was found in five accessions, all of which showed low tolerance to aluminum toxicity.

4. For phosphorus deficiency tolerance

Only singletons in the two IFS genes were significantly associated with phosphorus deficiency tolerance, and no significant sites were identified in the F3H gene.

In summary, although studies of gene expression and enzyme activity are needed to further elucidate the allelic effects of these polymorphisms, especially the singletons, these SNPs appear to be possible causative polymorphisms for soybean seed isoflavone concentrations and stress tolerance, and may serve as important molecular markers for breeding. |
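
The two diversity statistics reported above, the genetic diversity index (H) from SSR genotypes and the nucleotide diversity (π) from aligned allele sequences, can be illustrated with a minimal Python sketch. It assumes that H is Nei's gene diversity, 1 − Σp², averaged over loci, and that π is the mean number of pairwise differences per site; the abstract does not state which estimators or software were used for these values. The function names and the toy data are hypothetical, and alignment gaps are simply compared as characters.

```python
from collections import Counter
from itertools import combinations


def gene_diversity(alleles):
    """Gene diversity at one locus, H = 1 - sum(p_i^2) (assumed estimator)."""
    n = len(alleles)
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def mean_gene_diversity(loci):
    """Average H over several loci (e.g. one allele list per SSR marker)."""
    return sum(gene_diversity(a) for a in loci) / len(loci)


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site (pi) for equal-length aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


# Toy data, purely illustrative (not taken from the study).
toy_ssr_loci = [["a1", "a1", "a2", "a2"], ["b1", "b2", "b3", "b4"]]
toy_alignment = ["ACGT", "ACGA", "ACGT"]

h_mean = mean_gene_diversity(toy_ssr_loci)
pi = nucleotide_diversity(toy_alignment)
```

With real data, each inner list of `toy_ssr_loci` would hold the alleles scored at one SSR marker across accessions, and `toy_alignment` would hold the aligned allele sequences of one gene region.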