A Restricted Two-stage Approach For Genome-wide Association Studies In Inbreeding Crops And Its Application And Software Development

Posted on: 2015-01-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J B He
Full Text: PDF
GTID: 1313330512971005
Subject: Crop Genetics and Breeding
Abstract/Summary:
Genome-wide association studies (GWAS) were initially proposed in human genetics to unravel the genetic basis of complex diseases. They exploit linkage disequilibrium (LD) by testing statistically for association between genome-wide markers and the phenotype, and have since been used successfully for the genetic dissection of quantitative traits in both animals and plants, playing an important role in genetics and breeding studies. Self-pollination and often cross-pollination are two major mating systems of crops, and natural populations of inbreeding crops typically deviate strongly from random mating because of their high selfing rate. The long LD decay distance in inbreeding populations greatly reduces the accuracy of GWAS, because a marker far from the causal locus (but within the decay distance) can still be declared statistically significant; selection based on such markers may therefore be ineffective or entirely wrong.

Cross design and progeny selection are the two major steps in conventional breeding, and full information on the genetic system is required for accurate optimal cross design and efficient marker-assisted selection (MAS). Although GWAS have been widely used for the genetic dissection of agronomic traits in large germplasm populations, they are mainly used to discover major genes, and a very conservative significance level and strict correction for population structure are usually chosen to avoid false positives as far as possible. As a result, the high false negative rate and the low genetic contribution explained have limited the application of GWAS to crop breeding. Conventional breeding is essentially a genetic operation that assembles complementary alleles from parental materials into an improved individual carrying elite alleles for productivity, quality or other required traits. However, the single nucleotide polymorphism (SNP) markers widely used in GWAS are biallelic, so multiple alleles have generally been
ignored in almost all GWAS. A large-scale germplasm population must contain multiple alleles, so analysis based on multi-allelic markers may be more appropriate and more powerful.

The present thesis aimed to explore an improved GWAS procedure that accommodates multiple alleles in natural germplasm populations and provides genome-wide information on genetic loci for optimal cross design and progeny selection, fitting the conventional breeding of inbreeding crops. A genome-wide association analysis method for inbreeding crops was established, based on genome-wide SNPLDB (SNP LD block) markers built from SNP haplotype blocks and a two-stage GWAS strategy under a multi-locus model. Simulation studies based on real soybean genotype data were performed to evaluate the reliability of the proposed method. In addition, GWAS and optimal cross design for 100-seed weight in soybean were carried out to validate the efficiency of the proposed method and to serve as a worked example. Finally, a software package implementing the proposed method was developed, focusing on GWAS and optimal cross design in inbreeding crops. The main results are summarized as follows:

(1) Simulation study of LD and GWAS power in natural populations with varying degrees of inbreeding: LD was evaluated for natural populations with different levels of inbreeding by forward-time population simulation, and the theoretical statistical power of GWAS was simulated for different heritabilities, levels of LD and significance levels. The LD simulations showed that LD decayed rapidly in outbreeding populations, where the distance at which LD decayed to 0.5 was less than 100 kb. The LD decay distance in highly inbred natural populations was generally larger than 500 kb, and exceeded 2 Mb in natural populations with an ultra-high level of inbreeding (selfing rate > 95%). The GWAS power simulations showed that the false positive rate of GWAS is much higher in inbreeding populations than in outbreeding populations, as the power for
causal locus (h² = 5%) detection at a typed marker in LD (r² = 0.5) with the causal locus is about 68% at a significance level of 5 × 10⁻⁶. Furthermore, the false positive rate is much higher for large-effect loci (h² > 5%) than for small-effect loci. A more stringent significance level can reduce the false positive rate, but it produces a high false negative rate for small-effect loci, resulting in an underpowered GWAS.

(2) Evaluation of LD and construction of SNPLDB markers in the Chinese soybean germplasm population: LD was evaluated in the Chinese soybean germplasm population (CSGP), composed of 1024 accessions, based on 145558 genome-wide SNP markers. The results showed long-range LD in the CSGP, with r² decaying to half of its maximum at roughly 500 kb, indicating a haplotype block structure. Based on the definition of SNP haplotype blocks, 36952 SNPLDB markers were constructed. Comparison of LD estimated from SNP and SNPLDB markers showed that the distance at which LD decayed to 0.6 was 3 Mb for SNPs and 500 kb for SNPLDBs, indicating that the LD decay distance of the CSGP was shortened by using SNPLDB markers. Furthermore, the number of alleles per SNPLDB ranged from 2 to 14, providing more diverse allelic information than SNP markers.

(3) Restricted two-stage genome-wide association analysis procedure for inbreeding crops: An approach to population structure correction based on the genetic similarity coefficient was proposed for bias adjustment in GWAS, in which the eigenvectors of the genetic similarity matrix with the largest eigenvalues are incorporated as covariates. Simulation studies based on genome-wide SNPs in the CSGP showed that the proposed method greatly reduced false positives in GWAS; as the number of eigenvectors increased, false positives decreased rapidly with only a slight loss in power. An improved GWAS procedure was proposed to explore the full set of genetic loci with maximum genetic contribution, for application to conventional breeding in inbreeding crops. A two-stage strategy was employed in which candidate loci
identified in the first stage with a single-locus model were used to build the final multi-locus model through stepwise regression in the second stage. In both stages, the model bias introduced by inbreeding was adjusted by including eigenvectors of the pairwise similarity coefficient matrix as covariates. Simulation results based on genome-wide SNPLDBs in the CSGP showed that the proposed method has great potential for identifying more causal loci, detecting 10% to 17% more loci than the existing method at a comparable false discovery rate.

(4) Design of molecular breeding for 100-seed weight in soybean based on GWAS: Using the proposed restricted two-stage association mapping method, GWAS and optimal cross prediction for 100-seed weight in the CSGP were performed as a worked example. A total of 139 SNPLDBs were significantly associated with 100-seed weight and together explained 98.17% of the phenotypic variance, with 0.57% to 2.75% explained by each locus. Based on the QTL-allele matrix of 100-seed weight, optimal crosses were identified with a potential increase of 23.32% to 32.43% in 100-seed weight.

(5) RTS-GWAS, software for genome-wide association analysis and optimal cross design for conventional breeding in inbreeding crops: A software package was developed in C++ for mapping genetic loci and predicting optimal crosses in inbreeding crops, based on our restricted two-stage genome-wide association analysis procedure. It can handle any type of marker and perform large-scale GWAS in reasonable computational time. It has both an easy-to-use graphical user interface (GUI) and a command-line interface (CLI), and runs on Windows, Linux and OS X.
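
The structure correction in result (3) uses the leading eigenvectors of a genetic similarity matrix as covariates. The sketch below illustrates that idea only and is not the RTS-GWAS implementation. It assumes a simple-matching similarity coefficient (the share of markers at which two inbred lines carry the same allele), because this abstract does not state the exact coefficient used in the thesis. The function names `similarity_matrix` and `top_eigenpairs` and the toy genotypes are hypothetical, and power iteration with deflation is used only to keep the example free of third-party dependencies.

```python
import math
import random


def similarity_matrix(genotypes):
    """Simple-matching similarity (an assumed coefficient): share of markers
    at which two lines carry the same allele."""
    n = len(genotypes)
    m = len(genotypes[0])
    return [
        [sum(a == b for a, b in zip(genotypes[i], genotypes[j])) / m
         for j in range(n)]
        for i in range(n)
    ]


def top_eigenpairs(matrix, k, iterations=500, seed=0):
    """Return the k largest (eigenvalue, eigenvector) pairs of a symmetric
    positive semi-definite matrix by power iteration with deflation."""
    n = len(matrix)
    work = [row[:] for row in matrix]  # working copy, deflated below
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iterations):
            w = [sum(work[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient of the converged unit vector gives the eigenvalue.
        value = sum(v[i] * work[i][j] * v[j]
                    for i in range(n) for j in range(n))
        pairs.append((value, v))
        # Deflate: subtract the component just found so the next pass
        # converges to the next-largest eigenvalue.
        for i in range(n):
            for j in range(n):
                work[i][j] -= value * v[i] * v[j]
    return pairs


# Hypothetical toy data: four inbred lines scored at four multi-allelic markers
# (one allele code per marker, since inbred lines are assumed homozygous).
genotypes = [
    [0, 0, 1, 2],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 2],
]
S = similarity_matrix(genotypes)
# Each eigenvector has one entry per line and would enter the association
# model as an extra covariate column.
covariates = [vec for _, vec in top_eigenpairs(S, k=2)]
```

In this toy example the second eigenvector takes opposite signs for the first two and the last two lines, which is the kind of group structure the covariates are meant to absorb. How many eigenvectors to include is a tuning choice; the abstract reports that adding more reduces false positives at a slight cost in power.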
Keywords/Search Tags: inbreeding population, genome-wide association study, restricted two-stage approach, Chinese soybean germplasm, breeding by genetic design