
Statistical Genetics-Based Approach To Allelic Functional Differences And Its Application

Posted on: 2010-12-26  Degree: Master  Type: Thesis
Country: China  Candidate: H H Kan  Full Text: PDF
GTID: 2120360275496373  Subject: Crop Genetics and Breeding
Abstract/Summary:
Nucleotide sequence variation is an inevitable result of biological evolution and adaptation to the environment. Allelic variation in a coding region can alter protein function, while regulatory allelic variants can affect the level of gene expression. Both types of variation may produce functional changes and affect the phenotype of an organism. Many important crop phenotypes are quantitative traits controlled jointly by polygenes and environmental factors, and nucleotide sequence variation within a gene is widespread among varieties. Establishing the relationship between allelic differences and the resulting quantitative trait phenotype therefore remains a difficult problem that requires further study. It means identifying the loci whose nucleotide sequence variation is associated with the quantitative trait and determining the functional differences among the alleles at each locus. From a statistical point of view, if the target phenotype is treated as the response (dependent) variable and allelic variation at many genomic loci as the explanatory variables, the task is to select, from tens of thousands of variables, those that contribute significantly to the phenotype. This is the problem of variable selection in oversaturated models, in which the number of candidate effects exceeds the number of observations. Many methods have been proposed to deal with this problem. Among them, Xu (2007) proposed an empirical Bayes (E-bayes) method that requires no Markov chain Monte Carlo sampling and appears to outperform all other methods. In this study, the E-bayes method was first introduced to the analysis of allelic functional differences, and its feasibility was verified, so that gene discovery and allele mining using crop germplasm resources become technically feasible. In addition, a real data set was used to demonstrate the application of E-bayes. The main results are as follows.
I. Simulation study.
Simulated data from an oversaturated genetic model with 50 main effects and 50(50-1)/2=1225 epistatic effects were analyzed by E-bayes. Four of the 50 main effects and four of the 1225 possible interaction effects were randomly chosen to be nonzero. Three factors were considered in the simulations: the number of varieties, the polymorphism information content (PIC) and the total contribution of the candidate genes. The number of varieties was set at four levels: A1=30, A2=50, A3=70 and A4=100. PIC was set at five levels: B1=0.1638, B2=0.2638, B3=0.3318, B4=0.3648 and B5=0.3750. The total contribution of the candidate genes was set at three levels: C1=30%, C2=50% and C3=70%. All 60 treatment combinations (4 × 5 × 3) were simulated, each with 100 replicates. The statistical properties investigated were the statistical power and the precision and accuracy of the effect estimates for the candidate genes. The simulations showed that more varieties, a larger PIC and a higher contribution tended to give higher statistical power. When the total contribution of the candidate genes was low, for example 30%, more than 100 varieties were usually needed for the statistical power to exceed 80%. When the contribution was 70%, however, fewer varieties were sufficient to reach more than 80% power even if PIC was low. When the E-bayes method was applied to the treatment combination A4B3C2, only one of the eight effects had low statistical power, while the other seven had almost perfect power and precise, accurate effect estimates. Association analysis could be applied only to main effects, and its statistical power was lower than that of the E-bayes method. Stepwise regression and LASSO detected all eight effects with high statistical power, but their effect estimates were less precise and less accurate, and they detected more spurious genes.
PENAL performed well in terms of computing time, which was relatively short, but it found only the four larger effects, its statistical power was low, and its precision and accuracy were unsatisfactory.
II. Real data analysis.
The E-bayes method was applied to the genotypes of 43 markers in 18 starch-synthesizing genes and the pasting temperature (PT) phenotype of 118 rice varieties. Four markers were found to be significantly associated with the phenotype. Pul-1 and SSII3-3 had both main and interaction effects on PT, whereas AGPlar-1 and SSII3-2 appeared only in interaction effects. The genetic effects could be negative or positive, and the largest effect was twice the size of the smallest.
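The sizes quoted in parts I and II can be checked with a few lines of Python. The following is only a minimal sketch, not the thesis code. It counts candidate effects for a model with main effects and all pairwise interactions, enumerates the treatment combinations, and evaluates PIC for a two-allele marker. Several things here are assumptions that the abstract does not state: the PIC formula (the common biallelic form), that all pairwise interactions among the 43 markers were candidates in the real data analysis, and the +1/-1 genotype coding. The function names are hypothetical.

```python
from itertools import combinations, product
from math import comb


def n_candidate_effects(n_loci: int) -> int:
    """Main effects plus all pairwise interaction effects for n_loci loci."""
    return n_loci + comb(n_loci, 2)


def biallelic_pic(p: float) -> float:
    """PIC of a two-allele marker with allele frequencies p and 1 - p.

    Assumed formula (not stated in the abstract):
    1 - (p^2 + q^2) - 2 * p^2 * q^2, with q = 1 - p.
    """
    q = 1.0 - p
    return 1.0 - (p * p + q * q) - 2.0 * p * p * q * q


def expand_effects(codes):
    """One design-matrix row: main-effect codes, then all pairwise products."""
    pairs = combinations(range(len(codes)), 2)
    return list(codes) + [codes[i] * codes[j] for i, j in pairs]


# Factor levels exactly as listed in the abstract.
varieties = [30, 50, 70, 100]                          # A1..A4
pic_levels = [0.1638, 0.2638, 0.3318, 0.3648, 0.3750]  # B1..B5
contributions = [0.30, 0.50, 0.70]                     # C1..C3
treatments = list(product(varieties, pic_levels, contributions))

if __name__ == "__main__":
    # Simulated model: pairwise interactions and total candidate effects.
    print(comb(50, 2), n_candidate_effects(50))
    # Number of treatment combinations (4 x 5 x 3).
    print(len(treatments))
    # Real data, assuming all pairwise interactions were candidates.
    print(n_candidate_effects(43), "candidate effects vs 118 varieties")
    # PIC at allele frequencies 0.1 to 0.5 under the assumed formula.
    print([round(biallelic_pic(p), 4) for p in (0.1, 0.2, 0.3, 0.4, 0.5)])
    # Hypothetical +1/-1 coding for three markers.
    print(expand_effects([1, -1, 1]))
```

Under these assumptions the simulated model has 1275 candidate effects, and the real data would have 946 candidate effects for 118 varieties, which is why both are oversaturated. With the assumed PIC formula, allele frequencies of 0.1, 0.3, 0.4 and 0.5 reproduce B1, B3, B4 and B5. A frequency of 0.2 gives 0.2688 rather than the stated B2=0.2638, so B2 may have been derived differently and is worth checking against the full text.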
Keywords/Search Tags: Allelic variation, Quantitative trait, Oversaturated linear model, Variable selection, E-bayes