
Developing Methods And Software For Genetic Analysis Of Complex Traits

Posted on: 2009-08-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Yang
GTID: 1103360242994298
Subject: Crop Genetics and Breeding
Abstract/Summary:
Understanding the genetic basis of complex traits is of key importance for the genetic improvement of crops and domestic animals, and helps to elucidate the genetic aetiology of complex human diseases. The essential tasks in the genetic analysis of complex traits are to map the quantitative trait loci (QTLs) that affect their inheritance, to detect interactions among QTLs (epistasis) and differences in QTL and epistatic effects across environmental conditions, and consequently to identify the candidate genes underlying complex traits and their genetic regulatory networks. Based on mixed linear model approaches, a full-QTL model and a two-step mapping strategy were proposed for linkage analysis of segregating populations, dissecting the genetic architecture of a complex trait into the effects of individual QTLs, epistatic interactions between pairs of loci, and the interactions of QTLs (or epistasis) with environmental factors. Simulation studies and analyses of rice and mouse data were performed to validate the reliability and efficiency of the proposed method. Subsequently, a novel genetic model that integrates the genetic information of both host and parasite was proposed to map disease-related QTLs on the host and parasite genomes simultaneously, and to investigate the interactions among these QTLs. In addition, the method was extended to QTL analysis based on high-throughput genotyping technology (e.g. DArT), and an approach was developed to predict superior genotypes using the genetic information obtained from QTL analysis. Furthermore, by combining gene expression and genotyping data from a segregating population, a new approach was proposed for mapping expression QTLs (eQTLs) and detecting epistatic interactions between a main-effect eQTL and any other locus. Finally, two software packages were developed to implement these methodologies.
The main features of the proposed methods and results are summarized as follows:
1) For homogeneous mapping panels, such as recombinant inbred (RI) and doubled-haploid (DH) populations, a full-QTL model was proposed to explore the genetic architecture of complex traits in multiple environments. The model includes the additive effects of multiple QTLs, additive × additive epistatic effects, and their interaction effects with environments. A mapping strategy, including marker interval selection, detection of marker interval interactions, and genome scans, was used to evaluate the putative locations of multiple QTLs and their interactions. An F-statistic based on Henderson method III was used for hypothesis testing. In each of the mapping procedures, permutation testing was used to control the genome-wide false positive rate, and model selection was used to reduce ghost peaks in the F-statistic profile. Parameters of the full-QTL model were estimated using a Bayesian method via Gibbs sampling. Monte Carlo simulations were conducted to illustrate the reliability and efficiency of the method. Two real datasets (BXD mouse olfactory bulb weight and rice yield) were used as worked examples to demonstrate the proposed methods.
2) For heterogeneous mapping panels, such as F2 and recombinant inbred intercross (RIX, also called IF2) populations, the full-QTL model was extended to include the additive and dominance effects of QTLs, epistatic effects (additive × additive, additive × dominance, and dominance × dominance), and their interactions with environments. A series of simulations was conducted to investigate the power and false discovery rates for detecting QTLs and epistasis under different RIX designs. Two real datasets, one from mouse and the other from rice, were analyzed to illustrate the validity of the proposed method. The results revealed that more than half of the QTLs showed pleiotropic effects, whereas epistasis appeared to be independent across traits. The proportion of phenotypic variation attributable to environmental effects differed considerably among traits.
3) The development of array-based high-throughput genotyping methods (e.g. diversity arrays technology, DArT, and single nucleotide polymorphism, SNP) has created significant opportunities to increase the number of genetic populations available for linkage analysis. A strategy was proposed for mapping QTLs based on the DArT genotyping system. A procedure was illustrated for constructing a consensus linkage map consisting of both DArT and SSR markers from a sub-group of a DH population, and a second linkage map from SSR markers alone using the more extensive full DH population. Resistance to net type net blotch disease in barley was analyzed using the sub-population data with the high-density consensus linkage map and the full-population data with the low-density SSR linkage map, respectively. Two interacting QTLs were detected with either the sub-population or the full population. The results indicated that high-density molecular markers, small population size and precise phenotyping could improve the precision of mapping major-effect QTLs and the efficiency of QTL mapping experiments.
4) Methods were developed for predicting two kinds of superior genotypes (superior line and superior hybrid) based on QTL effects, including epistatic and QTL × environment interaction effects. Mathematical formulae were derived for predicting the total genetic effect, in a specific environment, of any individual derived from the mapping population with a known QTL genotype. Two algorithms, an enumeration algorithm and a stepwise tuning algorithm, were used to select the best multi-locus combination of all the putative QTLs. Grain weight per plant (GW) in rice was analyzed as a worked example to demonstrate the proposed methods. The results showed that the predicted superior lines and superior hybrids were markedly superior to the F1 hybrid, indicating that large breeding potential remains for further improvement of GW. The results also indicated that epistatic effects and their interactions with environments contributed largely to the superiority of the predicted superior lines and superior hybrids.
5) Under the hypothesis that the host-parasite interaction system is governed by genome-for-genome interaction, a genetic model was proposed that integrates genetic information from both the host and parasite genomes. The model can be used for mapping QTLs that govern the interaction between host and parasite and for detecting interactions among these QTLs. A one-dimensional (1D) genome scan strategy was used to map QTLs in the host and parasite genomes simultaneously, conditioned on selected pairs of markers that control the background genetic variation; a two-dimensional genome scan procedure was conducted to search for epistasis within the host and parasite genomes and for interspecific QTL×QTL interactions between the host and parasite genomes. Permutation tests were adopted to calculate the empirical threshold for controlling the experiment-wise false positive rate of detected QTLs and QTL×QTL interactions. Monte Carlo simulations were conducted to examine the reliability and efficiency of the proposed models and methods. The simulation results illustrated that the methods could provide reasonable parameter estimates and adequate power for detecting QTLs and QTL×QTL interactions.
6) A statistical procedure was proposed to identify differentially expressed genes (DEGs) from microarray gene expression data, with or without missing observations, in experiments with one or two treatment factors. An F-statistic based on Henderson method III was constructed to test the significance of differential expression for each gene across treatment levels. The cutoff P-value was adjusted to control the experiment-wise false discovery rate. A human acute leukemia dataset collected from 38 leukemia patients was re-analyzed by the present method. In comparison with the results from SAM (significance analysis of microarrays) and MAANOVA (microarray analysis of variance), the present method performed similarly to MAANOVA for data with one treatment factor, but MAANOVA cannot directly handle missing data. A mouse brain dataset collected from six brain regions of two inbred strains (two treatment factors) was re-analyzed to identify genes with distinct region-specific expression patterns. The results showed that the proposed method could identify more distinct region-specific expression patterns than the previous analysis of the same dataset.
7) Treating gene expression values as a special kind of complex "trait", a novel method was proposed to identify genetic polymorphisms (i.e. eQTLs) and their interactions that affect gene expression. The method starts with a 1D genome scan to search for eQTLs with individual effects, conditioning on previously selected candidate markers to control the background genetic variation. Each main-effect eQTL is then tested for interaction with any other locus, with or without an individual effect. In both the detection of main-effect eQTLs and the detection of genetic interactions, the cutoff P-value was adjusted to control the experiment-wise false discovery rate. A mouse dataset collected from a group of RI strains derived from two parental strains, C57BL/6J and DBA/2J, was analyzed to illustrate the utility of the proposed method.
8) Two software packages, QTLNetwork and QTModel, were developed in C++ to implement the aforementioned methodologies. QTLNetwork was developed for mapping and visualizing the genetic architecture underlying complex traits for experimental populations in multiple environments. It can handle data from F2, backcross, recombinant inbred line, doubled-haploid, RIX (IF2) and BCnFn populations. QTModel has three modules: mixed, array and diallel. The mixed module was developed for analyzing data from experimental designs with random factors, such as randomized block, factorial, multi-factor factorial, nested, and cross-nested designs. The array module can analyze microarray expression data with one or two treatment factors to identify differentially expressed genes. The diallel module was developed for analyzing data from classical diallel cross designs.
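The permutation thresholds mentioned in items 1) and 5) can be illustrated with a minimal sketch. It assumes a DH- or RI-like population with markers coded 0/1, and it uses a plain single-marker one-way ANOVA F-statistic as a stand-in for the Henderson method III F-statistic of the thesis. All function names and the simulated data are hypothetical and are not taken from QTLNetwork.

```python
import random


def f_statistic(genotypes, phenotypes):
    """One-way ANOVA F-statistic of phenotype on genotype class at one marker."""
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    n, k = len(phenotypes), len(groups)
    if k < 2 or n <= k:
        return 0.0  # monomorphic marker or too few individuals: no test possible
    grand_mean = sum(phenotypes) / n
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}
    ss_between = sum(len(ys) * (means[g] - grand_mean) ** 2 for g, ys in groups.items())
    ss_within = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)
    if ss_within == 0.0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def genome_scan(markers, phenotypes):
    """F-statistic at every marker; markers[j] holds the codes of all individuals."""
    return [f_statistic(m, phenotypes) for m in markers]


def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """Empirical genome-wide threshold: (1 - alpha) quantile of the maximum
    F-statistic over the genome when phenotypes are shuffled among individuals."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any marker-trait association
        max_stats.append(max(genome_scan(markers, shuffled)))
    max_stats.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return max_stats[index]


def simulate(n_ind=100, n_markers=20, qtl=5, effect=2.0, seed=42):
    """Hypothetical data: independent 0/1 markers, one additive QTL at `qtl`."""
    rng = random.Random(seed)
    markers = [[rng.randint(0, 1) for _ in range(n_ind)] for _ in range(n_markers)]
    phenotypes = [effect * markers[qtl][i] + rng.gauss(0.0, 1.0) for i in range(n_ind)]
    return markers, phenotypes


if __name__ == "__main__":
    markers, phenotypes = simulate()
    scan = genome_scan(markers, phenotypes)
    threshold = permutation_threshold(markers, phenotypes, n_perm=200)
    significant = [j for j, f in enumerate(scan) if f > threshold]
    print("threshold:", threshold)
    print("markers above threshold:", significant)
```

Taking the maximum statistic over the whole genome in each permutation is what makes the threshold genome-wide rather than per-marker. The thesis additionally conditions on selected background markers and applies model selection, which this sketch omits.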
Keywords/Search Tags: Complex trait, Mixed linear model, Genetic architecture, Quantitative trait loci (QTLs), Epistasis, QTL×environment interaction, Superior genotype, Monte Carlo simulation, Host-parasite interaction, Microarray, Differentially expressed genes (DEGs)