
The Study Of Heterogeneity Theory And Quantitative Analysis Method Of The Human Population Spatial Genetic Structure

Posted on: 2006-08-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Z Xue
Full Text: PDF
GTID: 1100360155467096
Subject: Epidemiology and Health Statistics
Abstract/Summary:
One of the primary tasks of human population genetics is to study the genetic structure of human populations, that is, the sum of all genomes in a population. The allele frequencies at a given locus or set of loci in a population constitute the population genetic structure at those loci, and the total over all loci is the whole genetic structure of the population. The geographical pattern of population genetic structure, together with the rules by which it changes, is defined as the human population spatial genetic structure, and its complexity and variability are defined as the "heterogeneity of population spatial genetic structure".

At present, quantitative methods for analyzing this heterogeneity are limited in the following areas: the analysis of the spatial characteristics of human population genetic structure and its spatial estimation; the analysis of spatial relationships among populations and of gene-flow tracks; the identification of genetic boundaries; and the separation of geographic isolation from the other components of population spatial genetic structure. In the present study, building on the theory of population genetics, the heterogeneity theory and quantitative analysis methods of human population spatial genetic structure are developed by combining geostatistics, mathematical ecology, fractal (multifractal) theory, and graph theory with molecular genetics. The purpose of the study is to overcome the limitations above and to develop methods for analyzing human population spatial genetic structure.
The dissertation comprises the following six chapters.

Chapter 1 The selection of loci and the data collection: Because different loci have different mutation and evolution mechanisms, gene frequency data for the CCR2, SDF-1, ABO, HLA-A,B, TPOX, FGA, CSF1PO, D7S820, TH01, and VWA loci are collected, and a spatial genetic database is set up for each.

Chapter 2 The theoretical models of population genetic structure and their measurements: As background, the theoretical models of population genetic structure (such as the island model, the stepping-stone model, and the isolation-by-distance model) are introduced, together with the measurements of population genetic structure (gene frequency and its variance-covariance, heterozygosity and genetic diversity, genetic differentiation and the fixation index, and genetic distance). This background is essential for understanding the concept and models of human population spatial genetic structure.

Chapter 3 The classical multivariate statistical models in human population spatial genetic structure analysis and their improvement: Based on the characteristics of the gene frequency matrix, the problems of classical multivariate statistical models in the analysis of human population spatial genetic structure are explored. So that these models can be applied accurately, the problems are overcome by improving the classical models as follows: (a) Because the gene frequency matrix is "closed data" subject to the "closure effect" (the frequencies at a locus sum to one), which makes population genetic structure very difficult to analyze, a "log-ratio" nonlinear multivariate statistical model is set up.
This log-ratio model is better suited to analyzing population genetic structure. (b) The differences in the eigenvalues, the eigenvectors, and their effectiveness in reducing dimensionality between principal component analysis of the standardized correlation matrix and of the averaged covariance matrix are shown. It is concluded that, in principal component analysis of human population genetic structure, principal component scores should be calculated from the averaged covariance matrix rather than from the standardized correlation matrix. (c) Because the eigenvalues act as weights and the eigenvectors carry direction in principal component analysis, the synthetic principal component (SPC) is defined as a measure of population genetic structure. (d) Graph theory multivariate statistical models are set up by combining multivariate statistical methods with the minimum spanning tree of graph theory. The resulting classification scattergram shows not only the genetic structure of populations but also the intrinsic relationships among them, which makes it an ideal tool for studying human population genetic structure. (e) The reasons for the "horseshoe effect" in correspondence analysis of human population genetic structure are explained, and it is shown that detrended correspondence analysis (DCA) can avoid it. (f) The PPG biplot is created using singular value decomposition (SVD) and is based on the theory of the population subdivision model.
The PPG biplot has some advantages over conventional multivariate methods for analyzing human population genetic structure. First, it is easier to interpret genetically. Second, it gives a graphical presentation of the gene frequency matrix, which greatly enhances our ability to understand the population genetic structure of the locus (loci).

Chapter 4 The hypothesis and outline of the human population spatial genetic structural model: On the basis of the theory of gene frequency as a stochastic process, and combining geostatistics with population genetics, the regionalized variable theory of the spatial distribution of gene frequency is presented and the gene regionalized variable is defined. The stationarity hypothesis is set up and the outline of the human population spatial genetic structural model is given.

Chapter 5 Spatial heterogeneity models of human population genetic structure: Based on spatial heterogeneity theory, the concept of the spatial heterogeneity of human population genetic structure is presented, and the spatial heterogeneity models of human population genetic structure and their applications in population genetics are set out step by step. The models are as follows.

(A) The spatial semivariogram models of human population genetic structure. Linking geographic distance closely to genetic distance, spatial semivariogram models are set up based on genetic distance (Nei's (1972), Gregorius's (1978), and Gst), the synthetic genetic measurement (SGM), and a single genetic variable, respectively. The models display how average genetic distance changes with spatial distance.

(B) The theoretical spatial semivariogram models of human population genetic structure and their heterogeneity analysis. The types of theoretical spatial semivariogram model, the methods for fitting them, and the steps of heterogeneity analysis are presented in turn.
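As an illustration of the empirical semivariogram in item (A), the sketch below computes semivariance per distance class for a single genetic variable (for example an allele frequency) sampled at population locations. It assumes the classical Matheron estimator, in which the semivariance of a distance class is half the mean squared difference over all pairs in that class. The function name, the binning scheme, the planar coordinates, and the toy data are assumptions made for illustration and are not taken from the dissertation.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Return mean lag, semivariance, and pair count for each non-empty distance class.

    coords    : (n, 2) planar coordinates of the sampled populations
                (for real geographic data, great-circle distances would be needed)
    values    : (n,) genetic variable at each population, e.g. an allele frequency
    bin_edges : increasing distance-class boundaries; class k is (edge[k], edge[k+1]]
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)

    # All unordered pairs (i < j) of populations.
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2

    lags, gammas, counts = [], [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (dist > lo) & (dist <= hi)
        n_pairs = int(mask.sum())
        if n_pairs == 0:
            continue  # skip empty distance classes
        lags.append(dist[mask].mean())
        gammas.append(0.5 * sqdiff[mask].mean())  # Matheron estimator
        counts.append(n_pairs)
    return np.array(lags), np.array(gammas), np.array(counts)

# Toy example: four populations on a line with a linear cline in allele frequency.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
freqs = np.array([0.10, 0.20, 0.30, 0.40])
lags, gammas, counts = empirical_semivariogram(
    coords, freqs, bin_edges=[0.0, 1.5, 2.5, 3.5]
)
for h, g, n in zip(lags, gammas, counts):
    print(f"lag={h:.1f}  semivariance={g:.4f}  pairs={n}")
```

In the toy data the semivariance rises steadily with lag, which is the signature of a cline; a theoretical model (spherical, exponential, and so on) would then be fitted to such points.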
The dissertation discusses in detail how to use the semivariogram to decompose human population spatial genetic structure into quantifiable components. From this decomposition, population genetic parameters are defined that quantify the degree and scale of spatial heterogeneity of human population genetic structure. Biases of the method are also illustrated with examples from the analysis of human population spatial genetic structure.

(C) The spatial estimation models of human population spatial genetic structure. The Kriging model has several advantages over other interpolation and smoothing methods for estimating human population spatial genetic structure. First, it relies on the structure of the spatial genetic semivariogram model, which can be used to quantify the spatial genetic heterogeneity of the locus (loci) before its spatial genetic structure is mapped. Second, it is virtually unbiased in the interpolation situation, where the location to be estimated is surrounded by data on all sides and lies within the range of influence of these data. Third, it provides an estimate of the interpolation error, which can be used to appraise the quality of the spatial estimate, and the resulting error maps can be used to decide where new population genetic samples should be collected. However, the Kriging model also has disadvantages. First, when no theoretical model can be fitted to the spatial genetic semivariogram, the Kriging model cannot be set up. Second, if the Kriging model is built from a poorly fitting spatial genetic semivariogram, the Kriging estimation standard deviation is remarkably high over the whole area, so the model is not suitable for estimating the distribution of spatial genetic structure.
In these situations, interpolation algorithms that assume spatial randomness rather than spatial autocorrelation, such as the Cavalli-Sforza method in Genography, inverse distance-weighted methods, and splines, should be used to estimate or map the distribution of spatial genetic structure.

Applications: (A) The heterogeneity of population spatial genetic structure and the spatial estimation of eight loci (ABO, HLA-A, TPOX, FGA, CSF1PO, D7S820, TH01, and VWA) in Chinese populations. (a) Although the eight loci have different mutation and evolution mechanisms, their spatial distributions all show the same trend: the differences in SPC among populations of the same ethnic group in different districts are greater than those among different ethnic groups in the same district. The differences in SPC therefore appear to be geographical rather than ethnic, and the geographical distribution is mainly latitudinal. (b) The directions of the genetic clines differ among loci, which indicates that different loci are subject to natural selection in different directions. (c) In the SPC maps of all eight loci, the same genetic geographic pattern can be identified in Xinjiang province, which indicates that the populations in this area have the highest Caucasoid component. (B) The spatial genetic structure of two HIV-1-resistance polymorphisms (at the CCR2 and SDF1 loci) in the population of Shandong province, China. (a) On the scale of populations, the two alleles show significant spatial genetic structure at different spatial distance classes, but on the scale of individuals no spatial structure is found, either in Shandong province as a whole or within each sampled county. (b) Although the frequencies of both alleles show gradually increasing trends across Shandong province, the directions of increase are opposite.
The frequency of the CCR2-64I allele gradually increases from the southwest to the northeast, while the frequency of the SDF1-3'A allele gradually increases from the northeast to the southwest. (c) The relative risk (RR) of AIDS for the combined types of their different genotypes does not show obvious geographic variation over the province as a whole. An evaluation of the spatial distribution of genetic susceptibility to HIV (AIDS) associated with the CCR2-64I and SDF1-3'A alleles should therefore focus on the frequencies of the combined CCR2 and SDF1 genotypes, based on the two-locus genotype of each individual, rather than on the frequencies of the CCR2-64I and SDF1-3'A alleles.

Chapter 6 The advanced models of human population spatial genetic structure: In this chapter, three advanced models of human population spatial genetic structure are presented.

(A) The 2-D graphic clustering model of human population spatial genetic structure. Combining graph theory and conditional cluster analysis with panbiogeography, the 2-D graphic clustering model of human population spatial genetic structure is built. In this model the minimum spanning tree (MST) has a spatial genetic meaning: it can be used to display the genetic similarity between populations within a geographic area, present the patterns of population spatial genetic structure, show the spatial genetic relationships between populations, and infer the migration tracks of populations. Seven MSTs, constructed from the HLA, TPOX, FGA, CSF1PO, D7S820, TH01, and VWA loci in Chinese populations across China, indicate the following. (a) Although the trees for different loci differ slightly in form and in the resulting classification, they are similar in form and in the direction of their offset. This shows the gene-flow track between the northern and southern populations of China, and the order and direction of population genetic differentiation. (b) The MST for each locus shows that the minority populations in Xinjiang province form a system of their own.
This indicates that the genetic structures of the Caucasian and Mongoloid populations remain distinct systems with little gene exchange, although there has been gene flow, population migration, and population admixture between them. The Tibetan population in Tibet is genetically related to the minority populations in Yunnan, and shows no genetic relationship with the Caucasian populations in Xinjiang province or with the Mongoloid populations in the north. The minority populations in Yunnan province can be divided into two relatively independent sub-populations, in the north and the south of the province. (c) The inconsistency of the classification of populations in central and eastern China among the MSTs for different loci indicates that there has been frequent gene exchange between the northern and southern Mongoloid populations.

(B) The multifractal models of human population spatial genetic structure. (a) The semivariogram multifractal model: the fractal dimension D in the model can be used to measure the heterogeneity of human population spatial genetic structure. (b) The "contour-area" multifractal model: this model is set up to take into account both the distribution of the genetic variable and the spatial and geometrical properties of genetic patterns. It improves conventional interpolation methods by allowing them to display spatial genetic boundaries without arbitrary thresholds being marked on the maps. The model is built by establishing power-law relationships between the cutoff value p and the area A(>p) enclosed by the contours of the genetic variable with values greater than p, after plotting these values on log-log paper. A series of straight-line segments can be fitted to the points on the log-log plot, each representing a power-law relationship between the area A(>p) and the cutoff value p of the genetic variable in a particular range.
These straight-line segments yield a group of cutoff values that can be identified as genetic boundaries, on the basis of which the spatial genetic structure can be classified into discrete genetic zones. Because the genetic boundaries are marked on the map, these zones usually correspond to the inherent regularities of spatial genetic structure in the landscape. As examples, the spatial genetic structures of three loci (ABO, HLA-A, and TPOX) in China are analyzed with the "contour-area" multifractal model, with satisfactory results.

(C) The canonical trend surface model of human population spatial genetic structure. Combining canonical correlation analysis with trend surface analysis, the canonical trend surface model is set up. This model can be used to dissect the spatial composition of population genetic structure and to separate out geographically isolated regions that are meaningful in population genetic terms. Analysis of the spatial genetic structure of the ABO, HLA, and TH01 loci in Chinese populations across China indicates that the number of sampling points and their spatial distribution affect the performance of the model: if the main objective is to display the genetic cline trend of the spatial genetic structure, the number of sampling points can be reduced; if, on the contrary, the main objective is to separate geographically isolated regions from the overall spatial genetic structure, there must be enough sampling points and they should be regularly distributed over the study area.
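The "contour-area" procedure described under Chapter 6 (B)(b) can be sketched roughly as follows. The sketch assumes the interpolated genetic variable is available as a regular grid, approximates A(>p) by counting grid cells above each cutoff, and fits only two straight-line segments on the log-log plot by exhaustive search for the best break, whereas the abstract speaks of a series of segments. The function names, the two-segment restriction, and the synthetic data are illustrative assumptions, not the dissertation's implementation.

```python
import numpy as np

def contour_area_curve(grid, thresholds, cell_area=1.0):
    """Approximate A(>p): total area of grid cells whose value exceeds each cutoff p."""
    grid = np.asarray(grid, dtype=float)
    return np.array([cell_area * np.count_nonzero(grid > p) for p in thresholds])

def fit_two_segments(log_p, log_a):
    """Fit two least-squares lines to (log p, log A) and return (sse, k, fits).

    k is the index of the first point of the second segment; fits holds one
    (slope, intercept) pair per segment. Each segment keeps at least two points.
    Points with zero area must be dropped before taking logarithms.
    """
    log_p = np.asarray(log_p, dtype=float)
    log_a = np.asarray(log_a, dtype=float)
    best = None
    for k in range(2, len(log_p) - 1):
        sse, fits = 0.0, []
        for x, y in ((log_p[:k], log_a[:k]), (log_p[k:], log_a[k:])):
            slope, intercept = np.polyfit(x, y, 1)
            sse += float(np.sum((y - (slope * x + intercept)) ** 2))
            fits.append((float(slope), float(intercept)))
        if best is None or sse < best[0]:
            best = (sse, k, fits)
    return best

# Area above each cutoff for a tiny 2 x 2 grid of interpolated values.
areas = contour_area_curve([[1, 2], [3, 4]], thresholds=[0.5, 1.5, 2.5, 3.5])

# Synthetic log-log curve with two power-law regimes (slopes -1 and -3).
log_p = np.arange(8.0)
log_a = np.where(log_p <= 3, -log_p, 7.0 - 3.0 * log_p)
sse, k, fits = fit_two_segments(log_p, log_a)

# Candidate genetic boundary: a cutoff between the last point of the first
# segment and the first point of the second (midpoint in log space).
boundary_log_p = 0.5 * (log_p[k - 1] + log_p[k])
print("areas:", areas)
print("break index:", k, "slopes:", [round(s, 3) for s, _ in fits])
print("boundary cutoff p:", np.exp(boundary_log_p))
```

With more than one break, the same search would be repeated within each segment or replaced by a general piecewise-linear fit; each break then gives one cutoff value that can be drawn on the map as a genetic boundary.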
Keywords/Search Tags: human population genetics, human population spatial genetic structure, spatial heterogeneity, mathematical model