The Impact Of Rare Variants On Population Stratification Analysis

Posted on: 2020-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Ma
GTID: 2370330602952185
Subject: Communication and Information System
Abstract/Summary:
Genome-wide association studies (GWAS) are an important analytical method in genetic research. Population stratification is a known confounding factor in GWAS: stratification in case-control samples can lead to false-positive or false-negative results, so population stratification analysis is one of the essential analyses in GWAS. Existing sequencing data show that a large number of rare variants are population-specific, which suggests that they may be better suited to distinguishing the population structure of a study sample. Principal component analysis (PCA) is widely applied to the analysis of population structure with common variants, but whether it remains effective when rare variants are used is still controversial.

In this thesis, we study the PCA method and describe the construction of the genetic relationship matrix (GRM) from genotype data. We then derive the expected genetic relationship matrix (EGRM) by taking the mathematical expectation of the GRM. The variance and covariance elements of the EGRM depend on the minor allele frequencies (MAFs) of the genetic markers used in the PCA. As the MAFs of the single nucleotide polymorphisms (SNPs) decrease, the intra-population and inter-population covariances also decrease, and so does the ability to distinguish population structure.

Next, we apply PCA to the GRM computed from 1000 Genomes Project data and draw scatter plots of the populations in R. First, the percentage of variance explained by the first five principal components (PCs) is 17.09% when using common variants with MAFs between 0.4 and 0.5, but only 0.74% when using rare variants with MAFs between 0.0001 and 0.01. Second, PCA with rare SNPs reveals a population structure different from that revealed by common and low-frequency SNPs. For the purpose of distinguishing population structure, however, rare variants are not as effective as common and low-frequency variants.

We further conduct an eigendecomposition of the EGRM and show that the information on population divergence is contained in K PCs, mainly in the largest K-1 PCs, where K is the number of populations. As the MAF becomes small, the ratio of inter-population variance to intra-population variance in the K PCs decreases, which makes the populations harder to distinguish. Based on the EGRM, we also derive the distance among populations; this distance is smaller with rare SNPs than with common ones. We therefore show analytically that PCA performs worse with rare variants than with common variants.

The analysis of the 1000 Genomes Project data verifies our theoretical results. The ratio of inter-population variance to intra-population variance and the population distance are 93.85 and 444.38, respectively, when using common variants with MAFs between 0.4 and 0.5; they decrease to 1.83 and 17.83 when using rare variants with MAFs between 0.0001 and 0.01. Although PCA with rare SNPs reveals population structures different from those revealed by common SNPs, our theoretical and empirical results demonstrate that existing PCA methods cannot effectively utilize the abundant genetic information contained in rare variants.
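The abstract does not give the exact GRM formula, so the following is only a minimal sketch of the pipeline it describes (genotypes, then GRM, then the share of variance carried by the leading PCs). It assumes the commonly used standardized GRM, in which each SNP column is centered by twice its sample allele frequency and scaled by sqrt(2p(1-p)). The function names and the two-population Balding-Nichols simulation are illustrative assumptions, not the thesis's code or data, and the toy output is not expected to reproduce the reported 1000 Genomes figures.

```python
import numpy as np


def genetic_relationship_matrix(genotypes):
    """Standardized GRM from an (individuals x SNPs) array of 0/1/2 allele counts.

    Assumption: the common standardized form Z Z^T / M; the thesis may use
    a different normalization.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                  # sample allele frequency per SNP
    keep = (p > 0.0) & (p < 1.0)              # monomorphic SNPs have zero variance
    if not keep.any():
        raise ValueError("no polymorphic SNPs in the sample")
    g, p = g[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return z @ z.T / z.shape[1]


def variance_explained(grm, n_pcs=5):
    """Fraction of total variance carried by the n_pcs largest eigenvalues."""
    eigvals = np.linalg.eigvalsh(grm)[::-1]   # eigvalsh is ascending; reverse it
    eigvals = np.clip(eigvals, 0.0, None)     # clear tiny negative round-off
    return eigvals[:n_pcs].sum() / eigvals.sum()


def simulate_two_populations(n_per_pop, n_snps, maf_low, maf_high, fst, rng):
    """Toy genotypes for two populations (hypothetical Balding-Nichols setup)."""
    p_anc = rng.uniform(maf_low, maf_high, size=n_snps)
    a = p_anc * (1.0 - fst) / fst
    b = (1.0 - p_anc) * (1.0 - fst) / fst
    pops = []
    for _ in range(2):
        p_pop = rng.beta(a, b)                # population-specific frequencies
        pops.append(rng.binomial(2, p_pop, size=(n_per_pop, n_snps)))
    return np.vstack(pops)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # MAF bins taken from the abstract; sample sizes and fst are arbitrary.
    for label, lo, hi in [("common", 0.4, 0.5), ("rare", 0.0001, 0.01)]:
        g = simulate_two_populations(50, 2000, lo, hi, fst=0.1, rng=rng)
        frac = variance_explained(genetic_relationship_matrix(g), n_pcs=5)
        print(f"{label}: first five PCs explain {100 * frac:.2f}% of variance")
```

Running the script prints one percentage per MAF bin. The values depend entirely on the simulation settings, so they illustrate how such a comparison can be set up rather than confirm the thesis's findings.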
Keywords/Search Tags: rare variants, population stratification, PCA, GWAS, SNP