
A Modified Ant Colony Optimization Algorithm For Identifying Gene-gene Interactions

Posted on: 2015-12-09
Degree: Master
Type: Thesis
Country: China
Candidate: C Qian
GTID: 2284330467959540
Subject: Epidemiology and Health Statistics

Abstract/Summary:
Genetic factors are now widely believed to play a critical role in the mechanisms of complex human diseases. Since 2005, genome-wide association studies (GWAS) have proven to be a popular and powerful means of identifying genetic susceptibility variants associated with complex human diseases. However, the single nucleotide polymorphisms (SNPs) identified so far explain only a relatively small proportion of the heritability. There is increasing evidence that neglecting gene-gene interactions may contribute to this "missing heritability".

Gene-gene interaction, usually known as epistasis or epistatic interaction, means that the effect of a specific genetic variant on a phenotype is modified by other genetic variants. Because a GWAS involves a very large number of SNPs, traditional methods of interaction analysis run into the well-known problem of multiple testing: after adjustment for multiple testing, the significance level becomes too stringent for these methods to retain adequate power. Recently, the ant colony optimization (ACO) algorithm has shown great potential for identifying gene-gene interactions. In this study, we propose a modified ACO algorithm, named AntTrailer, and investigate its advantages in detecting gene-gene interactions.

We conduct simulations to compare the statistical properties of AntTrailer and AntEpiSeeker in case-control studies, and we also apply both methods to a dataset extracted from a real GWAS of NSCLC (non-small cell lung cancer) in a Han Chinese population. The main contents of this study are as follows:

1. Simulations based on virtual datasets: simulated datasets are generated under different linkage disequilibrium (LD) structures and minor allele frequencies (MAF) of the SNPs.
Simulations based on the real gene: we generate simulated datasets from the phased haplotypes of the CEU samples (Utah residents with northern and western European ancestry) available on the website of the International HapMap Project. These datasets are used to compare the statistical properties of AntTrailer and AntEpiSeeker in identifying interactions.

2. Real data analysis: we apply both methods to a dataset extracted from a real GWAS of NSCLC in a Han Chinese population. In the discovery stage, the two methods are used to identify highly suspected interactions in the Nanjing population. In the replication stage, these suspected interactions are validated in the Beijing population using conventional logistic regression.

The main results of this study are as follows:

1. Simulations based on virtual datasets: AntTrailer controls the empirical type I error. In contrast, the type I error of AntEpiSeeker is dramatically inflated, especially when the SNPs have marginal effects; what AntEpiSeeker detects is the joint effect rather than the interaction effect. The power of both methods increases with the MAF of the SNPs but varies inversely with the LD structure of the SNPs.

2. Simulations based on the real gene structure: consistent with the simulations based on virtual datasets, AntTrailer approximately controls the empirical type I error, whereas the type I error of AntEpiSeeker is dramatically inflated, especially when the SNPs have marginal effects. In terms of power, both methods become more powerful as the odds ratio (OR) of the interaction increases. The power of AntEpiSeeker is sensitive to SNPs with marginal effects; by contrast, AntTrailer is fairly robust to them.

3. Real data analysis: AntTrailer selects a total of 10 pairs of first-order interactions (α = 2.63E-04) in the Nanjing population, two of which are successfully validated (α = 5.00E-04) in the Beijing population. AntEpiSeeker selects only two pairs of first-order interactions in the discovery stage, and neither of them meets the criterion in the replication stage.

Conclusion: The results of the simulation studies and the real data analysis suggest that the modified ant colony optimization algorithm, AntTrailer, is an efficient and powerful approach for identifying gene-gene interactions.
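The abstract does not spell out how AntTrailer or AntEpiSeeker search for SNP pairs, so the following is only a minimal, generic sketch of pheromone-guided ant colony search, assuming case-control labels and 0/1/2 genotype coding. Each ant samples two SNPs with probability proportional to their pheromone, the pair is scored with a Pearson chi-square statistic on its two-locus genotype-by-status table, and the pheromone then evaporates and is reinforced on SNPs that took part in high-scoring pairs. The function names, the parameters `n_ants`, `n_iter` and `rho`, the update rule and the toy data model are all hypothetical illustrations, not the thesis's method.

```python
import random


def chi_square(snp_a, snp_b, labels):
    """Pearson chi-square statistic of the 9 two-locus genotype cells x (control, case) table.

    Genotypes are coded 0/1/2; labels are 0 (control) or 1 (case).
    """
    cells = {}
    for ga, gb, y in zip(snp_a, snp_b, labels):
        cells.setdefault((ga, gb), [0, 0])[y] += 1
    n = len(labels)
    n_case = sum(labels)
    col_totals = (n - n_case, n_case)
    stat = 0.0
    for counts in cells.values():
        row_total = counts[0] + counts[1]
        for j in (0, 1):
            expected = row_total * col_totals[j] / n
            if expected > 0:
                stat += (counts[j] - expected) ** 2 / expected
    return stat


def ant_colony_search(genotypes, labels, n_ants=20, n_iter=30, rho=0.1, seed=0):
    """Generic pheromone-guided search over SNP pairs (hypothetical parameters).

    Returns (pair, score) tuples for every pair visited, best first.
    """
    rng = random.Random(seed)
    n_snps = len(genotypes)
    snps = range(n_snps)
    tau = [1.0] * n_snps  # pheromone level of each SNP
    scores = {}           # cache: pair -> chi-square score
    for _ in range(n_iter):
        deposits = [0.0] * n_snps
        for _ in range(n_ants):
            # Each ant picks two distinct SNPs with probability proportional to pheromone.
            i = rng.choices(snps, weights=tau)[0]
            j = rng.choices(snps, weights=[0.0 if k == i else t for k, t in enumerate(tau)])[0]
            pair = (min(i, j), max(i, j))
            if pair not in scores:
                scores[pair] = chi_square(genotypes[pair[0]], genotypes[pair[1]], labels)
            deposits[i] += scores[pair]
            deposits[j] += scores[pair]
        # Evaporate old pheromone, then add this round's deposits rescaled to a fixed total.
        total = sum(deposits) or 1.0
        tau = [(1 - rho) * t + rho * n_snps * d / total for t, d in zip(tau, deposits)]
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def simulate_case_control(n_snps=10, n=800, seed=1):
    """Toy data: disease risk is raised only when SNPs 2 and 5 both carry a minor allele."""
    rng = random.Random(seed)
    genotypes = [[rng.choice((0, 1, 2)) for _ in range(n)] for _ in range(n_snps)]
    labels = []
    for k in range(n):
        risk = 0.9 if genotypes[2][k] > 0 and genotypes[5][k] > 0 else 0.1
        labels.append(1 if rng.random() < risk else 0)
    return genotypes, labels


if __name__ == "__main__":
    genotypes, labels = simulate_case_control()
    for pair, score in ant_colony_search(genotypes, labels)[:3]:
        print(pair, round(score, 1))
```

This chi-square score measures the joint association of the two loci, so SNPs with marginal effects raise it even when there is no interaction; that is the distinction the abstract draws between detecting a joint effect and detecting an interaction effect.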
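In the replication stage, candidate pairs are checked with conventional logistic regression. The abstract does not give the model, so the sketch below assumes one common choice: additive 0/1/2 genotype coding, a product term for the interaction, and a 1-degree-of-freedom likelihood-ratio test comparing the models with and without that term. It uses only the Python standard library (a small Newton-Raphson fit); the helper names and the simulated data are hypothetical.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def log1p_exp(t):
    """Stable log(1 + exp(t))."""
    return t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, n_iter=15):
    """Maximum-likelihood logistic regression by Newton-Raphson; returns the log-likelihood."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # Fisher information X'WX
        for xi, yi in zip(X, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for r in range(p):
                grad[r] += (yi - mu) * xi[r]
                for c in range(p):
                    info[r][c] += w * xi[r] * xi[c]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    loglik = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        loglik += yi * eta - log1p_exp(eta)
    return loglik


def interaction_lrt(snp_a, snp_b, y):
    """1-df likelihood-ratio test of the SNP x SNP product term (additive coding assumed)."""
    full = [[1.0, a, b, a * b] for a, b in zip(snp_a, snp_b)]
    reduced = [row[:3] for row in full]
    stat = max(0.0, 2.0 * (fit_logistic(full, y) - fit_logistic(reduced, y)))
    return stat, math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper-tail p-value


def simulate_pair(n, beta_ab, maf=0.3, seed=0):
    """Toy data from logit P(case) = -1 + beta_ab * a * b (no main effects)."""
    rng = random.Random(seed)
    a = [sum(rng.random() < maf for _ in range(2)) for _ in range(n)]
    b = [sum(rng.random() < maf for _ in range(2)) for _ in range(n)]
    y = [1 if rng.random() < sigmoid(-1.0 + beta_ab * ai * bi) else 0 for ai, bi in zip(a, b)]
    return a, b, y


if __name__ == "__main__":
    stat, p = interaction_lrt(*simulate_pair(3000, beta_ab=1.0, seed=7))
    print(f"LRT = {stat:.1f}, p = {p:.2e}, below 5.00E-04: {p < 5.00e-4}")
```

Under this sketch, a pair would count as replicated when its p-value falls below the replication threshold reported in the abstract (5.00E-04). In practice a statistics package such as statsmodels would normally replace the hand-written fit, and covariates could be added as extra columns of the design matrix.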
Keywords/Search Tags: Genome-wide association study, high-dimensional data, gene-gene interaction, ant colony optimization algorithm, case-control study