[Background] Genome-wide association studies (GWAS) aim to identify genetic loci associated with disease. They help us understand the genetic mechanisms of complex diseases, and they also provide new approaches to preventing and treating those diseases. However, GWAS results have fallen short of expectations. For most complex diseases, the variants identified by GWAS explain only a small proportion of heritability, a gap known as the "missing heritability" problem. Possible explanations include gene-gene joint effects, rare variants, underestimation of the effects of identified alleles, inherited epigenetic factors that increase resemblance between relatives, and overestimation of the heritability of the complex disease or trait of interest. Among these, gene-gene interaction is a plausible explanation for missing heritability and is important for understanding how disease develops. If gene-gene interaction is ignored, the effects of genetic variants cannot be described accurately. In epidemiology, gene-gene interaction can be understood as the joint effect of two independent genes on a disease or phenotype. In statistics, product terms in logistic regression models are commonly used to represent the traditional multiplicative interaction. This model implicitly assumes that gene A and gene B are independent. In fact, most diseases are caused by multiple genes acting together through pathways or networks, in which genes (or SNPs) are often correlated rather than independent. Our group therefore proposed a new concept, gene-gene co-association. It is defined as the joint effect of two genes on disease, that is, the sum of the interaction between two independent genes and the association effect between two correlated genes.
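The traditional multiplicative interaction model mentioned above can be made concrete with a short sketch. It is only an illustration and is not a method from this study. The code fits logit P(D=1) = β0 + β1·A + β2·B + β3·A·B by Newton-Raphson with NumPy, then tests the product term β3 with a 1-degree-of-freedom likelihood ratio test. The following are assumptions made for illustration: the additive 0/1/2 genotype coding, the helper names `fit_logistic` and `interaction_lrt`, and the simulated effect sizes.

```python
import math

import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit logistic regression by Newton-Raphson; return (coefficients, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information X'WX
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1.0 - 1e-12)
    loglik = float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
    return beta, loglik


def interaction_lrt(snp_a, snp_b, y):
    """1-df LRT for the product term b3 in logit P(D=1) = b0 + b1*A + b2*B + b3*A*B."""
    ones = np.ones(len(y))
    X0 = np.column_stack([ones, snp_a, snp_b])                  # main effects only
    X1 = np.column_stack([ones, snp_a, snp_b, snp_a * snp_b])   # plus product term
    _, ll0 = fit_logistic(X0, y)
    beta1, ll1 = fit_logistic(X1, y)
    stat = max(2.0 * (ll1 - ll0), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))                  # chi-square(1) upper tail
    return float(beta1[3]), stat, p_value


# Hypothetical example: two independent SNPs with a true product-term effect of 0.6.
rng = np.random.default_rng(1)
n = 3000
snp_a = rng.binomial(2, 0.3, n).astype(float)
snp_b = rng.binomial(2, 0.4, n).astype(float)
eta = -1.0 + 0.2 * snp_a + 0.2 * snp_b + 0.6 * snp_a * snp_b
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)
beta3, stat, p_value = interaction_lrt(snp_a, snp_b, y)
```

The product term is tested one SNP pair at a time. The model has no parameter for correlation between A and B, which is the independence assumption described above.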
Genes generally work together within specific pathways or networks associated with a given disease, and disease-associated interacting loci are often highly correlated. In this context, gene-gene co-association is better suited than gene-gene interaction to addressing the missing heritability problem. Testing the co-association of two genes can also, to some extent, guide how genetic network structures are learned and constructed. Developing methods to detect gene-gene co-association is therefore essential. To test the co-association of two whole genes, our group has proposed several methods. These include statistics based on the SNP-level Fisher r-to-z transformation, canonical correlations (CCU), kernel canonical correlation analysis (KCCU) and partial least squares path modeling (PLSPM). However, these existing methods cannot effectively test the co-association of two whole genes, and they are limited to some extent in computing speed and power. Powerful and efficient gene-based methods for testing gene-gene co-association are therefore highly desirable.
[Objective] In this study, we aimed to develop a powerful score-based test statistic to identify co-association at the gene or region level.
[Methods] The statistic is designed to detect gene-gene co-association by capturing the effect of the covariance matrix between two genes on disease. Theoretical derivation, statistical simulations and real data analysis were used to assess its stability and effectiveness.
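The SNP-level Fisher r-to-z idea mentioned above can be sketched for a single SNP pair. Co-association shows up as a difference between the SNP A/SNP B correlation in cases and the same correlation in controls. The sketch below uses the standard two-sample Fisher comparison of independent correlations. This is an assumption: the exact statistic used in our group's earlier work may be defined differently. The helper names `pearson_r` and `fisher_z_coassociation` and the toy genotype lists are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_coassociation(a_case, b_case, a_ctrl, b_ctrl):
    """Compare SNP A-SNP B correlation between cases and controls via Fisher's z.

    Returns (z, two-sided p-value). Under H0 (equal correlations), z is
    approximately standard normal. Correlations of exactly +/-1 are not allowed.
    """
    r_case = pearson_r(a_case, b_case)
    r_ctrl = pearson_r(a_ctrl, b_ctrl)
    se = math.sqrt(1.0 / (len(a_case) - 3) + 1.0 / (len(a_ctrl) - 3))
    z = (math.atanh(r_case) - math.atanh(r_ctrl)) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical 0/1/2 genotypes: SNPs A and B correlated in cases, uncorrelated in controls.
snp_a = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
snp_b_case = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
snp_b_ctrl = [2, 0, 1, 0, 2, 1, 1, 2, 0, 1]
z, p_value = fisher_z_coassociation(snp_a, snp_b_case, snp_a, snp_b_ctrl)
```

With real data the same calculation would be repeated for every SNP pair across the two genes. This pairwise approach is what the gene-level methods listed above try to improve on.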
To evaluate the method more comprehensively, several commonly used methods were also applied in the simulations and real data analysis:
- the single-SNP-based logistic regression model (SNP-LRT)
- the principal component analysis (PCA)-based logistic regression model (PCA-LRT)
- the delta-square (δ²) statistic
- the Least Absolute Shrinkage and Selection Operator (LASSO)
- the CCU statistic
- the KCCU statistic
- the PLSPM-based statistic
[Conclusions]
1. We summarized statistical methods for detecting gene-gene co-association and showed why gene-gene co-association should be analyzed at the whole-gene level rather than simply as the co-association of single SNP pairs.
2. Based on the score test, we proposed a powerful score-based statistic (SBS) to identify co-association at the gene or region level.
(1) Compared with several nonparametric methods for detecting gene-gene co-association, the SBS statistic has, in theory, a rigorous asymptotic distribution under the null hypothesis. It therefore avoids bootstrap or permutation procedures during hypothesis testing, which improves computing speed and makes it more suitable for detecting gene-gene co-association.
(2) Compared with SNP-based test methods, the SBS statistic captures the internal linkage disequilibrium (LD) structure within each gene and uses the information shared across multiple SNPs. This is consistent with genetic principles and also helps in understanding gene function. It also avoids the multicollinearity problem among SNPs.
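Conclusion 2 contrasts gene-level testing with SNP-pair testing, and asymptotic null distributions with permutation. The ad hoc gene-level contrast below is not the SBS statistic. It illustrates both points in a rough way. It compares the cross-covariance block Σ_AB (SNPs of gene A by SNPs of gene B) estimated in cases with the one estimated in controls, and summarizes the difference by its squared Frobenius norm. This ad hoc statistic has no derived null distribution, so the sketch falls back on a label-permutation p-value. According to the text, SBS avoids this computational cost. The function names, the Frobenius-norm summary and the simulated data are all hypothetical.

```python
import numpy as np


def cross_cov(GA, GB):
    """Sample cross-covariance block between gene A SNPs (columns of GA) and gene B SNPs."""
    A = GA - GA.mean(axis=0)
    B = GB - GB.mean(axis=0)
    return A.T @ B / (GA.shape[0] - 1)


def cov_contrast(GA, GB, y):
    """Squared Frobenius norm of Sigma_AB(cases) - Sigma_AB(controls)."""
    case = y == 1
    diff = cross_cov(GA[case], GB[case]) - cross_cov(GA[~case], GB[~case])
    return float(np.sum(diff ** 2))


def permutation_test(GA, GB, y, n_perm=499, seed=0):
    """Permutation p-value: shuffle disease labels and recompute the contrast."""
    rng = np.random.default_rng(seed)
    observed = cov_contrast(GA, GB, y)
    exceed = sum(cov_contrast(GA, GB, rng.permutation(y)) >= observed for _ in range(n_perm))
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical data: 3 SNPs per gene, 200 cases and 200 controls.
rng = np.random.default_rng(7)
y = np.array([1] * 200 + [0] * 200)
GA = rng.binomial(2, 0.3, size=(400, 3)).astype(float)
GB = rng.binomial(2, 0.3, size=(400, 3)).astype(float)
GB_co = GB.copy()
GB_co[y == 1, 0] = GA[y == 1, 0]   # dependence between the genes in cases only

obs_null, p_null = permutation_test(GA, GB, y)
obs_co, p_co = permutation_test(GA, GB_co, y)
```

Each permutation p-value requires n_perm recomputations of the statistic. A statistic with a known asymptotic null distribution needs only one.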
(3) Compared with the other methods, the SBS statistic captures both the linear information and the nonlinear structural information within each gene, and it achieves higher power.
3. Statistical simulations and real data analysis indicate the following:
1) The type I error rates of SBS are close to the nominal level of 0.05 across different sample sizes, main-effect pairs and correlation structures, indicating good stability.
2) The power of the SBS statistic increases monotonically with interaction effect size and sample size. Across the different correlation structures of the two genes, SBS shows relatively higher power than the other methods.
3) In real data analyses of rheumatoid arthritis and coronary artery disease, SBS detected gene-gene co-association quickly and accurately, demonstrating its practical utility.
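The type I error and power results summarized above come from Monte Carlo simulation. A minimal sketch of that kind of experiment follows. It does not use SBS. It uses the SNP-pair Fisher r-to-z test as a stand-in, so it does not reproduce the study's results. The simulation design is assumed for illustration. Genotypes are drawn under Hardy-Weinberg equilibrium. Co-association is induced by letting SNP B copy SNP A with probability `effect` in cases only. The function names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_pvalue(a_case, b_case, a_ctrl, b_ctrl):
    """Two-sided p-value for a case-control difference in SNP-SNP correlation."""
    z_case = math.atanh(pearson_r(a_case, b_case))
    z_ctrl = math.atanh(pearson_r(a_ctrl, b_ctrl))
    se = math.sqrt(1.0 / (len(a_case) - 3) + 1.0 / (len(a_ctrl) - 3))
    return math.erfc(abs(z_case - z_ctrl) / se / math.sqrt(2.0))


def genotype(rng, maf):
    """Additive 0/1/2 genotype drawn under Hardy-Weinberg equilibrium."""
    return int(rng.random() < maf) + int(rng.random() < maf)


def rejection_rate(effect=0.0, n_case=200, n_ctrl=200, maf=0.3, reps=1000, alpha=0.05, seed=42):
    """Fraction of simulated data sets with p < alpha.

    effect = 0 simulates the null (no co-association), so the result estimates
    the type I error rate. 0 < effect < 1 makes SNP B copy SNP A with probability
    `effect` in cases only, so the result estimates power.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a_case = [genotype(rng, maf) for _ in range(n_case)]
        b_case = [a if rng.random() < effect else genotype(rng, maf) for a in a_case]
        a_ctrl = [genotype(rng, maf) for _ in range(n_ctrl)]
        b_ctrl = [genotype(rng, maf) for _ in range(n_ctrl)]
        if fisher_z_pvalue(a_case, b_case, a_ctrl, b_ctrl) < alpha:
            rejections += 1
    return rejections / reps
```

For example, `rejection_rate(effect=0.0)` should land near the nominal 0.05, and increasing `effect` or `n_case`/`n_ctrl` should raise the rejection rate. Under this assumed design, that is the qualitative pattern the simulations above evaluate.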