| A genome-wide association study (GWAS) examines common genetic variants across many individuals to determine whether any variant is associated with a trait or disease; such studies typically use statistical methods to test whether individual single nucleotide polymorphisms (SNPs) are strongly associated with the phenotype. Hundreds of GWAS of complex human traits have been completed over the last decade. However, the genetic variants discovered so far account for only a small proportion of the heritability of complex diseases. One possible reason is that most analysis methods test the association between the phenotype and each SNP individually, an approach with limited power to detect multiple variants that each have small effects. Complex diseases arise from disease-risk genes acting together in biological pathways within gene networks, and their complexity can be explained by multiple gene products and the cooperative behavior of specific disease-risk pathways in these networks. Many studies have demonstrated that targeting disease-associated pathways provides additional insight into disease mechanisms. Network inference is challenging because the problem is inherently combinatorial, so appropriate methods are needed to provide an a priori topological structure for constructing disease gene networks. However, when constructing the topology of a genetic network, how to statistically identify the relationship between any two nodes remains unsolved. The essential task is to test the statistical significance of the link between any two nodes that contribute to the disease in the network. Intuitively, this can be done by testing the interaction or co-association between two genes.
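For contrast with the gene-level approach, the single-SNP testing that most GWAS analyses rely on can be sketched as follows. This is an illustration under our own assumptions, not any particular study's pipeline: it uses the allelic 2x2 chi-square test, genotype counts in a hypothetical `(AA, Aa, aa)` layout, and a Bonferroni threshold across SNPs. The function names `allelic_chi2` and `bonferroni_hits` and the SNP ids are hypothetical.

```python
import math


def allelic_chi2(case_counts, control_counts):
    """Allelic (2x2) chi-square test for one biallelic SNP.

    Each argument is a tuple of genotype counts (AA, Aa, aa).
    Returns (chi2, p) for a chi-square distribution with 1 degree of freedom.
    """
    def allele_counts(counts):
        n_AA, n_Aa, n_aa = counts
        return 2 * n_AA + n_Aa, n_Aa + 2 * n_aa  # copies of A, copies of a

    a, b = allele_counts(case_counts)
    c, d = allele_counts(control_counts)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    p = math.erfc(math.sqrt(chi2 / 2))  # P(chi2_1 > x) = P(|Z| > sqrt(x))
    return chi2, p


def bonferroni_hits(snp_counts, alpha=0.05):
    """SNP ids whose single-SNP p-value passes a Bonferroni-corrected threshold."""
    threshold = alpha / len(snp_counts)  # one test per SNP
    return [snp for snp, (cases, controls) in snp_counts.items()
            if allelic_chi2(cases, controls)[1] < threshold]


counts = {
    "snp_1": ((50, 40, 10), (20, 40, 40)),  # hypothetical ids and counts
    "snp_2": ((30, 50, 20), (30, 50, 20)),
}
print(bonferroni_hits(counts))  # ['snp_1']
```

Each SNP is tested in isolation against a threshold that shrinks as the number of SNPs grows, so variants with small marginal effects are easily missed; this is the power limitation described above.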
In epidemiology, gene-gene interaction refers to the extent to which the joint effect of two genes on disease differs from the independent effects of each gene; this definition implicitly assumes that the two genes are independent. However, genes are often correlated with each other because they act within specific pathways or networks to cause disease. Co-association between genes is the joint effect of genes contributing to a disease or trait. Because it is measured through the correlation between genes, co-association is a more appropriate measure of how the joint effect of two genes contributes to a disease. In this paper, we propose an effective statistic that combines case and control data to estimate the co-association between two genes. The statistic is the difference between the path coefficients estimated in cases and in controls under a partial least squares path model (PLSPM). From a simulation study and a real-data analysis, we draw the following conclusions. (1) We have demonstrated the inherent relationship between gene-gene interaction and gene-gene co-association. Gene-gene co-association extends the concepts of gene-gene interaction and gene-gene correlation, and it is more meaningful because it can provide an a priori model for building pathways or networks that link genes to the disease. (2) The PLSPM-based statistic has the following advantages. ① Compared with SNP-based methods, it greatly reduces the number of possible two-locus interactions and may aid in the interpretation of the results.
The PLSPM-based method substantially reduces the dimensionality of the genotype data, which somewhat eases the heavy computational burden and the multiple-testing correction problem. ② Compared with haplotype-based methods, the PLSPM-based statistic reduces the degrees of freedom, avoids the haplotype-inference problem, and improves detection power. ③ Compared with methods based on canonical correlation analysis (CCA), the PLSPM-based statistic was designed to extract information from genetic data: the reflective mode in PLSPM is better suited to handling the high dimensionality of genomic data and the multicollinearity among manifest variables belonging to the same block. (3) The PLSPM-based statistic proved to be a measure of co-association and had much higher power than the logistic regression test at the significance levels, sample sizes, odds ratios, and causal variants examined. In the real data, P-values from the PLSPM-based method were smaller than those from logistic regression analysis. |
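The co-association statistic summarized above, the difference of path coefficients between cases and controls, can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: each gene is a reflective (Mode A) block of SNP genotypes coded 0/1/2; the two latent variables are estimated with an iterative Wold-style PLS path-modeling algorithm using the centroid inner scheme; the path coefficient is taken in absolute value to remove the arbitrary sign of the latent variables; and significance is assessed by permuting case/control labels, which the text above does not specify. The names `path_coefficient` and `co_association` are hypothetical.

```python
import math
import random


def _standardize(values):
    """Center a vector and scale it to unit population variance (zeros if constant)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]


def _block(rows):
    """Standardize each SNP column of a block stored as one row per individual."""
    cols = [_standardize(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def _score(X, w):
    """Latent variable score X w, standardized to unit variance."""
    return _standardize([sum(x * wj for x, wj in zip(row, w)) for row in X])


def _corr(u, v):
    """Correlation of two standardized vectors."""
    return sum(a * b for a, b in zip(u, v)) / len(u)


def path_coefficient(G1, G2, n_iter=100, tol=1e-10):
    """|beta| for LV1 -> LV2 in a two-block PLS path model (reflective Mode A)."""
    X1, X2 = _block(G1), _block(G2)
    n, p1, p2 = len(X1), len(X1[0]), len(X2[0])
    y1, y2 = _score(X1, [1.0] * p1), _score(X2, [1.0] * p2)
    for _ in range(n_iter):
        s = 1.0 if _corr(y1, y2) >= 0 else -1.0  # centroid inner weight
        # Mode A outer step: each weight is the covariance of a SNP with the inner estimate.
        w1 = [s * sum(X1[i][j] * y2[i] for i in range(n)) / n for j in range(p1)]
        new_y1 = _score(X1, w1)
        w2 = [s * sum(X2[i][j] * new_y1[i] for i in range(n)) / n for j in range(p2)]
        new_y2 = _score(X2, w2)
        delta = max(abs(a - b) for a, b in zip(new_y1 + new_y2, y1 + y2))
        y1, y2 = new_y1, new_y2
        if delta < tol:
            break
    # With one exogenous LV, the standardized OLS path coefficient equals corr(LV1, LV2).
    return abs(_corr(y1, y2))


def co_association(G1, G2, is_case, n_perm=199, seed=1):
    """Difference of path coefficients (cases minus controls) and a permutation p-value."""
    def statistic(labels):
        case = [i for i, c in enumerate(labels) if c]
        ctrl = [i for i, c in enumerate(labels) if not c]
        b_case = path_coefficient([G1[i] for i in case], [G2[i] for i in case])
        b_ctrl = path_coefficient([G1[i] for i in ctrl], [G2[i] for i in ctrl])
        return b_case - b_ctrl

    observed = statistic(is_case)
    rng = random.Random(seed)
    labels = list(is_case)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any case/control link under the null
        if abs(statistic(labels)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)
```

Because the outer weights are re-estimated in every permuted sample, the permutation null reflects the optimism of fitting the latent variables to the data, which a closed-form comparison of two correlations would ignore; the price is runtime that grows linearly with `n_perm`.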