During the past decade, findings from genome-wide association studies (GWAS) have improved our knowledge and understanding of disease genetics, and GWAS have played a central role in the discovery of genotype-phenotype associations. Geneticists detect these associations based on DNA polymorphism markers. Single nucleotide polymorphisms (SNPs) are one of the most popular classes of genetic markers and can be used to identify disease genes and potential biological mechanisms. To date, most genetic association studies have used a single-locus analysis strategy, in which each variant is tested individually for association with a specific phenotype. However, this strategy has been less successful for complex traits such as diabetes, hypertension and asthma. This is because single-locus analysis ignores epistasis, in which loci affect the disease only through their interaction with other loci, whereas the main effects of the individual loci may be small or absent. Epistasis is one proposed contributor to the so-called "missing heritability". Mounting evidence suggests that epistasis is a ubiquitous component of the etiology of complex human diseases and plays a crucial role in genetic control. With the appearance of high-throughput sequencing technology, researchers can detect epistasis at a genome-wide scale and reveal the genetic mechanisms of complex diseases. The first challenge that researchers face when detecting epistasis at a genome-wide scale is the computational burden. In this study, we use a prescreening method based on Mixed Random Forest (MRF) to select the best candidate dataset, and then apply multifactor dimensionality reduction (MDR) to detect epistasis in the candidate dataset. We validate our model on the additive model, the multiplicative model, the threshold model and the pure model.
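To make the pure model concrete, the following minimal sketch uses a hypothetical two-locus penetrance table and checks that every single-locus marginal penetrance is identical, which is why a single-locus test would see no main effect. The penetrance values and the allele frequency of 0.5 are illustrative assumptions, not values taken from this study.

```python
# Hypothetical two-locus penetrance table (illustrative values, not from the study).
# Rows: genotype at locus A (AA, Aa, aa); columns: genotype at locus B (BB, Bb, bb).
penetrance = [
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]

# Genotype frequencies under Hardy-Weinberg equilibrium with allele frequency 0.5
# (assumed here for both loci).
freq = [0.25, 0.5, 0.25]

# Marginal penetrance of each genotype at locus A: average over locus B.
marginal_a = [sum(p * f for p, f in zip(row, freq)) for row in penetrance]

# Marginal penetrance of each genotype at locus B: average over locus A.
marginal_b = [
    sum(penetrance[i][j] * freq[i] for i in range(3)) for j in range(3)
]

print("marginal penetrance at locus A:", marginal_a)
print("marginal penetrance at locus B:", marginal_b)
```

Disease risk in this table depends strongly on the genotype combination, yet all marginals coincide, so the signal is visible only when the two loci are analysed jointly.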
The experimental results show that the MRF-based prescreening method has practical significance.
We also aimed to develop a permutation-based methodology that relies on a machine learning method, the gradient boosting machine (GBM), to detect the pure epistasis model. Our approach, called permuted gradient boosting machine (pGBM), identifies the top interacting SNP pairs by estimating how much the power of a gradient boosting machine classification model is influenced by removing pairwise interactions. The mean AUC is used to quantify the strength of an interaction, and the model can be extended to unbalanced datasets. In the experimental verification, when the heritability is greater than 0.01, the detection power of pGBM reaches 100%; when the heritability is less than 0.01, its detection power is still much higher than that of the pRF algorithm. Parallel computing across CPUs further increases the computing speed and shortens the calculation time: with six CPUs, pGBM is 4.78 times faster than pRF. This methodology shows great potential for detecting gene-gene interactions and for studying the underlying genetic architecture in a scale-free way, which could help uncover the mechanisms of complex diseases.
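The permutation idea behind this kind of interaction score can be sketched as follows. This is an illustration under stated assumptions, not the pGBM implementation: a simple genotype-cell frequency table stands in for the gradient boosting machine, the interaction is removed by shuffling one SNP within each class (an assumed scheme), and all function names and simulation settings are hypothetical.

```python
import random
from collections import defaultdict

def auc(scores, labels):
    """Rank-based AUC: chance that a random case outscores a random control."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def simulate(n, rng):
    """Three SNPs coded 0/1/2 (allele frequency 0.5); SNPs 0 and 1 interact, SNP 2 is noise."""
    X, y = [], []
    for _ in range(n):
        g = [rng.choice([0, 1, 1, 2]) for _ in range(3)]
        # Pure epistasis: risk is high only when exactly one locus is heterozygous.
        risk = 0.8 if (g[0] == 1) != (g[1] == 1) else 0.2
        X.append(g)
        y.append(1 if rng.random() < risk else 0)
    return X, y

def heldout_auc(X, y, pair):
    """Stand-in classifier (NOT a GBM): case fraction per two-locus genotype cell,
    fitted on the first half of the data and scored on the second half."""
    half = len(X) // 2
    a, b = pair
    counts = defaultdict(lambda: [0, 0])
    for row, label in zip(X[:half], y[:half]):
        counts[(row[a], row[b])][label] += 1
    scores = []
    for row in X[half:]:
        c = counts.get((row[a], row[b]))
        scores.append(c[1] / (c[0] + c[1]) if c else 0.5)
    return auc(scores, y[half:])

def permute_within_class(X, y, col, rng):
    """Shuffle one SNP separately among cases and among controls. This keeps the
    SNP's own class-wise distribution but breaks its pairing with other SNPs."""
    Xp = [row[:] for row in X]
    for cls in (0, 1):
        idx = [i for i, label in enumerate(y) if label == cls]
        vals = [X[i][col] for i in idx]
        rng.shuffle(vals)
        for i, v in zip(idx, vals):
            Xp[i][col] = v
    return Xp

def interaction_score(X, y, pair, n_perm=5, seed=0):
    """AUC on the original data minus the mean AUC after removing the interaction."""
    rng = random.Random(seed)
    base = heldout_auc(X, y, pair)
    permuted = [
        heldout_auc(permute_within_class(X, y, pair[1], rng), y, pair)
        for _ in range(n_perm)
    ]
    return base - sum(permuted) / n_perm

X, y = simulate(2000, random.Random(1))
score_true = interaction_score(X, y, (0, 1))  # simulated interacting pair
score_null = interaction_score(X, y, (0, 2))  # pair that includes the noise SNP
print("interacting pair:", round(score_true, 3))
print("null pair:", round(score_null, 3))
```

In this toy setting the interacting pair loses most of its predictive power once the pairing is shuffled, whereas the null pair's score stays near zero; ranking all pairs by such a score is the general idea the paragraph above describes.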