| Complex diseases, also known as multifactorial diseases, are controlled by multiple genes and show familial aggregation, but their inheritance does not follow Mendel's laws. Cardiovascular disease, non-insulin-dependent diabetes, essential hypertension, obesity and most malignant diseases are all complex diseases. With the completion of the Human Genome Project, an increasing number of biomedical researchers have shifted their focus from single-gene diseases to complex diseases, which have the greatest impact on human health and social development. The onset of a complex disease results from a complicated network of gene-gene and gene-environment interactions, in which each single gene has only a very small effect. Traditional statistical methods for mapping complex disease genes are mostly based on the level of linkage disequilibrium between tagging SNPs and the disease locus in population-based case-control studies. Another way to study the pathogenesis of complex diseases is based on gene-gene interaction, but these methods either detect SNP-SNP interactions or rely on haplotypes. However, a single SNP represents the information of the complete disease gene poorly (unless the selected tagging SNP is the disease locus itself, which is quite rare), so neither approach is fully convincing, and both also suffer from problems such as multiple-testing correction, uncertainty in haplotype inference and low power. To solve these problems, we developed the definition of the 'whole gene' in this article. The 'whole gene' corresponds to the biological interpretation of a gene as a complete functional unit, rather than a single tagging SNP of the gene. The 'whole gene' is not only an important tool for complex disease gene mapping, but also the basis for constructing genetic interaction networks. In this article, based on the definition of the 'whole gene', we proposed the Principal Component Analysis-Bootstrap Confidence Interval Test (PCA-BCIT) for detecting
association between the 'whole gene' and disease susceptibility, and the Canonical Correlation Analysis-based U Statistic (CCA-U statistic) for detecting interaction between whole genes. The article contains two chapters, and the main results are as follows. Chapter 1, Principal Component Analysis-Bootstrap Confidence Interval Test (PCA-BCIT) for detecting association between the 'whole gene' and disease susceptibility: In this chapter, we developed four strategies for extracting principal components (PCs) in case-control studies: ① extracting PC equations separately from the case group and the control group (SES); ② extracting PC equations from the combined case-control group (CES); ③ extracting PC equations from the case group (CAES) and applying its factor loadings to the control group; ④ extracting PC equations from the control group (COES) and applying its factor loadings to the case group. Combining principal component analysis, the bootstrap and a confidence interval test, we further developed the PCA-BCIT method for detecting association between the 'whole gene' and disease susceptibility. The main results are: (1) Compared with the PCA-LRT method and the Armitage trend test, PCA-BCIT gave more credible results and was more powerful. (2) Among the four PC extraction strategies: ① SES has low power and is unsuitable for real data analysis; ② CES, CAES and COES are more powerful than SES; ③ COES is more reliable than CES and CAES because it agrees better with the principles of case-control studies. (3) PCA-BCIT is more applicable in practice than PCA-LRT and other traditional methods, and deserves wider use in complex disease association studies, especially by researchers who focus on the association between the 'whole gene' and disease susceptibility and on constructing genetic interaction networks. Chapter 2, Canonical Correlation Analysis-based U Statistic (CCA-U statistic) for
detecting interaction between whole genes: In this chapter, we developed the definition of the 'whole gene' and demonstrated the inherent relation between biological interaction and statistical interaction. Based on canonical correlation analysis, we then developed the CCA-U statistic for detecting interaction between whole genes. From simulation studies and real data analysis, we conclude the following: (1) The CCA-U statistic approximately follows a normal distribution and is a powerful statistical method for detecting interaction between whole genes. (2) The advantages of the CCA-U statistic are: ① Compared with traditional methods based on Fisher's definition of interaction, the CCA-U statistic has a closer biological interpretation, because it is based on canonical correlation analysis and detects interaction between two whole genes, rather than simply treating gene-gene interaction as the residual after the main effects of the two genes in an additive model. ② Compared with the Multifactor Dimensionality Reduction (MDR) method, the logistic regression model, linkage disequilibrium-based statistics (LD-based statistics) and entropy-based statistics, the CCA-U statistic detects interaction between whole genes rather than between single SNPs, so it is better suited to constructing genetic interaction networks. ③ Compared with the logistic regression model, LD-based statistics and entropy-based statistics, the CCA-U statistic uses canonical correlation analysis to extract the systematic information of a whole gene, which effectively reduces the degrees of freedom, avoids the multiple-testing problem and improves power. ④ Compared with LD-based statistics, the CCA-U statistic is based on the whole gene as a complete functional unit, so the confounding effect of linkage disequilibrium between SNPs need not be considered. (3) Conditions of application of the CCA-U statistic: ① Generally, when the interaction measure between whole genes (r_D - r_C, the difference between the canonical correlation coefficients of the two whole genes in the case and control groups) is larger than 0.1, the CCA-U statistic gives credible results with a sample size of less than
1000. ② When the canonical correlation coefficients between the whole genes are large in both the case and control groups (min(r_D, r_C) > 0.3), the CCA-U statistic has high power even with a small sample size. ③ The CCA-U statistic is not sensitive to the assumed genetic interaction model and can be applied under different interaction models. (4) Compared with traditional methods, the CCA-U statistic is more powerful. For interaction between two genes: ① when the LD-based statistics or the logistic regression model detected no SNP-SNP interaction, the CCA-U statistic could still detect interaction between the two whole genes; ② when the LD-based statistics or the logistic regression model detected SNP-SNP interaction, the CCA-U statistic also detected interaction between the two whole genes; ③ only in extreme cases, especially with small sample sizes, was the power of the CCA-U statistic very low. |
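The PCA-BCIT procedure of Chapter 1 can be illustrated with a minimal Python sketch of the recommended COES strategy: PCs are extracted from the control group only, the same loadings are applied to the cases, and a bootstrap percentile confidence interval is formed for the case-control difference in mean PC scores. The abstract does not specify the test statistic, so the following are our assumptions rather than the authors' method: additive 0/1/2 genotype coding, the mean-score difference as the statistic, refitting the PCs in every bootstrap replicate with eigenvector signs aligned to the full-sample loadings, and a Bonferroni split of `alpha` across the `k` retained PCs. All function names (`simulate_gene`, `control_pcs`, `mean_score_diff`, `pca_bcit`) are hypothetical.

```python
import numpy as np


def simulate_gene(n, p, m=5, flip=0.1, rng=None):
    """Hypothetical toy data: 0/1/2 genotypes for one 'whole gene' of m SNPs in strong LD.

    Each of the two haplotypes carries a latent block allele with frequency p;
    every SNP copies that allele, except that it is flipped with probability `flip`.
    """
    rng = np.random.default_rng() if rng is None else rng
    geno = np.zeros((n, m))
    for _ in range(2):
        block = rng.random(n) < p
        noise = rng.random((n, m)) < flip
        geno += np.where(noise, ~block[:, None], block[:, None])
    return geno


def control_pcs(control, k):
    """COES step (assumed form): standardise with control moments, return top-k PC loadings."""
    mu = control.mean(axis=0)
    sd = control.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # a SNP can become monomorphic in a bootstrap resample
    z = (control - mu) / sd
    vals, vecs = np.linalg.eigh(np.cov(z, rowvar=False))  # eigenvalues in ascending order
    return mu, sd, vecs[:, np.argsort(vals)[::-1][:k]]


def mean_score_diff(case, control, k, ref=None):
    """Mean PC score of cases minus controls, with loadings fitted on controls only."""
    mu, sd, vecs = control_pcs(control, k)
    if ref is not None:
        # Eigenvector signs are arbitrary; flip each PC to agree with the reference loadings.
        signs = np.sign(np.sum(vecs * ref, axis=0))
        vecs = vecs * np.where(signs == 0, 1.0, signs)
    case_scores = ((case - mu) / sd) @ vecs
    ctrl_scores = ((control - mu) / sd) @ vecs
    return case_scores.mean(axis=0) - ctrl_scores.mean(axis=0), vecs


def pca_bcit(case, control, k=2, n_boot=1000, alpha=0.05, seed=0):
    """Hypothetical PCA-BCIT sketch: bootstrap percentile CI per PC, not the thesis's exact test."""
    rng = np.random.default_rng(seed)
    observed, ref = mean_score_diff(case, control, k)
    boot = np.empty((n_boot, k))
    for b in range(n_boot):
        # Resample cases and controls independently, keeping the group sizes fixed.
        bc = case[rng.integers(0, len(case), len(case))]
        bo = control[rng.integers(0, len(control), len(control))]
        boot[b], _ = mean_score_diff(bc, bo, k, ref)
    a = alpha / k  # Bonferroni split across the k PCs (our assumption)
    lower, upper = np.quantile(boot, [a / 2, 1 - a / 2], axis=0)
    associated = bool(np.any((lower > 0) | (upper < 0)))  # some CI excludes zero
    return {"diff": observed, "lower": lower, "upper": upper, "associated": associated}


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    cases = simulate_gene(300, 0.5, rng=rng)
    controls = simulate_gene(300, 0.3, rng=rng)
    print(pca_bcit(cases, controls, k=2, n_boot=500))
```

Refitting the PCs inside each replicate lets the interval reflect uncertainty in the control-group loadings as well; keeping the full-sample loadings fixed would be faster but would likely give narrower intervals. The sign alignment assumes the leading eigenvalues are well separated, so that PCs do not swap order between replicates.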
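For the CCA-U statistic of Chapter 2, a minimal Python sketch follows. It assumes, as our own reading rather than a formula stated in the abstract, that U contrasts the Fisher z-transforms of the first canonical correlations between the two whole genes in cases (r_D) and controls (r_C), U = (atanh(r_D) - atanh(r_C)) / sqrt(1/(n_D - 3) + 1/(n_C - 3)), and is referred to the standard normal distribution. Borrowing the Pearson-correlation variance 1/(n - 3) for a first canonical correlation is only an approximation. The function names (`first_canonical_corr`, `cca_u`) are hypothetical.

```python
import math

import numpy as np


def _orthonormal_basis(a, tol=1e-10):
    """Orthonormal basis for the column space of the centred matrix a (drops redundant SNPs)."""
    a = a - a.mean(axis=0)
    u, s, _ = np.linalg.svd(a, full_matrices=False)
    return u[:, s > tol * s.max()]


def first_canonical_corr(x, y):
    """Largest canonical correlation between SNP sets x (n x p) and y (n x q) of two genes."""
    qx, qy = _orthonormal_basis(x), _orthonormal_basis(y)
    # Canonical correlations are the singular values of qx.T @ qy (returned in descending order).
    return float(np.linalg.svd(qx.T @ qy, compute_uv=False)[0])


def cca_u(x_case, y_case, x_ctrl, y_ctrl):
    """Hypothetical CCA-U sketch: Fisher-z contrast of r_D (cases) against r_C (controls)."""
    # Clip just below 1 so that atanh stays finite for perfectly correlated genes.
    r_d = min(first_canonical_corr(x_case, y_case), 1 - 1e-12)
    r_c = min(first_canonical_corr(x_ctrl, y_ctrl), 1 - 1e-12)
    n_d, n_c = len(x_case), len(x_ctrl)
    se = math.sqrt(1.0 / (n_d - 3) + 1.0 / (n_c - 3))  # assumed Pearson-style variance
    u = (math.atanh(r_d) - math.atanh(r_c)) / se
    p = math.erfc(abs(u) / math.sqrt(2))  # two-sided p-value under N(0, 1)
    return {"r_D": r_d, "r_C": r_c, "U": u, "p": p}
```

A large positive U indicates that the two whole genes are more strongly correlated in cases than in controls, which corresponds to the interaction measure r_D - r_C used in the abstract.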