Whole-Gene-Statistical-Method Research For Population-Based Case-Control Study

Posted on:2010-05-24

Degree:Master

Type:Thesis

Country:China

Candidate:Q Q Peng

Full Text:PDF

GTID:2144360278972709

Subject:Epidemiology and Health Statistics

Abstract/Summary:

Complex diseases, also known as multifactorial diseases, are controlled by multiple genes; they tend to aggregate in families but do not follow Mendel's laws. Cardiovascular disease, non-insulin-dependent diabetes, essential hypertension, obesity, and most malignant diseases are complex diseases. With the completion of the Human Genome Project, an increasing number of biomedical researchers have shifted their focus from single-gene diseases to complex diseases, which have the greatest impact on human health and social development. The onset of a complex disease results from a complicated network of gene-gene and gene-environment interactions, while each single gene contributes only a very small effect. Traditional statistical methods for gene mapping of complex diseases in population-based case-control studies mostly rely on the level of linkage disequilibrium between a tagging SNP and the disease locus. Another approach to studying the pathogenesis of complex diseases is based on gene-gene interaction, but existing methods either test for SNP-SNP interaction or rely on haplotype analysis. A single SNP, however, represents the information of the complete disease gene poorly (unless the selected tagging SNP is itself the disease locus, which is rare), so neither approach is fully convincing: both are entangled with problems such as multiple-testing correction, uncertainty in haplotype inference, and low power.

To address these problems, we developed the definition of the 'whole gene' in this thesis. The 'whole gene' refers to the biological interpretation of a gene as a complete functional unit, not to a single tagging SNP of the gene. The 'whole gene' is not only an important tool for gene mapping of complex diseases, but also the basis for constructing genetic interaction networks. Based on this definition, we proposed the Principal Component Analysis-Bootstrap Confidence Interval Test (PCA-BCIT) for detecting association between a 'whole gene' and disease susceptibility, and the Canonical Correlation Analysis-based U statistic (CCA-U statistic) for detecting interaction between whole genes. The thesis contains two chapters, and the main research results are as follows.

Chapter 1: Principal Component Analysis-Bootstrap Confidence Interval Test (PCA-BCIT) for detecting association between a 'whole gene' and disease susceptibility

We developed four different principal component (PC) extracting strategies for case-control studies:
① Separately extracting PC equations from the case group and the control group (SES).
② Extracting PC equations from the combined case-control group (CES).
③ Extracting PC equations from the case group (CAES) and applying the factor loadings to the control group.
④ Extracting PC equations from the control group (COES) and applying the factor loadings to the case group.

Based on principal component analysis, the bootstrap technique, and the confidence interval test, we further developed the PCA-BCIT method for detecting association between a 'whole gene' and disease susceptibility. The main results are:
(1) Compared with the PCA-LRT method and the Armitage trend test, PCA-BCIT gave much more credible results and higher power.
(2) For the four PC extracting strategies: ① SES has poor power and is not applicable to real data analysis. ② CES, CAES, and COES are more powerful than SES. ③ COES is preferable to CES and CAES because it is more consistent with the principles of a case-control study.
(3) PCA-BCIT is more applicable in practice than PCA-LRT and other traditional methods, and deserves further application in association studies of complex diseases. It is especially suitable for researchers who focus on analyzing association between a 'whole gene' and disease susceptibility and on constructing genetic interaction networks.

Chapter 2: Canonical Correlation Analysis-based U statistic (CCA-U statistic) for detecting interaction between whole genes

We developed the definition of the 'whole gene' and demonstrated the inherent relation between biological interaction and statistical interaction. Based on canonical correlation analysis, we then developed the CCA-U statistic for detecting interaction between whole genes. Through a simulation study and real data analysis, we reached the following conclusions:
(1) The CCA-U statistic is approximately normally distributed and is a powerful statistical method for detecting interaction between whole genes.
(2) The advantages of the CCA-U statistic are: ① Compared with traditional Fisher's methods, the CCA-U statistic has a closer biological interpretation, because it is based on canonical correlation analysis and focuses on detecting interaction between two whole genes, rather than simply treating gene-gene interaction as the residual from the main effects of the two genes in an additive model. ② Compared with the Multifactor Dimensionality Reduction (MDR) method, the logistic regression model, Linkage Disequilibrium-based statistics (LD-based statistics), and entropy-based statistics, the CCA-U statistic detects interaction between whole genes rather than single SNP-SNP interaction, so it is better suited to constructing genetic interaction networks. ③ Compared with the logistic regression model, LD-based statistics, and entropy-based statistics, the CCA-U statistic uses canonical correlation analysis to extract the systematic information of the whole gene, which effectively reduces the degrees of freedom, avoids the multiple-testing correction problem, and improves detection power. ④ Compared with LD-based statistics, the CCA-U statistic is based on the whole gene, which is a complete functional unit, so there is no need to consider the confounding effect of linkage disequilibrium between SNPs.
(3) Conditions for applying the CCA-U statistic: ① Generally, when the interaction measure between whole genes (r_D - r_C) is larger than 0.1, the CCA-U statistic gives credible results with sample sizes below 1000. ② When the canonical correlation coefficients between whole genes are large in both the case group and the control group (min(r_D, r_C) > 0.3), the power of the CCA-U statistic is high even when the sample size is small. ③ The CCA-U statistic is not sensitive to the assumed genetic interaction model and can be applied under different models.
(4) Compared with traditional methods, the CCA-U statistic is more powerful. For interaction between two genes: ① Even when LD-based statistics or the logistic regression model detect no SNP-SNP interaction, the CCA-U statistic can still detect interaction between the two whole genes. ② When LD-based statistics or the logistic regression model do detect SNP-SNP interaction, the CCA-U statistic also detects interaction between the two whole genes. ③ Only in extreme cases, especially when the sample size is small, can the power of the CCA-U statistic be very low.
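The abstract does not spell out the PCA-BCIT procedure, so the following is only a minimal sketch of one way the COES strategy could be combined with a bootstrap confidence interval test. It assumes genotypes are coded 0/1/2 in a samples-by-SNPs matrix, that only the first principal component is used, that percentile bootstrap intervals are taken for the mean PC score in each group, and that association is declared when the two intervals do not overlap. The function names (`pca_loadings`, `bootstrap_ci`, `pca_bcit_coes`) and all parameters are hypothetical and are not taken from the thesis.

```python
import numpy as np


def pca_loadings(X, n_components=1):
    """Return the column means and top principal-component loadings of X."""
    mean = X.mean(axis=0)
    # SVD of the centred matrix; the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T


def bootstrap_ci(scores, n_boot=1000, alpha=0.05, rng=None):
    """Percentile bootstrap confidence interval for the mean of `scores`."""
    rng = np.random.default_rng(rng)
    n = len(scores)
    idx = rng.integers(0, n, size=(n_boot, n))  # resample with replacement
    means = scores[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


def pca_bcit_coes(cases, controls, n_boot=1000, alpha=0.05, seed=0):
    """Sketch of a COES-style test (assumed form, not the thesis' exact method).

    PC loadings are extracted from the control group only and then applied
    to the case group; the groups are compared through bootstrap intervals
    of their mean first-PC scores.
    """
    mean, W = pca_loadings(controls, n_components=1)
    case_scores = ((cases - mean) @ W)[:, 0]
    ctrl_scores = ((controls - mean) @ W)[:, 0]
    rng = np.random.default_rng(seed)
    ci_case = bootstrap_ci(case_scores, n_boot, alpha, rng)
    ci_ctrl = bootstrap_ci(ctrl_scores, n_boot, alpha, rng)
    # Assumed decision rule: non-overlapping intervals indicate association.
    associated = bool(ci_case[0] > ci_ctrl[1] or ci_case[1] < ci_ctrl[0])
    return ci_case, ci_ctrl, associated
```

Calling `pca_bcit_coes(cases, controls)` on two genotype matrices with the same SNP columns returns the two intervals and a boolean decision. Swapping which group supplies the loadings would give a CAES-style variant, and fitting on the stacked matrix a CES-style variant, under the same assumptions.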
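The abstract names the interaction measure r_D - r_C and states that the CCA-U statistic is approximately normal, but it does not give the formula. The sketch below therefore assumes one plausible form: the first canonical correlation between the two genes is computed separately in cases (r_D) and controls (r_C), each is Fisher z-transformed, the variance of each transformed value is estimated by bootstrap, and U is the standardized difference. The function names, the z-transform, and the bootstrap variance estimate are all assumptions made for illustration, not the thesis' definition.

```python
import numpy as np
from math import erfc, sqrt


def _basis(X):
    """Orthonormal basis for the centred column space of X (rank-aware)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    if s.size == 0 or s.max() == 0:
        return U[:, :0]  # no polymorphic SNP in this sample
    return U[:, s > 1e-10 * s.max()]


def first_canonical_corr(X, Y):
    """Largest canonical correlation between the SNP sets X and Y."""
    Qx, Qy = _basis(X), _basis(Y)
    if Qx.shape[1] == 0 or Qy.shape[1] == 0:
        return 0.0
    # Singular values of Qx^T Qy are the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(np.clip(s[0], 0.0, 1.0 - 1e-12))


def _boot_var_z(X, Y, n_boot, rng):
    """Bootstrap variance of the Fisher-transformed canonical correlation."""
    n = X.shape[0]
    z = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample individuals
        z[b] = np.arctanh(first_canonical_corr(X[idx], Y[idx]))
    return z.var(ddof=1)


def cca_u(case_a, case_b, ctrl_a, ctrl_b, n_boot=200, seed=0):
    """Assumed U statistic comparing gene-gene correlation in cases vs controls."""
    rng = np.random.default_rng(seed)
    r_d = first_canonical_corr(case_a, case_b)
    r_c = first_canonical_corr(ctrl_a, ctrl_b)
    var_d = _boot_var_z(case_a, case_b, n_boot, rng)
    var_c = _boot_var_z(ctrl_a, ctrl_b, n_boot, rng)
    u = (np.arctanh(r_d) - np.arctanh(r_c)) / sqrt(var_d + var_c)
    p = erfc(abs(u) / sqrt(2))  # two-sided p-value under a standard normal
    return r_d, r_c, float(u), p
```

Here `case_a` and `case_b` are the genotype matrices of gene A and gene B for the same case individuals, and likewise for controls. Because each gene enters through a single canonical correlation, only one test is performed per gene pair regardless of how many SNPs each gene carries, which is the property the abstract highlights.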

Keywords/Search Tags:

complex disease, association study, case-control, whole gene, interaction

Related items

1	Haplotype-based statistical inference for case-control genetic association studies with complex sampling
2	TGFBR2 Association Study Polymorphism And Congenital Heart Disease And Rheumatic Heart Disease
3	Association Study Of SNPs With Schizophrenia In Combined Family And Case-control Samples
4	Gene-gene interaction and gene-pathway analysis of genome-wide association for schizophrenia
5	A Case-control Study Of The Association Between MTHFR Gene Mutation And Neural Tube Defects
6	A Modified Ant Colony Optimization Algorithm For Identifying Gene-gene Interactions
7	Association Of Three-gene Interaction Among MTHFR, ALOX5AP And NOTCH3 With Thrombotic Stroke: A Multicenter Case-Control Study
8	The Study Of Genetic And Environmental Risk Factors And Gene-environment Interaction For SLE
9	Case-control Study Of The Risk Factors And Interaction To Pneumoconiosis Patients Complicated With COPD
10	An Epidemiological Study On The Interaction Between Genetic And Traditional Factors In Tuberculosis