A Study On The Application Of High-dimensional Statistical Inference Model To The Detection Of Linear Effects On Gene Sets

Posted on: 2024-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
GTID: 2530307157955299
Subject: Epidemiology and Health Statistics
Abstract/Summary:

Objective: Genome-wide association studies (GWAS) have identified a large number of genetic variants associated with complex diseases, but these variants explain only a small fraction of heritability. Gene-set-based association analysis has become a powerful tool for studying the etiology of complex diseases in GWAS. However, existing gene-set-based methods do not adapt to the two competing hypotheses about genetic effects (the cumulative theory of many minor-effect loci, and major-locus determinism), so test power drops substantially when the assumed effect pattern is wrong. Existing models also usually ignore the high-dimensional nature of SNP data, which further reduces power or inflates the false-positive rate. We therefore develop a gene-set-level framework for high-dimensional statistical inference that markedly improves the detection of genetic effects and provides a new statistical approach for studying the genetic mechanisms of complex diseases.

Methods:
Stage 1: Variables are screened with the high-dimensional ordinary least squares projection (HOLP) method, reducing the GWAS data from ultra-high-dimensional to high-dimensional.
Stage 2: A high-dimensional statistical inference technique based on Ridge projection is applied to obtain a P value for each SNP retained after Stage 1.
Stage 3: SNPs are mapped to genes, and a P value is computed for each gene in two ways: the minimum P value method (Min P), which targets major-locus determinism, and the truncated product method (TPM), which targets the cumulative effect of many minor-effect loci.
Stage 4: The Min P and TPM P values are integrated with an omnibus test so that the procedure adapts to both genetic effect hypotheses.

Results:
1. Applied to Alzheimer's Disease Neuroimaging Initiative (ADNI) data, the screening step reduced 300,000 SNPs to 30,000, achieving rapid dimensionality reduction.
2. Ridge projection was used to obtain P values and confidence intervals for the 30,000 SNPs.
3. The Min P method identified the significant gene CHCHD6.
4. The truncated product method identified significant genes including QSOX1 and FAM219A.
5. An omnibus test built on the Cauchy combination idea, which adapts to both genetic hypotheses, was developed to integrate results 3 and 4. It yielded nine significant genes. Two of them, GUCY1A1 and GRIN3A, have been reported in Nature as associated with the development of Alzheimer's disease.

Conclusions: The adaptive omnibus test developed in this study identified genes associated with Alzheimer's disease in the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, which further supports the higher power of the method.
Keywords/Search Tags: Complex diseases, Association analysis, High-dimensional statistical inference, Omnibus tests, P-value integration
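
The gene-level combination described in the abstract (Min P, TPM, then an omnibus step based on the Cauchy idea) can be illustrated with a minimal sketch. It is not the thesis's implementation. The function names `min_p`, `tpm` and `cauchy_combine`, the truncation threshold `tau = 0.05`, the equal weights, and the example P values are all assumptions made for illustration. Both gene-level tests below also assume independent SNP P values, which real SNPs in linkage disequilibrium do not satisfy; a permutation-based version may be what the thesis actually uses.

```python
import math


def min_p(pvals):
    """Gene-level P value from the smallest SNP P value.

    Uses the Sidak correction, which assumes independent SNPs
    (an assumption of this sketch, not stated in the abstract).
    """
    m = len(pvals)
    return 1.0 - (1.0 - min(pvals)) ** m


def tpm(pvals, tau=0.05):
    """Truncated product method for independent P values.

    The statistic W is the product of all P values <= tau; its null
    distribution has a closed form (Zaykin et al., 2002).
    P values must be strictly positive.
    """
    n = len(pvals)
    below = [p for p in pvals if p <= tau]
    if not below:
        return 1.0  # empty product: W = 1, so the P value is 1
    log_w = sum(math.log(p) for p in below)
    w = math.exp(log_w)
    log_tau = math.log(tau)
    total = 0.0
    for k in range(1, n + 1):
        # Probability that exactly k of the n P values fall below tau,
        # with the tau**k factor carried inside `inner`.
        coeff = math.comb(n, k) * (1.0 - tau) ** (n - k)
        if log_w <= k * log_tau:
            a = k * log_tau - log_w
            inner = w * sum(a ** s / math.factorial(s) for s in range(k))
        else:
            inner = tau ** k
        total += coeff * inner
    return min(total, 1.0)


def cauchy_combine(pvals, weights=None):
    """Cauchy combination of P values (equal weights by default)."""
    n = len(pvals)
    weights = weights or [1.0 / n] * n
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvals))
    return 0.5 - math.atan(t / sum(weights)) / math.pi


# Hypothetical SNP-level P values for one gene.
gene_snp_pvals = [0.001, 0.2, 0.04, 0.6]
p_min = min_p(gene_snp_pvals)
p_tpm = tpm(gene_snp_pvals)
p_omnibus = cauchy_combine([p_min, p_tpm])
print(p_min, p_tpm, p_omnibus)
```

One design point worth noting: a P value exactly equal to 1 maps to an extremely large negative Cauchy term and dominates the combined statistic, so inputs such as a TPM result of 1.0 may need to be capped slightly below 1 before combining.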