Construction Strategy And Statistical Methods For Network Regression Model With Continuous Outcome In Transcriptome-wide Association Studies

Posted on: 2024-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Jin
Full Text: PDF
GTID: 2530306923472254
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Background: Transcriptome-wide association study (TWAS) is a multi-omics analysis method that explores associations between gene expression and complex diseases by integrating genome-wide association studies (GWAS) with eQTL mapping studies. It shows great promise for investigating how the loci identified by GWAS contribute to disease. However, almost all current TWAS methods consider only one gene at a time, and the only two published multi-gene TWAS methods fail to consider the complex network relationships among genes. In the framework of network medicine, the genes affecting a complex disease do not act individually; they interact to form specific biological pathways or networks that control the occurrence, development, and outcome of the disease. TWAS analysis should therefore shift its focus from "exploring single genes related to complex diseases" to "exploring gene networks related to complex diseases". Each node and edge in a network carries biological meaning, and the network effects include not only the node effects produced by each gene but also the edge effects produced by pairs of genes in the network. Ignoring edge effects inevitably loses information, which may in turn reduce the power of TWAS analysis. However, the correlation between genes within a biological network is far from a simple linear correlation. The key problem is to find a reasonable statistical metric for the edge effects of a network and then to develop a TWAS network regression method that integrates node effects and edge effects simultaneously. This study focused on continuous outcomes and developed a pointwise mutual information-based network regression model in the two-stage TWAS framework (NeRiT) to explore the association between a specific biological network and traits of interest.

Methods: In this study, the model-building strategy relied on the traditional
two-stage TWAS framework. In the first stage, for each gene in a specific biological network, the genotype and gene expression data from the eQTL study were used to fit a non-parametric Bayesian Dirichlet process regression (DPR) and obtain the effect sizes of SNPs on gene expression. In the second stage, the genotype data in the GWAS and the effect size estimates from the first stage were used to calculate the predicted gene expression in the GWAS, which gave the node effects in the given biological network. Meanwhile, pointwise mutual information (PMI) was introduced to characterize the correlation between the predicted expression of each pair of genes, which gave the edge effects in the biological network. To avoid estimation bias due to uncertainty in the prior distribution of the effect sizes, this study used two-dimensional Gaussian kernel density estimation to improve the accuracy of the PMI estimates between network nodes. Finally, the node effects and edge effects in the network were included in the model simultaneously to construct the NeRiT model, which explores the association between the given biological network and traits. Given the influence of gene expression prediction accuracy on TWAS analysis, we also examined the performance of NeRiT when the Bayesian sparse linear mixed model (BSLMM) was used as the gene expression prediction model. The data used in simulations and real data applications came from publicly available databases (eQTL data from the GEUVADIS study, GWAS data from UK Biobank), and the biological network structures came from the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database. Because no other statistical method for TWAS network regression was available for comparison, this study also used the product moment, within the NeRiT framework, to characterize the correlation between predicted gene expression and constructed a Product Moment-based Network regression model in TWAS (PMNT), so as to evaluate the advantages of PMI in capturing different types of correlation among network nodes
and the overall performance of the NeRiT model.
1. In simulations, a small network (the renin secretion network) containing 13 nodes and 8 edges and a large network (the lipid and atherosclerosis network) containing 82 nodes and 87 edges were obtained from the KEGG pathway database. According to which nodes and/or edges in the network carried effects, four simulation scenarios were designed: (1) only nodes in the network had effects on traits; (2) only edges in the network had effects on traits; (3) both nodes and edges in the network had effects on traits, and the effecting nodes were on the effecting edges; (4) both nodes and edges in the network had effects on traits, but the effecting nodes were not on the effecting edges. Four node correlation patterns were considered in each scenario: linear correlation and three nonlinear correlations (a quadratic relationship, a sinusoidal relationship, and a combination of the two). The effecting nodes and/or effecting edges were selected by fixed and by random methods, to explore the type I error and statistical power of the NeRiT model under different simulation settings.
2. In real data applications, systolic and diastolic blood pressure were obtained from UK Biobank, and the key networks regulating blood pressure (the renin secretion network and the aldosterone-regulated sodium reabsorption network) were taken from the KEGG pathway database, to explore the advantages of the NeRiT model in identifying the effecting nodes and/or effecting edges in a network.

Results:
1. Simulation results showed that the type I error of the NeRiT model was well controlled under all simulation conditions. In detecting effecting nodes, the NeRiT and PMNT models had comparable power. When the correlation pattern between nodes was linear, a setting in which the product moment is the gold standard for capturing correlation between variables, the power
of the PMNT model was slightly higher than that of the NeRiT model. However, when the correlation pattern between nodes was nonlinear, the power of the NeRiT model was much higher than that of the PMNT model in most simulation scenarios.
2. Real data applications showed that the NeRiT and PMNT models identified the same trait-related effecting gene nodes in the two biological networks, but the NeRiT model identified more effecting edges than the PMNT model.

Conclusions:
1. The NeRiT model can effectively extract complex nonlinear correlation information between biological network nodes. Compared with the PMNT model, it showed higher power in testing the effecting edges of a network. This advantage is robust to different network structures, network sizes, selection methods for effecting nodes and edges, and sample sizes.
2. The analysis of real TWAS data showed that the NeRiT model is practical for exploring the "network markers" of complex diseases and is suitable for mining and analyzing biomedical data. The results can provide statistical support for finding potential pathogenic targets and for subsequent experimental verification.
In conclusion, NeRiT is an efficient TWAS statistical analysis method with good performance.
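The edge metric described in the Methods can be illustrated with a minimal sketch. It assumes that the edge covariate for a gene pair is the per-individual PMI, log p(x, y) / (p(x) p(y)), with the joint density taken from a two-dimensional Gaussian product-kernel estimate and the marginals from the matching one-dimensional estimates. The function names, the Scott's-rule bandwidth, and the simulated expression values are illustrative assumptions, not the thesis's implementation.

```python
import math
import random
import statistics


def gaussian_kernel(u, h):
    """1-D Gaussian kernel with bandwidth h, evaluated at offset u."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def pointwise_mutual_information(x, y):
    """Per-sample PMI between two predicted expression vectors (hypothetical helper).

    The joint density uses a 2-D Gaussian product kernel; the marginals use
    the same bandwidths, so they are the exact marginals of the joint estimate.
    """
    n = len(x)
    # Assumed bandwidth choice: Scott's rule for d = 2, h = sd * n^(-1/6).
    hx = statistics.stdev(x) * n ** (-1.0 / 6.0)
    hy = statistics.stdev(y) * n ** (-1.0 / 6.0)
    pmi = []
    for xi, yi in zip(x, y):
        kx = [gaussian_kernel(xi - xj, hx) for xj in x]
        ky = [gaussian_kernel(yi - yj, hy) for yj in y]
        p_xy = sum(a * b for a, b in zip(kx, ky)) / n  # joint density at (xi, yi)
        p_x = sum(kx) / n                              # marginal density at xi
        p_y = sum(ky) / n                              # marginal density at yi
        pmi.append(math.log(p_xy / (p_x * p_y)))
    return pmi


# Simulated (not real) predicted expression for two genes joined by an edge.
rng = random.Random(0)
n = 300
gene_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
gene_b_quadratic = [a * a + rng.gauss(0.0, 0.3) for a in gene_a]  # nonlinear link
gene_b_independent = [rng.gauss(0.0, 1.0) for _ in range(n)]      # no link

edge_quadratic = pointwise_mutual_information(gene_a, gene_b_quadratic)
edge_independent = pointwise_mutual_information(gene_a, gene_b_independent)

print("mean PMI, quadratic pair:  ", statistics.mean(edge_quadratic))
print("mean PMI, independent pair:", statistics.mean(edge_independent))
```

Averaging the per-individual values gives a rough plug-in estimate of mutual information, so the quadratic pair is expected to score higher than the independent pair even though its product-moment correlation can be close to zero. In a network regression of the kind described, such a per-individual vector would presumably enter the model as the edge covariate alongside the two node covariates.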
Keywords/Search Tags: TWAS, Multi-omics integration, Pointwise mutual information, Model building