
Software Implementation Of Meta-regression For Genome-wide Association Study

Posted on: 2021-10-27
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Zheng
GTID: 2480306050973079
Subject: Master of Engineering
Abstract/Summary:
Genome-wide association study (GWAS) is an important method in genetic research that aims to identify disease-related variants within the human genome. In recent years it has driven extensive progress in the study of complex diseases and has become one of the main strategies in that field. Meta-analysis is one of the most important analytical tools in GWAS. By collecting and synthesizing the GWAS results of multiple studies, the data can be integrated in a secondary analysis to achieve a much larger effective sample size and increase the probability of discovering new associations, which addresses the problem that the sample size of a single study is usually too small.

Meta-regression (MR) is a meta-analysis approach for testing gene-environment interactions, and it is the first application of the meta-regression technique to gene-environment interaction analysis in GWAS. The method usually consists of two steps. Step 1: divide the subjects in each study into groups according to the distribution of the environmental variable, and estimate the main effect of each SNP on the complex disease or trait, together with its associated variance, within the individual studies. Step 2: perform a meta-regression analysis of these results to calculate the regression coefficients and covariance matrix of the SNP and SNP-environment interaction effects, and perform a statistical test of the SNP-environment interaction. This method has been shown to have higher statistical power in the presence of interaction than meta-analysis of SNP main effects only, to be comparable to the joint meta-analysis (JMA) approach when the interaction is linear, and to be more robust in the presence of confounding factors.

At present, software based on the JMA algorithm has been developed and applied in GWAS of SNP-environment interaction. However, the MR method has not been implemented as software, which hinders its application in genetic research. This thesis completes a software implementation of the MR method in C++ under the Linux operating system. The basic functions of the software are as follows: 1. reading the analysis result file and the SNP quantity index file of each study according to the user's needs; 2. screening the SNPs in each result file according to quality control indices such as missing rate, Hardy-Weinberg equilibrium, minor allele frequency, and minor allele count; 3. carrying out MR analyses of SNPs, including tests of the interaction effect, the SNP main effect, and the joint effect of the SNP and the interaction. The final result file contains basic SNP information, analysis results, sample size information, etc. The implementation is also highly efficient in terms of computation and memory usage.

Finally, this thesis carries out extensive functional and performance tests on the developed MR software. The functional tests mainly use valid test data and erroneous data with different options and parameters, check the intermediate and final results of the basic modules, and verify the reliability, robustness, and scalability of the software. The performance tests use the results of 12 groups from three GWAS, with about 30 million SNPs, to test all functions of the software. Furthermore, the results of the analysis are compared with those from the general statistical computing software SAS. The tests show that the implementation is high-performance software with high efficiency and precision.
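
To make step 2 of the MR method more concrete, the following is a minimal C++ sketch for a single SNP. It is not the thesis's code. It assumes a fixed-effect, inverse-variance weighted least-squares regression of the group-level SNP effects on the group mean of the environmental variable, which the abstract does not specify. The names `GroupResult`, `MetaRegressionFit`, and `fit_meta_regression` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <vector>

// Step-1 summary for one SNP in one group (hypothetical layout).
struct GroupResult {
    double beta;      // estimated SNP main effect in this group
    double variance;  // variance of that estimate
    double env_mean;  // group mean of the environmental variable (assumed covariate)
};

// Step-2 output for one SNP.
struct MetaRegressionFit {
    double b_snp;       // intercept: SNP effect at env_mean == 0
    double b_inter;     // slope: SNP-environment interaction effect
    double var_snp;     // variance of b_snp
    double var_inter;   // variance of b_inter
    double cov;         // covariance of b_snp and b_inter
    double chi2_inter;  // 1-df Wald statistic for the interaction
    double chi2_joint;  // 2-df Wald statistic for SNP and interaction jointly
};

// Weighted least squares for beta_i = b_snp + b_inter * env_mean_i + e_i,
// with weights w_i = 1 / variance_i (fixed-effect assumption).
MetaRegressionFit fit_meta_regression(const std::vector<GroupResult>& groups) {
    if (groups.size() < 2) {
        throw std::invalid_argument("need at least two groups");
    }
    // Accumulate the entries of X'WX and X'Wy.
    double s_w = 0.0, s_we = 0.0, s_wee = 0.0, s_wy = 0.0, s_wey = 0.0;
    for (const GroupResult& g : groups) {
        if (!(g.variance > 0.0)) {
            throw std::invalid_argument("variance must be positive");
        }
        const double w = 1.0 / g.variance;
        s_w += w;
        s_we += w * g.env_mean;
        s_wee += w * g.env_mean * g.env_mean;
        s_wy += w * g.beta;
        s_wey += w * g.env_mean * g.beta;
    }
    // The determinant of X'WX vanishes when all groups share one env_mean.
    const double det = s_w * s_wee - s_we * s_we;
    if (!(det > 1e-12 * s_w * s_wee)) {
        throw std::invalid_argument("environmental means do not vary across groups");
    }
    MetaRegressionFit fit{};
    // The covariance matrix of the coefficients is the inverse of X'WX.
    fit.var_snp = s_wee / det;
    fit.var_inter = s_w / det;
    fit.cov = -s_we / det;
    assert(fit.var_inter > 0.0);
    // Coefficients: (X'WX)^{-1} X'Wy.
    fit.b_snp = (s_wee * s_wy - s_we * s_wey) / det;
    fit.b_inter = (s_w * s_wey - s_we * s_wy) / det;
    // Wald statistics; the joint statistic is b' (X'WX) b.
    fit.chi2_inter = fit.b_inter * fit.b_inter / fit.var_inter;
    fit.chi2_joint = s_w * fit.b_snp * fit.b_snp
                   + 2.0 * s_we * fit.b_snp * fit.b_inter
                   + s_wee * fit.b_inter * fit.b_inter;
    return fit;
}
```

Because the design matrix has only two columns, the sketch inverts the 2x2 normal-equation matrix in closed form and keeps five running sums per SNP, so it needs no linear-algebra library. Converting the chi-square statistics to p-values is left out, since the C++ standard library does not provide a chi-square distribution function.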
Keywords/Search Tags:GWAS, gene-environment interaction, meta-regression method, SNP, C++, software design and implementation