Application Research Of Genetic Interaction Network Method Based On Path Analysis

Posted on: 2020-07-30
Degree: Master
Type: Thesis
Country: China
Candidate: L C Pi
GTID: 2370330590997674
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Background: In the post-genome-wide association study (GWAS) era, revealing the effects of rare and low-frequency variants and of widespread nonlinear gene-gene interactions is an important route to resolving the “missing heritability” problem. However, mining genetic interactions from high-dimensional, low-frequency variant data remains a major challenge in terms of statistical power, the curse of dimensionality, and biological interpretation. Selecting variables on the basis of pathway analysis is an important way to reduce dimensionality, improve efficiency, and obtain better biological interpretation. Among statistical methods, the random forest is commonly used to screen associated loci and can detect nonlinear interactions between loci to some extent. The BGTA (Backward Genotype-Trait Association) algorithm has also been proposed for screening loci in GWAS; it uses the GTD (genotype-trait distortion) score to detect interactions between loci with no or weak main effects. Following a pathway-analysis strategy, this study explores genetic interactions in genome-wide association studies at the SNP (single nucleotide polymorphism) level and at the gene level, using a two-stage BGTA algorithm to construct a visualized genetic interaction network, and compares the results with those of a random forest approach. The aim is to provide an effective statistical analysis strategy for identifying interactions without main effects and to offer important clues for further exploring the biological mechanisms of disease-related pathways.

Methods: The data were the exome sequencing data of the unrelated individuals in Genetic Analysis Workshop 19 (GAW19) together with the real hypertension phenotype data, with presence or absence of hypertension as the outcome variable. The KEGG database was searched for genes on the renin-angiotensin-aldosterone system (RASS)-related pathways, and the initial genetic data set was obtained by matching these genes against the exome sequencing data in the GAW19 database. The candidate variant data set was then obtained by applying the inclusion criteria: minor allele frequency (MAF) greater than 0.01, linkage disequilibrium (LD) r2 < 0.8, and Hardy-Weinberg equilibrium (HWE) test P ≥ 0.05. For ease of explanation and presentation, all loci were numbered consecutively 1, 2, ..., N.

SNP-level interactions were analyzed with a two-stage BGTA algorithm. In the first stage, BGTA was run on random subsets of size k=10, and the top 100 returned subsets ranked by GTD score were retained. In the second stage, the screened loci were analyzed with BGTA at k=2, with significance assessed by a permutation test followed by FDR (false discovery rate) correction. The statistically significant SNP pairs were used to construct an SNP-level genetic interaction network and a corresponding network mapped to genes. Finally, logistic regression was used to verify the main effects, multiplicative interactions, and additive interactions of the loci. For comparison, the initial data set was screened by random forest importance scores and the out-of-bag error estimate; interactions among the selected loci were then analyzed with a decision tree, followed by logistic regression, and the results were compared with those of the two-stage BGTA algorithm.

At the gene level, integrated GTG scores were constructed from the GTD scores of the first BGTA stage to obtain the maximum mean marginal effect (M value) and the gene interaction scores (R value and Q value). The null distribution of the gene-pair interaction scores was built from permuted data, and statistical testing by the curve method and the rank method determined which gene interactions were finally included. Finally, interaction networks at the SNP level and the gene level were constructed.

In the case study, genetic variation data on the LncRNA HOTAIR regulatory pathway were analyzed in relation to abdominal obesity among primary and secondary school students in Guangzhou. Abdominal obesity was defined as a waist-to-height ratio (WHtR) > 0.5, and a total of 4007 samples were obtained. Gene-gene and gene-environment interactions were analyzed using the BGTA algorithm and the interaction scores, respectively, and verified by logistic regression.

Results: The KEGG database was used to retrieve the genes of the three RASS-related pathways. After matching with the GAW19 data and applying the inclusion criteria, 248 loci in 53 genes were obtained, of which 110 were low-frequency variants. The first BGTA stage screened out 76 loci possibly associated with the hypertension phenotype, including 61 low-frequency variants. The second BGTA stage included 1102 SNP pairs, of which 82 pairs had FDR controlled within 10% (P<0.007). These 82 pairs involved 56 loci, of which 44 were low-frequency variants. Loci 49 (PIK3R3), 26 (ATP1A4), 52 (REN), 247 (THOP1), 184 (ANPEP), and several others had a large number of edges and can be regarded as key pivot loci. Logistic regression showed 16 pairs of SNP interactions without main effects, 12 pairs of multiplicative interactions, and 10 pairs of additive interactions; most of these were interactions between loci 26, 49, and 48 and other loci.

The random forest method screened out 35 loci (including 0 low-frequency variants) and 61 loci (including 30 low-frequency variants) based on the importance scores given by Mean Decrease Gini (MDG) and Mean Decrease Accuracy (MDA), respectively. An interaction tree model was then built with a decision tree for the selected loci, and logistic regression identified, in the absence of main effects, multiplicative interactions for four pairs of SNPs and additive interactions for three pairs of SNPs (P<0.0001).

When the two-stage BGTA results were analyzed at the gene level, the comparison method included 33 pairs of gene interactions, of which 9 pairs were statistically significant (P<0.01); the quantile ratio method included 17 pairs, of which 4 pairs were statistically significant (P<0.01). The gene interaction network shows that PIK3R3 interacts extensively with genes on the aldosterone synthesis and secretion pathway.

In the case study, the BGTA algorithm did not detect any effect of gene-gene or gene-environment interaction on abdominal obesity. The interaction scores showed that the interaction between loci rs11202592 and rs762624 was statistically significant at the 0.1 test level (P=0.083). Logistic regression showed that the joint effect of the two loci was associated with the abdominal obesity outcome (P=0.0387).

Conclusions:
1. In the data of this study, compared with the random forest method, the BGTA algorithm screened out more potentially interacting loci in the associated-variant screening stage (the first stage), and both low-frequency variants and variants carrying protein-coding information made up a higher proportion of the loci it selected. In the second-order interaction identification stage (the second stage), the BGTA algorithm found more interactions involving low-frequency variants, more interactions without main effects, and more multiplicative and additive interactions. The visualized genetic interaction network built from the GTD scores of pairwise interactions can identify key pivot loci and is easier to interpret biologically.
2. Constructing a gene interaction network based on the BGTA algorithm can improve the interpretability of gene interactions, but its ability to detect interactions is insufficient and needs further study.
3. Candidate-variable strategies based on pathway analysis can improve both the ability to identify important genetic variant interactions in complex diseases and their biological interpretability.
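
The second-stage screening described in the Methods (pairwise GTD scores, a permutation test, then FDR correction) can be illustrated with a minimal sketch. This is not the thesis code. It assumes the GTD score of a SNP subset is the sum, over joint genotype cells, of the squared difference between the case and control cell proportions; it uses the Benjamini-Hochberg procedure as one common FDR method, since the abstract does not say which was used; and all function names and the toy data are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def gtd_score(genotypes, is_case, snps):
    """GTD score of the SNP subset `snps` (assumed form, see lead-in):
    sum over joint genotype cells s of (n_case_s/n_case - n_ctrl_s/n_ctrl)**2.
    """
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    case_cells, ctrl_cells = Counter(), Counter()
    for row, case in zip(genotypes, is_case):
        cell = tuple(row[j] for j in snps)  # joint genotype at the chosen SNPs
        (case_cells if case else ctrl_cells)[cell] += 1
    cells = set(case_cells) | set(ctrl_cells)
    return sum((case_cells[c] / n_case - ctrl_cells[c] / n_ctrl) ** 2
               for c in cells)


def permutation_pvalue(genotypes, is_case, snps, n_perm=200, seed=0):
    """Shuffle case/control labels and count permuted scores >= observed."""
    rng = random.Random(seed)
    observed = gtd_score(genotypes, is_case, snps)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if gtd_score(genotypes, labels, snps) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data (hypothetical): cases carry matching genotypes at SNPs 0 and 1,
# controls carry mismatched ones, so neither SNP has a marginal effect.
genotypes = [[0, 0], [1, 1], [0, 1], [1, 0]] * 10
is_case = [True, True, False, False] * 10

marginal = gtd_score(genotypes, is_case, (0,))   # 0.0: no main effect
joint = gtd_score(genotypes, is_case, (0, 1))    # 1.0: pure interaction

pairs = list(combinations(range(2), 2))
pvals = [permutation_pvalue(genotypes, is_case, p) for p in pairs]
qvals = benjamini_hochberg(pvals)
```

In the toy data each SNP alone has a GTD score of 0 while the pair scores 1, which is the kind of interaction without main effects that the two-stage procedure is meant to pick up. Pairs whose adjusted p-value falls below the chosen FDR level would become edges of the SNP interaction network.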
Keywords/Search Tags:RASS pathway, BGTA, genetic interaction network, abdominal obesity, LncRNA HOTAIR