
Research On Discovery Of Causalities Among The Gene Mutations

Posted on: 2020-10-25
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Zhen
Full Text: PDF
GTID: 2370330596995057
Subject: Computer Science and Technology
Abstract/Summary:
The maturity and widespread adoption of high-throughput sequencing technology have produced gene sequencing data in large quantities. Extracting sufficient information from these data is both an active and a difficult problem in bioinformatics. In particular, the interaction between gene mutations has long been a difficulty in genome-wide association analysis, and research results in this area also have an impact on human society. Studying the causal relationships among gene mutations is therefore indispensable for applying those results rationally and for preventing related problems. Most studies of interactions among gene mutations work at the level of correlation, which is prone to false positives. This research instead starts from the perspective of causality.

Based on WTCCC (Wellcome Trust Case Control Consortium) data, this thesis designs causality discovery algorithms for data at two levels of granularity, gene and SNP (Single Nucleotide Polymorphism): a causality discovery algorithm among gene mutations and an asymmetric causal association rule discovery algorithm. We describe the causal relationships among gene mutations by constructing a causal network, and we detect the directed structure implied in the data at both granularities. The asymmetry of that directed structure reveals the causal relationships between gene mutations.

For the gene-level causality discovery algorithm, because a gene contains multiple SNPs, the SNP data obtained from the WTCCC are recoded so that the SNPs belonging to the same gene become discrete gene-granularity data. During causality discovery, we select a target gene, traverse the other genes, and use conditional independence tests to select candidate genes that interact directly with the target gene. The backward phase of the algorithm then deletes redundant genes that do not interact directly with the target, which reduces the number of genes introduced by Type I errors (false positives) and improves the reliability of the causality test results. Finally, based on the conditional independence properties of the converging structure, the structure formed by the candidate genes and the target gene is identified, and the causal relationship is then deduced. The experiments detected a number of gene mutations with strong causality, which can serve as a reference for biochemical research.

For SNP-granularity data, an asymmetric causal association rule discovery algorithm is proposed that combines association rule mining with V-structures to detect the causal structure implied in the data. First, based on information theory, a V-Structure Measure (VSM) is proposed to measure how well the structural relationships among SNP variables match a converging structure, and VSM is extended to the general many-to-many case. Then, an ASymmetric Causal Association Rule Discovery (ASCARD) algorithm is proposed for symmetric cases. ASCARD and an existing method are both tested on simulated data, and the results demonstrate the effectiveness of ASCARD. Finally, ASCARD is applied to the real WTCCC SNP data set, where the experiments detected causality between SNPs in different genes. Drawing on related research, this thesis also gives a biological explanation for some of the experimental results and provides a research proposal for reference.
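
The abstract does not define the VSM or the conditional independence test it uses, so the following is only a minimal illustrative sketch of the general idea behind a converging structure (V-structure) X -> Z <- Y: X and Y look independent on their own but become dependent once Z is conditioned on. It uses plain empirical (conditional) mutual information on simulated binary variables. The function names, the 0.01-bit threshold, and the simulated data are all assumptions made for illustration and are not taken from the thesis.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def conditional_mutual_information(xs, ys, zs):
    """Empirical I(X;Y|Z) in bits: weighted average of I(X;Y) within each value of Z."""
    n = len(zs)
    strata = {}
    for x, y, z in zip(xs, ys, zs):
        sx, sy = strata.setdefault(z, ([], []))
        sx.append(x)
        sy.append(y)
    return sum(len(sx) / n * mutual_information(sx, sy) for sx, sy in strata.values())


def looks_like_collider(xs, ys, zs, threshold=0.01):
    """Heuristic check (hypothetical threshold): X and Y are nearly independent
    marginally, yet clearly dependent once Z is given."""
    return (
        mutual_information(xs, ys) < threshold
        and conditional_mutual_information(xs, ys, zs) > threshold
    )


def simulate_collider(n, flip=0.1, seed=0):
    """Toy data for X -> Z <- Y: Z is a noisy XOR of two independent binary variables."""
    rng = random.Random(seed)
    xs = [rng.randint(0, 1) for _ in range(n)]
    ys = [rng.randint(0, 1) for _ in range(n)]
    zs = [(x ^ y) ^ int(rng.random() < flip) for x, y in zip(xs, ys)]
    return xs, ys, zs


def simulate_chain(n, flip=0.1, seed=0):
    """Toy data for X -> Z -> Y: each step copies its parent with a small flip rate."""
    rng = random.Random(seed)
    xs = [rng.randint(0, 1) for _ in range(n)]
    zs = [x ^ int(rng.random() < flip) for x in xs]
    ys = [z ^ int(rng.random() < flip) for z in zs]
    return xs, ys, zs


if __name__ == "__main__":
    for name, simulate in [("collider", simulate_collider), ("chain", simulate_chain)]:
        xs, ys, zs = simulate(5000)
        print(
            name,
            "I(X;Y) =", round(mutual_information(xs, ys), 3),
            "I(X;Y|Z) =", round(conditional_mutual_information(xs, ys, zs), 3),
            "collider-like:", looks_like_collider(xs, ys, zs),
        )
```

On the simulated collider, the marginal dependence is close to zero while the conditional dependence is large; on the chain the pattern is reversed. This asymmetry is what lets a converging structure be oriented from observational data. A fixed threshold is used here only to keep the sketch short; a real analysis would use a proper statistical test with multiple-testing control.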
Keywords/Search Tags:Causality, Gene Mutation, Single Nucleotide Polymorphism, Conditional Independence Test, Asymmetric Causal Association Rule Discovery