
Tree Model Based High-Order Interaction Effects Searching Heuristic Algorithm And Its Application

Posted on: 2022-02-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Y He
GTID: 1480306743990409
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Thanks to the rapid development of genomics, high-throughput sequencing, and bioinformatics analysis technologies, researchers can now obtain massive amounts of genetic data for in-depth study of diseases, and this work has yielded fruitful results. However, our understanding of complex diseases has not achieved the revolutionary breakthroughs that were expected. Taking genome-wide association studies (GWAS) as an example, the SNPs identified as having main effects explain only a small part of the genetic variation; the unexplained remainder is called missing heritability. In practice, complex diseases arise and progress through the interplay of external environmental exposures and internal genetic factors, and ignoring gene-environment and gene-gene interactions is one of the important causes of missing heritability. Biological omics data often have thousands of features but limited samples. Because of high algorithmic complexity and the burden of multiple-testing correction, traditional interaction analysis methods are of very limited use at the genome-wide level, even for first-order interactions, and higher-order interaction analysis is harder still. Several advanced interaction analysis methods have emerged in recent years, each with its own merits and drawbacks. Based on the tree model, we propose a new high-order interaction analysis method, POINT (Path based high order interaction detector), and compare its performance with two other common methods across various simulation scenarios. Finally, POINT is used to explore high-order genetic interaction effects on lung cancer risk.

The dissertation is structured as follows.

In the first section, we introduce the principle and construction process of the POINT method and discuss several key issues in its construction. The simulation results show that the lower the FP tree threshold, the deeper the tree model, and the larger the number of trees, the stronger POINT's detection ability; however, the computation time of POINT and the number of interactions it identifies also increase. Weighing these factors, POINT's detection capability peaked with the FP tree threshold set to 0.5, the tree depth set to k+3, and the number of trees set to 150. These settings depend on the number of variables in the analysis dataset, which was 300 in the simulated data.

In the second section, we compare the statistical properties of POINT, iRF, and CINOEDV in different scenarios. The simulation results show that when there is a main effect but no interaction effect, the type I error is well controlled by POINT and CINOEDV but inflated by iRF. In all scenarios, POINT has the best ability to recognize high-order interactions, followed by CINOEDV, while iRF performs poorly.

In the third section, POINT was applied to study high-order interaction effects on lung cancer risk. At the genome-wide level, 60 interaction combinations with a highest order of 3 were found, including 10 additive combinations and 50 non-additive combinations.

In the last section, we summarize the merits and shortcomings of this work and discuss prospects for future study.
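The abstract does not spell out how POINT turns trees into interaction candidates. Going only by the method's name (path based) and the keywords (tree model, frequent itemset mining), one plausible reading is: collect the variables split on along each root-to-leaf path of a forest, then keep the variable combinations that co-occur on a large fraction of those paths. The Python sketch below illustrates that idea under stated assumptions. The toy tree structure and the helpers `split`, `root_to_leaf_paths` and `frequent_combinations` are hypothetical; support is assumed to be the fraction of paths containing a combination (so the 0.5 threshold is read as a minimum support); and brute-force enumeration of combinations stands in for the FP-growth mining that an FP tree would normally serve. It is not the author's implementation.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, List, Optional, Tuple

# Hypothetical toy tree: an internal node is a dict with the split variable
# and two children; a leaf is represented by None.
Node = Optional[dict]


def split(var: str, left: Node = None, right: Node = None) -> dict:
    """Build an internal node that splits on `var` (illustrative helper)."""
    return {"var": var, "left": left, "right": right}


def root_to_leaf_paths(node: Node, prefix: Tuple[str, ...] = ()) -> Iterator[FrozenSet[str]]:
    """Yield the set of split variables found on every root-to-leaf path."""
    if node is None:
        yield frozenset(prefix)
        return
    for child in (node["left"], node["right"]):
        yield from root_to_leaf_paths(child, prefix + (node["var"],))


def frequent_combinations(
    paths: List[FrozenSet[str]], min_support: float, max_order: int
) -> Dict[Tuple[str, ...], float]:
    """Return variable combinations of size 2..max_order whose support
    (fraction of paths containing the whole combination) is >= min_support.

    Assumption: brute-force enumeration replaces FP-growth; it yields the
    same frequent itemsets but scales worse on large inputs.
    """
    counts: Counter = Counter()
    for path in paths:
        items = sorted(path)  # sorted so each combination has one canonical key
        for k in range(2, min(max_order, len(items)) + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    n = len(paths)
    return {combo: c / n for combo, c in counts.items() if c / n >= min_support}


if __name__ == "__main__":
    # Two hand-built trees standing in for a fitted forest (hypothetical data).
    forest = [
        split("snp1", split("snp2"), split("snp2", split("snp3"))),
        split("snp2", split("snp1"), split("snp4")),
    ]
    paths = [p for tree in forest for p in root_to_leaf_paths(tree)]
    print(frequent_combinations(paths, min_support=0.5, max_order=3))
```

Lowering `min_support` in this sketch admits more combinations, which mirrors the abstract's observation that a lower FP tree threshold raises detection ability along with the number of interactions reported and the running time.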
Keywords/Search Tags: tree model, frequent itemset mining, high-dimensional data, high-order interaction, lung cancer