A Research Of Random Forests Based Method For Causal SNPs Detection And Epistasis Interaction

Posted on: 2013-09-21
Degree: Master
Type: Thesis
Country: China
Candidate: J J Yao
GTID: 2230330395485273
Subject: Electronic and communication engineering
Abstract/Summary:
With the development of high-throughput sequencing techniques and the completion of the HapMap project, genome-wide association studies (GWAS) have become feasible, and detecting causal genes for complex diseases is an important research topic. Gene-gene interaction effects also play an important role in the pathogenesis of complex diseases, so the detection of epistasis effects will become an important research direction.

Random Forests is a relatively new data-mining method that has gradually been applied in various fields. Random Forests can not only perform classification but also provide variable importance measures. This thesis used Random Forests to analyze genome-wide SNP datasets and computed the variable importance of every SNP. The larger the variable importance score, the stronger the association with the disease. We ran Random Forests on a simulated rheumatoid arthritis dataset and a real age-related macular degeneration dataset and obtained the most important SNPs; the results showed the validity of the method for causal SNP detection.

Then, to address the huge number of gene-gene interactions that must be tested, and the reliability of the results with respect to the number of trees and the random selection of attributes when the forest is grown, the thesis proposed a new Random Forests based method for filtering a much smaller subset of SNPs for the next stage of single-locus or epistasis analysis. The results of applying the method to the age-related macular degeneration dataset showed its validity for causal SNP and SNP-SNP interaction detection, and the results may serve as a reference for subsequent biological experiments.
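The value of the filtering step can be illustrated with simple arithmetic: an exhaustive pairwise epistasis scan over n SNPs requires n(n-1)/2 tests, so reducing n by ranking on importance shrinks the search quadratically. The sketch below is only an illustration and is not the thesis's implementation. The SNP identifiers, the importance scores, and the helper names `top_k_snps` and `pairwise_tests` are all hypothetical, and in practice the scores would come from a Random Forests run rather than being typed in by hand.

```python
from math import comb


def top_k_snps(importance, k):
    """Return the k SNP ids with the largest importance scores."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:k]


def pairwise_tests(n_snps):
    """Number of SNP-SNP pairs an exhaustive two-locus scan must test."""
    return comb(n_snps, 2)


# Hypothetical importance scores (made up for illustration only).
importance = {"rs_a": 0.12, "rs_b": 0.03, "rs_c": 0.31, "rs_d": 0.07, "rs_e": 0.00}

selected = top_k_snps(importance, 3)
print(selected)                 # ['rs_c', 'rs_a', 'rs_d']

# Effect of filtering on the size of the pairwise search space.
print(pairwise_tests(100_000))  # 4999950000 pairs before filtering
print(pairwise_tests(100))      # 4950 pairs after keeping 100 SNPs
```

The cutoff k is a free choice in this sketch; the abstract does not state how the thesis sets the size of the filtered subset.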
Keywords/Search Tags:Random Forests, SNPs, GWAS, complex disease