Random forests for multi-locus quantitative trait linkage analysis

Posted on: 2008-12-25
Degree: Ph.D
Type: Dissertation
University: University of Toronto (Canada)
Candidate: Lee, Sophia Shu Fang
Full Text: PDF
GTID: 1443390005472113
Subject: Biology
Abstract/Summary:
The random forest is an ensemble of tree predictors that offers high predictive accuracy and a built-in measure of the importance of covariates within a complex data structure. We developed a regression random forest for multi-locus quantitative trait linkage analysis that accounts for ambiguity in marker data by incorporating the posterior identical-by-descent (IBD) probabilities from EM Haseman-Elston (EMHE) regression as weights on each sib-pair in the tree predictors. We addressed drawbacks of the original definition of variable importance (VI) in linkage analysis and proposed two procedures, multi-marker partial permutation and smoothing, each of which accounts for the correlation structure inherent in IBD linkage data, to improve the importance measures. We evaluated twelve variable importance indices, three of which were based on the definition of partial dependence, and compared their ability to detect quantitative trait loci with that of the EMHE LOD score in simulation studies under several genetic models and conditions. The new random forest and hybrid variable importance indices showed promising results in identifying important markers influencing quantitative traits while addressing ambiguity in marker data and exploring complex interactions among all markers simultaneously. Both the proposed random forest and the original random forest were applied to the Framingham Heart Study genome scan data, and a permutation test was used for inference on the VI indices.
Keywords/Search Tags:Random forest, Quantitative trait, Linkage, Importance, Data
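As background for the sib-pair regression mentioned in the abstract, the following is a minimal sketch of a classical Haseman-Elston style regression with optional per-pair weights: the squared trait difference of each sib-pair is regressed on the pair's estimated IBD sharing proportion at a marker, and a negative slope is the usual signal of linkage. It is not the dissertation's EMHE procedure or its weighted random forest. The function name `haseman_elston_slope`, the generic `weights` argument, and the toy data are all assumptions made for illustration.

```python
def haseman_elston_slope(ibd_share, sq_diff, weights=None):
    """Weighted least-squares slope of squared sib-pair trait
    differences (sq_diff) on IBD sharing proportions (ibd_share).

    Hypothetical helper for illustration only; weights default to 1.
    """
    n = len(ibd_share)
    if weights is None:
        weights = [1.0] * n
    w_sum = sum(weights)
    # Weighted means of the predictor and the response.
    x_bar = sum(w * x for w, x in zip(weights, ibd_share)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, sq_diff)) / w_sum
    # Weighted cross-product and sum of squares.
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, ibd_share, sq_diff))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, ibd_share))
    return sxy / sxx


# Toy data (made up): pairs sharing more alleles IBD have more similar
# trait values, so squared differences fall as IBD sharing rises.
ibd = [0.0, 0.0, 0.5, 0.5, 0.5, 0.5, 1.0, 1.0]
sq = [4.1, 3.9, 3.0, 2.8, 3.2, 3.0, 2.1, 1.9]

slope = haseman_elston_slope(ibd, sq)
print(slope < 0)  # a negative slope is consistent with linkage
```

In this sketch the weights simply scale each pair's contribution to the fit; how the dissertation derives and applies its posterior IBD weights inside the tree predictors is not specified in the abstract.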