
Bayesian Variable Selection With Informative Priors For Integrating Data Structural Information

Posted on: 2016-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: W Wen
GTID: 2180330482953800
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Biomarker discovery using the many types of high-throughput omics data now available offers a great opportunity for effective diagnosis, treatment, and prevention of many complex diseases. The challenge, however, lies in identifying specific biomarkers from high-dimensional omics data sets, which usually have relatively small sample sizes; this is often called the "large P, small N" problem. As a solution, we propose a Bayesian variable selection (BVS) strategy for biomarker discovery in which informative prior distributions are used to obtain meaningful selection results. Simulations were carried out to assess the performance of our BVS model. We used a factorial-design analysis of variance to examine how the learning sample size, the positive rate, the signal-to-noise ratio (SNR), the inter-gene correlation, and the effects of the causal genes influence the model. The results showed that SNR had the greatest influence on the model, followed by the learning sample size, then the positive rate, the inter-gene correlation, and finally the effects of the causal genes. The Gaussian graphical model (GGM) and the maximal information coefficient (MIC) were each used to analyze gene interactions in the simulated data and to construct three different kinds of informative prior distributions, giving six kinds of structural prior information in total. The objective was to explore how different prior information influences our model and to find the best way to incorporate prior knowledge of gene interactions. Our results showed that both GGM-BVS and MIC-BVS performed well in identifying the simulated causal genes, especially when the prior distribution was constructed from the partial correlation coefficient matrix in GGM-BVS and from the ranked maximal information coefficient matrix in MIC-BVS.
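The GGM structure used for the priors is built from partial correlations, which can be read off the precision (inverse covariance) matrix as rho_ij = -omega_ij / sqrt(omega_ii * omega_jj). The sketch below is illustrative only and assumes n > p so that the sample covariance is invertible; in a genuine "large P, small N" setting a regularized estimator such as the graphical lasso would be needed. The abstract does not say how the "ranked" matrix is formed, so rank_matrix is a hypothetical rank transform that can be applied to any symmetric association matrix (absolute partial correlations, or MIC scores computed with a dedicated MIC estimator, which is not implemented here).

```python
import numpy as np

def partial_correlation(X):
    """Partial correlation matrix of the columns of X (n samples x p genes).

    Inverts the sample covariance to get the precision matrix, so this
    sketch assumes n > p (no regularization).
    """
    cov = np.cov(X, rowvar=False)
    omega = np.linalg.inv(cov)                 # precision matrix
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)             # rho_ij = -w_ij / sqrt(w_ii w_jj)
    np.fill_diagonal(pcor, 1.0)
    return pcor

def rank_matrix(A):
    """Hypothetical rank transform: replace the off-diagonal entries of a
    symmetric association matrix by their ranks scaled to (0, 1];
    the diagonal is set to 0."""
    p = A.shape[0]
    iu = np.triu_indices(p, k=1)
    vals = A[iu]
    ranks = vals.argsort().argsort() + 1       # 1 = weakest association
    R = np.zeros_like(A, dtype=float)
    R[iu] = ranks / len(vals)
    return R + R.T

# Toy chain x0 -> x1 -> x2 plus an independent x3: x0 and x2 are
# correlated, but their partial correlation given x1 is close to zero.
rng = np.random.default_rng(1)
n = 5000
x0 = rng.standard_normal(n)
x1 = x0 + rng.standard_normal(n)
x2 = x1 + rng.standard_normal(n)
x3 = rng.standard_normal(n)
X = np.column_stack([x0, x1, x2, x3])

pc = partial_correlation(X)
R = rank_matrix(np.abs(pc))
```

For this chain the population partial correlations are 0.5 for (x0, x1), about 0.71 for (x1, x2), and 0 for (x0, x2), so the ranked matrix gives its top score to the (x1, x2) pair.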
We therefore consider both GGM and MIC capable of mining informative prior knowledge on gene interactions, with the most favorable results obtained from the partial correlation coefficient matrix in GGM-BVS and the ranked maximal information coefficient matrix in MIC-BVS. We also performed an empirical study on a breast cancer gene data set, using GGM-BVS with the partial correlation coefficient matrix and MIC-BVS with the ranked maximal information coefficient matrix to select genes. We list the top 15 genes selected by the two models; several of them have already been reported in the literature to have biological significance in breast cancer.
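The abstract does not specify the exact prior or sampler used in GGM-BVS and MIC-BVS. As a rough illustration of how a structure matrix can enter variable selection, the sketch below uses two standard ingredients as stand-ins: a Markov random field prior on the inclusion indicators, log p(gamma) = a * sum(gamma) + b * gamma' W gamma, which rewards selecting genes linked in W, and Zellner's g-prior marginal likelihood. All function names and the values of a, b and g are hypothetical. Posterior inclusion probabilities are obtained by exact enumeration of all 2^p models, which is only feasible for a handful of genes; high-dimensional data would require MCMC (e.g. Gibbs sampling) instead. Genes are then ranked by posterior inclusion probability, as in a top-k list.

```python
import itertools
import numpy as np

def log_marginal_gprior(y, X, gamma, g):
    """Log marginal likelihood (up to a constant) of the linear model using
    the columns flagged in gamma, under Zellner's g-prior."""
    n = len(y)
    yc = y - y.mean()
    k = int(gamma.sum())
    if k == 0:
        r2 = 0.0
    else:
        Xg = X[:, gamma.astype(bool)]
        Xg = Xg - Xg.mean(axis=0)
        beta, *_ = np.linalg.lstsq(Xg, yc, rcond=None)
        resid = yc - Xg @ beta
        r2 = 1.0 - (resid @ resid) / (yc @ yc)
    return 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

def log_mrf_prior(gamma, W, a, b):
    """Unnormalized MRF prior: a < 0 encourages sparsity, b > 0 rewards
    jointly selecting genes connected in the structure matrix W."""
    return a * gamma.sum() + b * gamma @ W @ gamma

def posterior_inclusion(y, X, W, a=-2.0, b=1.0, g=None):
    """Posterior inclusion probabilities by enumerating all 2^p models."""
    n, p = X.shape
    g = float(n) if g is None else g          # unit-information choice g = n
    models = np.array(list(itertools.product([0, 1], repeat=p)), dtype=float)
    logpost = np.array([log_marginal_gprior(y, X, m, g) + log_mrf_prior(m, W, a, b)
                        for m in models])
    w = np.exp(logpost - logpost.max())       # normalize stably
    w /= w.sum()
    return w @ models

def top_genes(pip, k=15):
    """Indices of the k genes with the highest posterior inclusion probability."""
    return np.argsort(pip)[::-1][:k]

# Toy data: genes 0 and 1 are causal, genes 2-5 are noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
y = 2.0 * X[:, 0] + 1.5 * X[:, 1] + rng.standard_normal(200)
W = np.abs(np.corrcoef(X, rowvar=False))      # any symmetric structure matrix
np.fill_diagonal(W, 0.0)
pip = posterior_inclusion(y, X, W)
print(top_genes(pip, k=3))
```

In practice W would be replaced by the partial correlation matrix (GGM) or the ranked MIC matrix, with its diagonal set to zero.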
Keywords/Search Tags: Bayesian variable selection, gene selection, graphical model, maximal information coefficient