
Modeling On The Graphical Model In Microbiome Data Based On Bayesian Neighborhood Selection

Posted on: 2022-11-30   Degree: Master   Type: Thesis
Country: China   Candidate: W Q Yang   Full Text: PDF
GTID: 2530306326473984   Subject: Applied Statistics
Abstract/Summary:
Microorganisms have a profound influence on the formation and transmission of many diseases. We aim to characterize microbiome data containing different bacterial groups, infer their structural relationships, and study how they act on diseases. Researchers can now use high-throughput sequencing technology, such as 16S rRNA gene sequencing, to quantitatively study the functional effect of the microbiota on the host; the output is a count of microbial taxa in each sample. However, the compositional nature, sparsity, heterogeneity, and noise of microbiome samples pose a severe challenge to statistical modeling. The Dirichlet-Multinomial (DM) model has a strong hidden independence structure, which can effectively represent the association between bacterial composition and its covariates. In this paper, the DM model is used to simulate bacterial count samples, and the observed samples are converted into binary matrices by introducing hidden variables that reflect the behavior of the microbiome in the host and its correlations. Under the Bayesian framework, this paper proposes an improved Neighborhood Selection method that can fit graphical models of discrete hidden variables. A graphical model reflects the correlations between variables intuitively and clearly: if two variables are correlated, the corresponding nodes in the graph are connected by an undirected edge. The posterior distribution of each parameter in the model is derived, and a posterior estimate of the graph structure is obtained by MCMC sampling. The resulting structures reflect the interactions between groups of microorganisms. Rather than arbitrarily choosing a fixed threshold, the method treats the concentration parameter as a random variable and estimates it by MCMC sampling. Simulation results show that the method accurately infers the graphical model structure of microbiome data. Finally, we applied the method to a bacterial vaginosis (BV) dataset to capture the structure of the microbiome associated with the disease and the structural changes in bacterial populations during its occurrence. These results are consistent with existing biological studies.
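The simulation step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the concentration vector `alpha`, the sequencing depth, the sample count, and all function names are hypothetical, and the presence/absence rule (count > 0) is only a simple stand-in for the hidden-variable binarization, which the abstract does not specify. A Dirichlet-Multinomial draw first samples taxon proportions from a Dirichlet distribution and then samples counts from a multinomial with those proportions.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one composition vector from Dirichlet(alpha) via normalized Gamma draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def sample_dm_counts(alpha, depth, rng):
    """Draw one Dirichlet-Multinomial count vector with `depth` total reads."""
    proportions = sample_dirichlet(alpha, rng)
    counts = [0] * len(alpha)
    # Multinomial draw: assign each read to a taxon according to the proportions.
    for taxon in rng.choices(range(len(alpha)), weights=proportions, k=depth):
        counts[taxon] += 1
    return counts


def to_presence_absence(count_matrix):
    """Assumed stand-in binarization: 1 if a taxon was observed at all, else 0."""
    return [[1 if c > 0 else 0 for c in row] for row in count_matrix]


rng = random.Random(0)             # fixed seed for reproducibility
alpha = [0.2, 0.5, 1.0, 2.0, 4.0]  # hypothetical concentration parameters, one per taxon
depth = 1000                       # hypothetical number of reads per sample

count_matrix = [sample_dm_counts(alpha, depth, rng) for _ in range(20)]
binary_matrix = to_presence_absence(count_matrix)

print(count_matrix[0])
print(binary_matrix[0])
```

Each row of `count_matrix` sums to the sequencing depth, which is the compositional constraint mentioned in the abstract; small entries of `alpha` tend to produce the sparse, zero-heavy columns typical of microbiome counts.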
Keywords/Search Tags: Dirichlet-Multinomial distribution, Neighborhood Selection, Logistic Regression, Spike-and-Slab prior