Variable selection methodology for high-dimensional multivariate binary data with application to microbial community DNA fingerprint analysis

Posted on: 2003-12-03
Degree: Ph.D.
Type: Dissertation
University: Purdue University
Candidate: Wilbur, Jayson Dwight
Full Text: PDF
GTID: 1461390011981463
Subject: Statistics
Abstract/Summary:
To understand the role of microorganisms in an environment, the relevant microbial community must be identified and characterized. Characteristic profiles of microbial communities are obtained by denaturing gradient gel electrophoresis (DGGE) of polymerase chain reaction (PCR)-amplified 16S rDNA from soil-extracted DNA. These characteristic profiles, commonly called community DNA fingerprints, can be represented as high-dimensional binary vectors. The problem of modeling and variable selection for high-dimensional multivariate binary data is addressed from both a frequentist and a Bayesian perspective. Permutation-based approaches are employed to select variables that vary significantly with respect to a treatment effect, and the properties of these methods are explored via simulation. An empirical Bayes model for multivariate binary response data is proposed, and variables are selected by making posterior inference on the model space. Finally, the methodology is applied in the context of a controlled agricultural experiment.
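
The abstract does not spell out the permutation procedure, so the following is only a minimal sketch of one common way such a selection could work, not the dissertation's actual method. It assumes two treatment groups, uses the absolute difference in band-presence proportions as the test statistic, and adjusts for multiplicity with a max-statistic permutation distribution. The function names, the statistic, and the adjustment are all assumptions made for illustration.

```python
import random


def abs_prop_diff(X, labels):
    """Per-variable |presence rate in group 1 - presence rate in group 0|.

    X is a list of binary rows (one fingerprint per sample);
    labels is a list of 0/1 treatment indicators (assumed two groups).
    """
    n1 = sum(labels)
    n0 = len(labels) - n1
    diffs = []
    for j in range(len(X[0])):
        s1 = sum(row[j] for row, g in zip(X, labels) if g == 1)
        s0 = sum(row[j] for row, g in zip(X, labels) if g == 0)
        diffs.append(abs(s1 / n1 - s0 / n0))
    return diffs


def permutation_select(X, labels, n_perm=2000, alpha=0.05, seed=0):
    """Select variables whose max-statistic permutation p-value is <= alpha.

    Hypothetical helper: treatment labels are shuffled n_perm times, and each
    observed statistic is compared with the largest statistic over all
    variables in each shuffled data set (an assumed multiplicity adjustment).
    """
    rng = random.Random(seed)
    observed = abs_prop_diff(X, labels)
    exceed = [0] * len(observed)
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)  # group sizes are preserved by shuffling
        max_stat = max(abs_prop_diff(X, perm))
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    # Add-one correction keeps p-values strictly positive.
    pvals = [(c + 1) / (n_perm + 1) for c in exceed]
    selected = [j for j, pv in enumerate(pvals) if pv <= alpha]
    return selected, pvals
```

Calling `permutation_select(X, labels)` on a list of binary fingerprints returns the indices of the selected bands together with their adjusted p-values. Comparing against the maximum statistic is conservative; a per-variable comparison would give unadjusted p-values instead.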
Keywords/Search Tags: Multivariate binary, Microbial, Community, DNA, Data