
Causal inference with complex sampling designs

Posted on: 2013-07-14
Degree: Ph.D.
Type: Dissertation
University: University of Florida
Candidate: He, Zhulin
Full Text: PDF
GTID: 1450390008967470
Subject: Biostatistics
Abstract/Summary:
This PhD dissertation concerns causal inference with data from complex sampling designs. It consists of several parts on newly developed or implemented methodologies for complex survey data, described as follows.

First, we consider the problem of adjusting for confounding by cluster in the context of complex multistage sampling and a binary outcome. We investigate three categories of approaches: ordinary logistic regression for survey data, with either no effect or a fixed effect for each cluster; conditional logistic regression extended for survey data; and generalized linear mixed model (GLMM) regression for survey data. We use theory, simulation, and analyses of the 2005 National Health Interview Survey (NHIS) data to compare and contrast these methods. One conclusion is that all of the methods perform poorly when the sampling bias is strong, which motivates us to seek another method that works properly under strongly biased sampling designs.

We then show that for logistic regression with a simple matched-pairs design, infinitely replicating observations and maximizing the conditional likelihood results in an estimator identical to the unconditional maximum likelihood estimator (MLE) with a fixed effect for each pair based on the original sample. Therefore, applying conditional likelihood methods to a pseudosample with observations replicated a large number of times can lead to a biased estimator. This casts doubt on one alternative approach to conditional logistic regression with complex survey data.

In the third chapter, we generalize binary conditional logistic regression for complex survey data by implementing the method based on a weighted pseudo-likelihood, in which the contribution from each neighborhood involves all pairs of cases and controls in the neighborhood. We show that it corresponds to an equivalent ordinary weighted log-likelihood formulation with binary outcomes. We explain how to program the method using standard software for ordinary logistic regression with complex survey data. We then apply the method to 2009 National Health Interview Survey (NHIS) public use data to estimate the effect of education on health insurance coverage, adjusting for confounding by neighborhood.

Last, we concentrate on adjusting for unmeasured confounding of the effect of cluster-level adherence on an individual binary outcome with complex sampling designs. Seeking new methodologies for adjusting for confounding due to cluster effects, we use double inverse-probability weighting to adjust for the disproportionate sampling and for the association of individual-level confounders with randomization. We then develop and apply methods based on structural nested models to estimate effects of adherence, assessed in terms of relative risk and risk difference, using cluster-level randomization as an instrumental variable and using the double weights to adjust for complex sampling and individual-level confounding. As an important application, we wish to estimate the effect of school-level adherence on individual absenteeism in the context of a school-based water, sanitation, and hygiene (WASH) intervention in Western Kenya.
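The pairwise pseudo-likelihood idea from the third chapter can be illustrated with a minimal sketch. It is not the dissertation's implementation. It assumes that each case-control pair within a neighborhood is weighted by the product of the two individual survey weights (the abstract does not state the pair weights), and it uses a small hand-written Newton solver in place of survey software. The function names `pairwise_differences` and `fit_pairwise_clogit` are hypothetical. Under these assumptions, each pair contributes the log of expit((x_case - x_control)'beta), which is an ordinary weighted logistic log-likelihood with outcome 1, no intercept, and the covariate difference as predictor.

```python
import numpy as np


def pairwise_differences(x, y, w, cluster):
    """Return covariate differences (case minus control) and pair weights
    for every case-control pair within each cluster."""
    diffs, pair_w = [], []
    for c in np.unique(cluster):
        idx = cluster == c
        xc, yc, wc = x[idx], y[idx], w[idx]
        cases = np.where(yc == 1)[0]
        controls = np.where(yc == 0)[0]
        for i in cases:
            for j in controls:
                diffs.append(xc[i] - xc[j])
                # Assumption: pair weight = product of individual weights.
                pair_w.append(wc[i] * wc[j])
    return np.array(diffs).reshape(-1, x.shape[1]), np.array(pair_w)


def fit_pairwise_clogit(x, y, w, cluster, n_iter=50, tol=1e-10):
    """Maximize sum over pairs of pw * log(expit(d @ beta)) by Newton's method.

    This is a weighted logistic regression without intercept in which every
    pair has outcome 1 and predictor d = x_case - x_control.
    """
    d, pw = pairwise_differences(x, y, w, cluster)
    beta = np.zeros(d.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(d @ beta)))
        score = d.T @ (pw * (1.0 - p))                   # gradient
        info = (d * (pw * p * (1.0 - p))[:, None]).T @ d  # negative Hessian
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Toy data: two neighborhoods, one binary covariate, unit weights.
x = np.array([[1.0], [0.0], [1.0], [0.0], [0.0], [1.0]])
y = np.array([1, 0, 1, 1, 1, 0])
w = np.ones(6)
cluster = np.array([0, 0, 0, 0, 1, 1])
print(fit_pairwise_clogit(x, y, w, cluster))
```

With a single binary covariate, the estimate reduces to the log of the ratio of weighted discordant pair totals (case exposed and control unexposed, over the reverse), which gives a quick sanity check on the toy data. The sketch only produces a point estimate; design-based variance estimation would still need survey software.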
Keywords/Search Tags: Complex, Data, Logistic regression, Adjusting for confounding