To weight or to adjust: An empirical study of the design-based and model-based approaches

Posted on: 2011-12-30
Degree: Ph.D
Type: Dissertation
University: The University of North Carolina at Chapel Hill
Candidate: Cai, Tianji
Full Text: PDF
GTID: 1442390002956594
Subject: Statistics
Abstract/Summary:
When a sampling design is correlated with the dependent variable, the distribution of the sampled units differs from the one that would be obtained under simple random sampling. Such a design is informative, in the sense that if the design variables are not included in the analysis model, the estimated model parameters can be biased, even conditional on the covariates.

This raises the question of how survey data should be modeled when the sampling design is informative. Two fundamental methodologies, design-based and model-based, have been proposed to address this issue. One model-based method, the so-called sample distribution method, was proposed by Krieger and Pfeffermann (1992; 1997) to derive the model of the sample data as a function of the model holding in the population and the sampling design. Once the model holding in the sample data is derived, standard model-based analysis techniques can be applied to estimate the unknown population parameters. The core aim of this dissertation is to assess various modeling strategies and estimators of regression coefficients and their variances, both design-based and model-based and in particular the sample distribution method, under informative sampling designs, and to develop a modeling strategy for analysts facing the choice between design-based and model-based approaches.

The dissertation comprises three research papers that provide (1) an evaluation of the design-based and model-based estimators under a single-stage informative sampling design; (2) an assessment of design-based and model-based estimators under an informative two-stage cluster sampling design; (3) a joint treatment of informative sampling and unit dropout in longitudinal studies.

When a single-stage sampling design is informative, the naive model-based method, whether ordinary least squares or maximum likelihood, produces biased results. The design-based method reduces the bias for some parameters (e.g. the intercept) but increases variances, which may lead to overly conservative conclusions. The sample distribution method produces better estimates, in terms of smaller biases and variances, than the naive and design-based methods.

Under an informative two-stage cluster sampling design, the naive model-based method, which ignores the sampling effect, produces biased results. Under some specific assumptions, the sample distribution method produces better estimators, in terms of smaller biases and higher coverage rates, than the naive method and the design-based multilevel pseudo-likelihood method. Although many previous studies have shown that the multilevel pseudo-likelihood method is preferred for compensating for the sampling design, this study shows that a simpler method, the sample distribution method, can be used to address the design effect.

In a specific statistical setting, the relative performance of the design-based and model-based methods in compensating for informative sampling and dropout is investigated. The simulation results indicate that both the model-based and the design-based approaches generally work well in the missing at random and missing not at random settings. Moreover, the sample distribution method combined with the Diggle and Kenward model has the advantage of correcting for both the design effect and nonignorable dropout.
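The single-stage finding can be illustrated with a small simulation. The following is a minimal sketch and not the dissertation's own study: the population model, the logistic inclusion probabilities, the Poisson sampling scheme, and the function name `compare_estimators` are all assumptions made for illustration. It compares only the naive unweighted fit with a design-weighted fit and does not implement the sample distribution method.

```python
import numpy as np


def compare_estimators(n_pop=200_000, seed=0):
    """Return (naive, weighted) estimates of [intercept, slope]."""
    rng = np.random.default_rng(seed)

    # Hypothetical population model: y = 1 + 2x + e, with e ~ N(0, 1).
    x = rng.normal(size=n_pop)
    y = 1.0 + 2.0 * x + rng.normal(size=n_pop)

    # Informative design (assumed): the inclusion probability depends on y
    # itself, so it stays related to y even after conditioning on x.
    pi = 1.0 / (1.0 + np.exp(-(-3.0 + 0.5 * y)))
    sampled = rng.random(n_pop) < pi  # Poisson sampling, one draw per unit

    X = np.column_stack([np.ones(sampled.sum()), x[sampled]])
    ys = y[sampled]
    w = 1.0 / pi[sampled]  # design weights = inverse inclusion probabilities

    # Naive model-based fit: unweighted least squares on the sample.
    naive, *_ = np.linalg.lstsq(X, ys, rcond=None)

    # Design-based fit: solve the weighted normal equations X'WX b = X'Wy.
    Xw = X * w[:, None]
    weighted = np.linalg.solve(Xw.T @ X, Xw.T @ ys)
    return naive, weighted


naive, weighted = compare_estimators()
print("true     :", [1.0, 2.0])
print("naive    :", naive.round(3))
print("weighted :", weighted.round(3))
```

Under these assumptions the naive intercept should overshoot the true value of 1, because units with large outcomes are over-represented, while the weighted fit should land close to the population coefficients. A single run like this shows the bias only; comparing variances would require repeating the draw over many seeds.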
Keywords/Search Tags:Sampling design, Model, Design-based, Distribution