Logistic regression with incomplete covariate data in complex survey sampling

Posted on: 2001-04-08
Degree: Ph.D.
Type: Dissertation
University: University of South Carolina
Candidate: Moore, Charity Galena
Full Text: PDF
GTID: 1460390014960124
Subject: Biology
Abstract/Summary:
The purpose of this study was to investigate methods of handling missing covariate data in logistic regression under complex survey sampling. Many epidemiological studies use complex survey sampling to maximize information about a population from a sample while minimizing costs. Parameter estimation from such samples takes into account the original size of the population, characteristics of that population, and the size of the sample strata.

Missing data, particularly item non-response, occur frequently in research. Item non-response takes place when a sampled individual does not have complete information on all items. Two types of missingness are noted in the literature: ignorable and non-ignorable. Within the ignorable category, the missing data mechanism can be missing completely at random (MCAR) or missing at random (MAR). MCAR implies no dependence between missingness and any variables of interest. MAR occurs when the probability of having incomplete data for a variable depends only on the observed information. Non-ignorable (NI) missingness implies that the probability of having missing information depends on the true value of the missing information. This study focuses on MAR.

Three methods were compared to complete-case (CC) analysis for performance in estimating parameters in logistic regression with outcome Y and two covariates, X and Z, the latter being incomplete. The three methods were multiple imputation (MI), re-weighted estimating equations (RWEE), and the Expectation-Maximization (EM) algorithm. Simulations investigated the performance of the four methods by comparing bias, coverage probabilities, and calculated versus empirical variances.

MI performed better than CC analysis when estimating the coefficient for X. When estimating the coefficient for Z, the MI coverage probabilities were low because of lower calculated variances under MI. The RWEE method showed good results compared to CC analysis when the missing data mechanism was correctly specified. The RWEE method showed more bias than the other two methods but had good coverage probabilities overall. The EM algorithm clearly performed better than the other methods in terms of bias and coverage probabilities. This method was stable across varying levels of association between X and Z, and between Y and Z conditional on X.
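As a rough illustration of why a MAR mechanism matters for CC analysis, the following minimal sketch simulates binary X, Z and Y, makes Z missing with a probability that depends only on the fully observed X and Y, and fits logistic regression to the complete cases with and without inverse-probability weights. Everything in it is an assumption made for illustration: the coefficient values, the missingness probabilities, the helper names (`fit_logistic`, `simulate`), and the use of known observation probabilities. It is a simplified weighted analysis, not the dissertation's RWEE estimator, and it ignores the survey design (strata and sampling weights) entirely.

```python
import math
import random

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_logistic(rows, weights, iters=12):
    """Weighted logistic regression fitted by Newton-Raphson.
    rows: list of (y, design_vector); weights: one weight per row."""
    k = len(rows[0][1])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for (y, d), w in zip(rows, weights):
            p = expit(sum(bj * dj for bj, dj in zip(beta, d)))
            for i in range(k):
                grad[i] += w * (y - p) * d[i]
                for j in range(k):
                    hess[i][j] += w * p * (1.0 - p) * d[i] * d[j]
        beta = [bj + s for bj, s in zip(beta, solve(hess, grad))]
    return beta

def simulate(n=10000, seed=1):
    """Hypothetical data: Z is observed with a probability that depends
    only on the fully observed X and Y (a MAR mechanism)."""
    rng = random.Random(seed)
    true_beta = (-1.0, 1.0, 1.0)  # assumed intercept, X and Z coefficients
    data = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        z = 1 if rng.random() < 0.3 + 0.3 * x else 0  # X and Z associated
        eta = true_beta[0] + true_beta[1] * x + true_beta[2] * z
        y = 1 if rng.random() < expit(eta) else 0
        pi = 0.3 if (x == 1 and y == 1) else 0.9  # P(Z observed | X, Y)
        observed = rng.random() < pi
        data.append((y, x, z, observed, pi))
    return true_beta, data

true_beta, data = simulate()

# Complete cases: only rows where Z was observed enter the fit.
cc_rows = [(y, [1.0, x, z]) for y, x, z, obs, pi in data if obs]
ipw = [1.0 / pi for y, x, z, obs, pi in data if obs]

beta_cc = fit_logistic(cc_rows, [1.0] * len(cc_rows))  # unweighted CC fit
beta_ipw = fit_logistic(cc_rows, ipw)                  # inverse-probability weighted fit

print("true:", true_beta)
print("CC:  ", [round(b, 2) for b in beta_cc])
print("IPW: ", [round(b, 2) for b in beta_ipw])
```

In this particular setup the missingness probability depends on the combination of X and Y, so the unweighted CC fit is expected to distort the coefficient for X while leaving the coefficient for Z roughly intact. Weighting each complete case by the inverse of its observation probability should recover values close to the assumed coefficients. In practice the observation probabilities would have to be estimated, which is where correct specification of the missing data mechanism becomes important.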
Keywords/Search Tags: Complex survey, Logistic regression, Data, Covariate, Missing, Methods, Coverage probabilities, Incomplete