Ranked set sampling for binary and ordered categorical variables with applications in health survey data

Posted on: 2005-08-01
Degree: Ph.D
Type: Dissertation
University: The Ohio State University
Candidate: Chen, Haiying
Full Text: PDF
GTID: 1450390011952390
Subject: Statistics
Abstract/Summary:
Ranked set sampling (RSS) is a sampling procedure that can be considerably more efficient than simple random sampling (SRS). It involves preliminary ranking of the variable of interest to aid in sample selection. Although ranking processes for continuous variables that are implemented through either subjective judgment or via the use of a concomitant variable have been studied extensively in the literature, the use of RSS in the case of a binary variable has not been investigated thoroughly. We use a National Health and Nutrition Examination Survey III (NHANES III) data set to investigate the application of balanced RSS to estimation of a population proportion. Logistic regression is used to aid in the ranking of the variable of interest. Our results indicate that this use of logistic regression improves the accuracy of the preliminary ranking in balanced RSS and leads to substantial gains in precision for estimation of a population proportion.

Further, we illustrate how data from one source can be used to construct the necessary logistic regression equation, which can, in turn, be used to estimate the relevant proportions in a second group of subjects for which the same predictor variables are available.

Balanced RSS, however, is not in general optimal in terms of variance reduction. We investigate the application of unbalanced RSS to estimation of a population proportion under perfect ranking, where the probabilities of success for the order statistics are functions of the underlying population proportion. In particular, Neyman allocation, which assigns sample units for each order statistic proportionally to its standard deviation, is shown to be optimal in the sense that it leads to minimum variance within the class of RSS estimators that are simple averages of the means of the order statistics.

In practice, a ranking procedure is most likely imperfect. When the rankings are not perfect, the probabilities of success for the judgment order statistics incorporate information on ranks as well as on ranking errors.

Finally, we extend the application of RSS, both balanced and unbalanced, to ordered categorical variables with the goal of estimating the probabilities of all categories. Results from a simulation study using the NHANES III data set indicate that the use of ordinal logistic regression in ranking leads to substantial gains in precision for estimation of population proportions. (Abstract shortened by UMI.)
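
The variance comparison described in the abstract can be illustrated with a minimal sketch. It assumes perfect ranking of a binary variable within sets of size k, so that the r-th order statistic is a success exactly when at least k - r + 1 of the k units in the set are successes. The function names, the set size, and the sample sizes below are hypothetical choices for illustration and are not taken from the dissertation. The Neyman allocation here returns real-valued sample sizes; in practice they would have to be rounded to integers.

```python
from math import comb, sqrt


def order_stat_success_probs(p, k):
    """P(X_(r) = 1) for r = 1..k under perfect ranking of k Bernoulli(p) units.

    The r-th smallest value equals 1 exactly when at least k - r + 1
    of the k units are successes.
    """
    def binom_upper_tail(m):
        return sum(comb(k, j) * p**j * (1 - p) ** (k - j) for j in range(m, k + 1))

    return [binom_upper_tail(k - r + 1) for r in range(1, k + 1)]


def srs_variance(p, total):
    """Variance of the sample proportion from a simple random sample of size total."""
    return p * (1 - p) / total


def rss_variance(p, k, alloc):
    """Variance of the RSS estimator (1/k) * sum of per-rank sample means.

    alloc[r-1] is the number of measured units assigned to rank r.
    """
    probs = order_stat_success_probs(p, k)
    return sum(q * (1 - q) / n for q, n in zip(probs, alloc)) / k**2


def neyman_allocation(p, k, total):
    """Split total units across ranks in proportion to each rank's standard deviation."""
    probs = order_stat_success_probs(p, k)
    sds = [sqrt(q * (1 - q)) for q in probs]
    return [total * s / sum(sds) for s in sds]


if __name__ == "__main__":
    p, k, total = 0.3, 3, 30  # hypothetical values, for illustration only
    balanced = [total / k] * k
    print("SRS variance:         ", srs_variance(p, total))
    print("Balanced RSS variance:", rss_variance(p, k, balanced))
    print("Neyman RSS variance:  ", rss_variance(p, k, neyman_allocation(p, k, total)))
```

Because the order-statistic success probabilities sum to k times p, the simple average of the per-rank means is unbiased for p under any allocation that assigns at least one unit to every rank. The sketch covers only the perfect-ranking case; with imperfect rankings the per-rank success probabilities would also depend on the ranking errors.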
Keywords/Search Tags:RSS, Sampling, Logistic regression, Population proportion, Ranking, Variables, Order, Data