
The Mantel-Haenszel procedure for DIF: Alternative matching scores to control type I error and improve distributional properties

Posted on: 2003-12-08
Degree: Ph.D.
Type: Dissertation
University: The University of Iowa
Candidate: Monahan, Patrick O'Neal
Full Text: PDF
GTID: 1466390011489880
Subject: Educational tests & measurements
Abstract/Summary:
The Mantel-Haenszel (MH) procedure is a nonparametric contingency-table method commonly used in psychometrics for detecting differential item functioning (DIF). In the MH procedure for DIF, the conditional association between group (Reference and Focal) and dichotomous item score is estimated and tested after controlling for the overall performance of examinees. The control (matching) variable is usually the total number of items answered correctly (number-correct score). Therefore, the matching variable in the MH procedure for DIF (a) consists of correlated categories and (b) contains measurement error as a fallible surrogate for latent proficiency. If dichotomous item scores conform to item response theory (IRT) models more complex than the Rasch model, and number-correct score is the matching variable in the MH procedure for DIF, inflation of the Type I error of the $\chi^2_{\mathrm{MH}}$ test and inflation of the null-DIF bias of the MH odds ratio may occur ($\hat{\Delta}_{\mathrm{MH}}$ is a log transformation of the odds ratio).

The primary purpose of this study was to investigate whether eight alternative matching scores control the null-DIF empirical distributional properties of the MH procedure better than number-correct score. Previous investigations of MH, DIF, and Type I error have simulated data only with IRT models. In the present study, data were simulated with both the three-parameter logistic (3PL) IRT model and a non-IRT technique involving the four-parameter beta compound binomial model for specifying the true score distribution.

Number-correct score displayed inflation in both types of simulated data. Four alternative matching scores consistently controlled the null-DIF bias of $\hat{\Delta}_{\mathrm{MH}}$ and the mean and SD of the empirical $\chi^2_{\mathrm{MH}}$ distribution better than number-correct score: (a) categories of the estimated IRT proficiency parameter ($\hat{\theta}$); (b) categories of the sum of weighted item scores, where the weights were either classical item-total biserial correlations or factor loadings from the first common factor of a factor analysis of tetrachoric correlations; and (c) Kelley's regressed true score estimates. However, the Kelley score performed worst of all matching scores with regard to the empirical standard error of $\hat{\Delta}_{\mathrm{MH}}$.
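
For readers unfamiliar with the statistics named in the abstract, the following is a minimal sketch of how the MH common odds ratio, its log transformation, and the MH chi-square are conventionally computed from one 2x2 table per matching-score category. It is not taken from the dissertation. The function name `mantel_haenszel` and the input layout are hypothetical, and the -2.35 scaling constant is the conventional ETS delta-scale transformation, which the abstract does not state explicitly.

```python
import math


def mantel_haenszel(tables):
    """Compute MH statistics for one studied item.

    tables: iterable of (a, b, c, d) counts, one tuple per category
    of the matching score (hypothetical layout, for illustration):
        a = Reference correct,  b = Reference incorrect,
        c = Focal correct,      d = Focal incorrect.
    Returns (alpha, delta, chi2).
    """
    num = den = 0.0                   # pieces of the common odds ratio
    sum_a = sum_e = sum_v = 0.0       # observed, expected, variance of a
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:                     # variance term is undefined for n < 2
            continue
        n_ref, n_foc = a + b, c + d   # group margins
        m1, m0 = a + c, b + d         # correct / incorrect margins
        num += a * d / n
        den += b * c / n
        sum_a += a
        sum_e += n_ref * m1 / n
        sum_v += n_ref * n_foc * m1 * m0 / (n * n * (n - 1))
    # Assumes den > 0 and sum_v > 0; sparse data would need extra guards.
    alpha = num / den
    # Conventional delta-scale log transformation (assumed, see lead-in).
    delta = -2.35 * math.log(alpha)
    # Continuity-corrected chi-square (1 df); the correction is clamped
    # at zero here, which is a choice made for this sketch.
    chi2 = max(0.0, abs(sum_a - sum_e) - 0.5) ** 2 / sum_v
    return alpha, delta, chi2


# Illustrative (made-up) counts for three matching-score categories.
example_tables = [(30, 10, 20, 20), (25, 15, 22, 18), (12, 28, 10, 30)]
alpha, delta, chi2 = mantel_haenszel(example_tables)
print(alpha, delta, chi2)
```

In this layout, replacing number-correct score with an alternative matching score would change only how examinees are sorted into the categories that define `tables`; the statistics themselves are computed the same way.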
Keywords/Search Tags: Score, Procedure for DIF, Error, Item, Type, IRT