The effectiveness of the equipercentile method and IRT three-parameter model on vertical equating under varying conditions of sample size, test length, and anchor test length: A simulation study

Posted on: 1993-09-21
Degree: Ph.D
Type: Dissertation
University: Columbia University
Candidate: Ayerve, Rafael Ignacio
Full Text: PDF
GTID: 1472390014495673
Subject: Education
Abstract/Summary:
In many testing situations, the use of equivalent test forms is all-important. It is difficult, however, to construct two truly equivalent forms, so scores from one test must be converted to the scale of another. This process, known as "test equating," can be performed at two levels: horizontal (the scores to be equated come from tests that are practically parallel) and vertical (the scores come from tests that differ in difficulty, and the examinee ability distributions differ). There are two main categories of equating methods: those based on Classical Test Theory (CTT) and those based on Item Response Theory (IRT). Each category has its own assets and limitations.

This study compared the effectiveness of two of these methods, one from each category, in the vertical equating of tests. The Frequency Estimation Equipercentile method, based on CTT, and the three-parameter logistic IRT model were compared under varying conditions of sample size, test length, and anchor test length. Three independent investigations were carried out to: (a) compare the effectiveness of the methods across all conditions; (b) compare the effectiveness of the methods on each independent variable as manipulated in the study; and (c) examine the effect of each independent variable on each method under varying conditions.

Three examinee sample sizes (200, 500, 1000), two test lengths (30, 60 items), and two anchor test lengths (5, 10 items) were used. Two summary statistics, Weighted Mean Square Error (WMSE) and Unweighted Mean Square Error (UMSE), were used to interpret the results.

The overall results, across all conditions, for both the WMSE and UMSE values showed no significant difference between the equipercentile and IRT methods, or between the IRT methods based on the two estimation programs, ASCAL and BILOG. For the equipercentile method, test length in conjunction with anchor test length is an important factor, while for both IRT methods (ASCAL and BILOG), examinee sample size is the important factor. Small examinee samples (200) tend to produce inaccurate results, while larger examinee samples (500, 1000) produce more accurate results.
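
For readers unfamiliar with the quantities named in the abstract, the following Python sketch illustrates them under stated assumptions. It shows the standard three-parameter logistic item response function, the 3 x 2 x 2 grid of simulation conditions listed above, and one common way to compute unweighted and weighted mean square error between equated and criterion scores. The abstract does not give the dissertation's exact formulas, so the scaling constant `D = 1.7`, the function names, and the frequency-weighted form of WMSE are assumptions made for illustration, not the author's implementation.

```python
import math
from itertools import product


def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL logistic model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: lower asymptote (pseudo-guessing). D = 1.7 is the conventional
    scaling constant and is an assumption here.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# Design factors exactly as listed in the abstract.
SAMPLE_SIZES = (200, 500, 1000)
TEST_LENGTHS = (30, 60)
ANCHOR_LENGTHS = (5, 10)


def design_conditions():
    """Enumerate every (sample size, test length, anchor length) cell."""
    return list(product(SAMPLE_SIZES, TEST_LENGTHS, ANCHOR_LENGTHS))


def umse(equated, criterion):
    """Unweighted MSE: every score point counts equally (assumed form)."""
    squared = [(e - t) ** 2 for e, t in zip(equated, criterion)]
    return sum(squared) / len(squared)


def wmse(equated, criterion, freqs):
    """Weighted MSE: squared errors weighted by score frequency (assumed form)."""
    weighted = sum(f * (e - t) ** 2 for e, t, f in zip(equated, criterion, freqs))
    return weighted / sum(freqs)


if __name__ == "__main__":
    # At theta == b the 3PL probability is c + (1 - c) / 2.
    print(p_3pl(theta=0.0, a=1.0, b=0.0, c=0.2))
    print(len(design_conditions()), "simulation conditions")
```

The weighted statistic emphasizes score points where many examinees fall, while the unweighted statistic treats sparse extreme scores the same as dense central ones, which is why studies of this kind often report both.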
Keywords/Search Tags:Test, IRT, Sample, Varying conditions, Equipercentile method, Effectiveness, Equating, Examinee