
Comparing score trends on high-stakes and low-stakes tests using metric-free statistics and multidimensional item response models

Posted on: 2006-06-04
Degree: Ph.D
Type: Dissertation
University: Stanford University
Candidate: Ho, Andrew Dean
Full Text: PDF
GTID: 1455390008956457
Subject: Educational tests & measurements
Abstract/Summary:
The most widely interpreted large-scale educational statistic is the test score trend. Positive trends are interpreted as an improvement in the education of students, as an increase in student learning, and as evidence of educational policies functioning as intended. Implicit in this attention to test score trends is the assumption that they generalize to trends on other tests that measure the "same" desired learning outcomes. However, comparing trends across testing programs is not straightforward, nor are discrepancies readily interpretable when they are found.

The first half of this dissertation develops methodology for comparing trends across tests with different score scales. These chapters present and implement a "metric-free" framework that provides graphs and statistics independent of the test score scale. These methods allow comparisons of "high-stakes" state test score trends with trends on "low-stakes" tests such as the National Assessment of Educational Progress (NAEP). Results show that score trend discrepancies are widespread and that average high-stakes test score trends are significantly more positive than their NAEP counterparts for the same state, subject, and grade combinations. These results cast doubt on common interpretations of high-stakes test score trends without offering footholds for further interpretation.

The second half of this dissertation develops methodology to explain score trend discrepancies as a consequence of overlapping but not identical test content. In other words, where trend discrepancies arise, trends for overlapping content strands should be similar, while trends for nonoverlapping content areas should account for the observed discrepancies. Multidimensional item response models include ability or proficiency parameters for multiple dimensions or cognitive skills, allowing descriptions of proficiency at a level of detail that unidimensional models gloss over. These chapters develop a Markov chain Monte Carlo (MCMC) estimation procedure for a confirmatory three-parameter logistic (3PL) model. This model is used to estimate subscale trends for a high-stakes Reading test in a mid-sized state. Results suggest that the model estimation procedures are sound but that the model cannot account for score trend discrepancies in this state. Nevertheless, these methods show great potential for resolving the dissonance that trend discrepancies present.
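The abstract does not detail the metric-free statistics themselves. As one illustration of what scale invariance requires, the sketch below computes a nonparametric trend effect size, sqrt(2) * Phi^{-1}(P(Y2 > Y1)), which depends only on the rank ordering of scores and is therefore unchanged by any monotone rescaling of the score scale. This is a minimal sketch of one such statistic, not the dissertation's implementation; the function name and the all-pairs estimator are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def metric_free_trend(scores_year1, scores_year2):
    """Scale-invariant trend effect size: sqrt(2) * Phi^{-1}(P(Y2 > Y1)).

    Illustrative sketch, not the dissertation's code. Because the statistic
    depends only on the ordering of scores, it is identical under any
    monotone transformation of the score scale, so trends from tests
    reported on different scales can be compared.
    """
    y1 = np.asarray(scores_year1, dtype=float)
    y2 = np.asarray(scores_year2, dtype=float)
    # Estimate P(Y2 > Y1) over all cross-year pairs, counting ties as half.
    greater = (y2[:, None] > y1[None, :]).mean()
    ties = (y2[:, None] == y1[None, :]).mean()
    p = greater + 0.5 * ties
    return np.sqrt(2.0) * norm.ppf(p)

# Example: simulated scores drifting upward by 0.2 SD between years.
rng = np.random.default_rng(0)
print(metric_free_trend(rng.normal(0.0, 1.0, 2000),
                        rng.normal(0.2, 1.0, 2000)))
```

When both score distributions are normal with equal variances, this statistic coincides with the standardized mean difference (Cohen's d), so the example above should print a value near 0.2.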
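For reference, one standard parameterization of the multidimensional 3PL model that the second half builds on is shown below; a confirmatory structure fixes in advance which entries of each item's loading vector are nonzero. The dissertation's exact parameterization may differ.

\[
P(X_{ij} = 1 \mid \boldsymbol{\theta}_i) \;=\; c_j + (1 - c_j)\,
\frac{\exp\!\left(\mathbf{a}_j^{\top}\boldsymbol{\theta}_i + d_j\right)}
     {1 + \exp\!\left(\mathbf{a}_j^{\top}\boldsymbol{\theta}_i + d_j\right)},
\]

where \(\boldsymbol{\theta}_i\) is examinee \(i\)'s vector of proficiencies on the modeled dimensions, \(\mathbf{a}_j\) is item \(j\)'s discrimination (loading) vector, \(d_j\) is an intercept, and \(c_j\) is the lower asymptote ("guessing") parameter.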
Keywords/Search Tags: Trends, Score, Test, High-stakes, Model, Comparing, State