
Detecting Rater Drift On An Oral English Performance Test With A Multi-faceted Rasch Model

Posted on: 2014-10-17
Degree: Master
Type: Thesis
Country: China
Candidate: B W Deng
Full Text: PDF
GTID: 2255330422455953
Subject: Foreign language teaching techniques and evaluation
Abstract/Summary:
Performance tests of language proficiency measure examinees' ability to respond to real-life language tasks. Scores from performance testing therefore carry more accurate and valid information about examinees' ability to use language. However, an inherent limitation of such performance testing is that examinees' performances are scored by human raters, who are known to introduce error into the rating process. Such rater effects can substantially affect examinees' scores, and even their futures, on high-stakes tests. It is therefore necessary to control the effect of human rating on the validity of test scores.

This study investigates the stability of rater severity over time, along with other major rater effects, on an oral test of the NMET (National Matriculation English Test) in a province of China. Ratings of 360 examinees by 15 raters were pooled and analyzed with FACETS 3.58.0, a multi-faceted Rasch analysis program (Linacre, 2005). The study found that the raters differed in their severity levels and that those levels did not remain invariant over time. Despite these differences, however, the changes in severity across sessions were within acceptable limits, except for Rater 9 and Rater 10, who changed more than expected. Most raters applied the rating scale consistently, but Rater 10 and Rater 15 did not use it as consistently as the others, showing more variation than the Rasch model's expectation allows. Raters as a whole exhibited a central tendency effect on the traits of fluency and language. In addition, Raters 3, 5, 10, and 13 exhibited a possible halo effect, although no halo effect was found for the group overall. Six of the fifteen raters had outfit values larger than their infit statistics, indicating that they had assigned some ratings unexpected by the model.
Test administrators should check the unexpected ratings flagged by the FACETS program and then retrain or remove unqualified raters, or revise the rating scale. The study shows that FACETS is a useful tool for studying rater performance: its output allows test administrators to target individual raters and help improve rating accuracy. In this way, rater effects on examinees' scores can be reduced to a minimal level.
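The abstract's diagnostic contrast between infit and outfit mean-square statistics can be illustrated with a small sketch. This is not the FACETS program itself (which performs full many-facet joint estimation); it is a hypothetical, simplified dichotomous case in which a rating's expected probability depends on examinee ability minus rater severity, and fit statistics are computed from the resulting residuals:

```python
import math

def mfrm_prob(ability, severity):
    """Dichotomous many-facet Rasch probability of a correct/high rating,
    driven by examinee ability minus rater severity (logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - severity)))

def fit_statistics(observations):
    """observations: list of (observed score 0/1, model-expected probability).
    Returns (infit, outfit) mean-square statistics."""
    sq_resid, variances, z_squares = [], [], []
    for x, p in observations:
        var = p * (1 - p)              # model variance of a 0/1 rating
        resid = x - p                  # raw residual
        sq_resid.append(resid ** 2)
        variances.append(var)
        z_squares.append(resid ** 2 / var)  # squared standardized residual
    # Infit: information-weighted mean square (sensitive to on-target misfit)
    infit = sum(sq_resid) / sum(variances)
    # Outfit: unweighted mean square (sensitive to off-target outliers)
    outfit = sum(z_squares) / len(z_squares)
    return infit, outfit
```

An outfit value well above the infit value, as reported for six raters in the study, points to occasional highly unexpected ratings on performances far from the rater's targeted range, rather than pervasive inconsistency.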
Keywords/Search Tags: rater severity, drift, halo effect, central tendency, multi-faceted Rasch model