
Detecting Rater Drift On An Oral English Performance Test With A Multi-faceted Rasch Model

Posted on: 2014-10-17
Degree: Master
Type: Thesis
Country: China
Candidate: B W Deng
Full Text: PDF
GTID: 2255330422455953
Subject: Foreign language teaching techniques and evaluation
Abstract/Summary:
Performance tests of language proficiency measure examinees' ability to respond to real-life language tasks. Scores from performance testing therefore carry more accurate and valid information about examinees' ability to use language. However, an inherent limitation of such performance testing is that examinees' performances are scored by human raters, who are known to introduce error into the rating process. Such rater effects can substantially affect examinees' scores, and even their futures, on high-stakes tests. It is therefore necessary to control the effect of human rating on the validity of test scores.

This study investigates the stability of rater severity over time, along with other major rater effects, on an oral test of the NMET (National Matriculation English Test) in a province of China. Ratings of 360 examinees by 15 raters were pooled and analyzed with FACETS 3.58.0, a multi-faceted Rasch analysis program (Linacre, 2005). The study found that the raters differed in their severity levels and that those levels did not remain invariant over time. Despite these differences, however, the changes in severity across sessions were within acceptable limits, except for Rater 9 and Rater 10, who changed more than expected. Most raters applied the rating scale consistently, but Rater 10 and Rater 15 did not use it as consistently as the others, showing more variation than the Rasch model's expectation allows. Raters as a whole exhibited a central tendency effect on the traits of fluency and language. In addition, Raters 3, 5, 10, and 13 exhibited a possible halo effect, although no halo effect was found for the group overall. Six of the fifteen raters had outfit values larger than their infit statistics, indicating that they had assigned some ratings unexpected by the model.
Test administrators should check the unexpected ratings flagged by the FACETS program and then retrain or remove unqualified raters, or revise the rating scale. The study shows that FACETS is a useful tool for studying rater performance: its output allows test administrators to target individual raters and help improve rating accuracy. In this way, rater effects on examinees' scores can be reduced to a minimal level.
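The abstract's diagnostic contrast between infit and outfit mean-square statistics can be illustrated with a small sketch. This is not the FACETS program itself (which performs full many-facet joint estimation); it is a hypothetical, simplified dichotomous case in which a rating's expected probability depends on examinee ability minus rater severity, and fit statistics are computed from the resulting residuals:

```python
import math

def mfrm_prob(ability, severity):
    """Dichotomous many-facet Rasch probability of a correct/high rating,
    driven by examinee ability minus rater severity (logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - severity)))

def fit_statistics(observations):
    """observations: list of (observed score 0/1, model-expected probability).
    Returns (infit, outfit) mean-square statistics."""
    sq_resid, variances, z_squares = [], [], []
    for x, p in observations:
        var = p * (1 - p)              # model variance of a 0/1 rating
        resid = x - p                  # raw residual
        sq_resid.append(resid ** 2)
        variances.append(var)
        z_squares.append(resid ** 2 / var)  # squared standardized residual
    # Infit: information-weighted mean square (sensitive to on-target misfit)
    infit = sum(sq_resid) / sum(variances)
    # Outfit: unweighted mean square (sensitive to off-target outliers)
    outfit = sum(z_squares) / len(z_squares)
    return infit, outfit
```

An outfit value well above the infit value, as reported for six raters in the study, points to occasional highly unexpected ratings on performances far from the rater's targeted range, rather than pervasive inconsistency.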
Keywords/Search Tags: rater severity, drift, halo effect, central tendency, multi-faceted Rasch model