
Reliability Study On Large-Scale Online Writing Scoring

Posted on: 2010-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Shi
Full Text: PDF
GTID: 2155360275481740
Subject: Foreign Linguistics and Applied Linguistics

Abstract/Summary:
Rater inconsistency is a major factor that jeopardizes scoring reliability. This study investigated the reliability of large-scale online writing scoring by examining rating consistency on the CEPT: both inter-rater and intra-rater consistency.

Data from the September 2007 scoring session of the CEPT writing test were analyzed using many-facet Rasch measurement (MFRM). Nine raters rated 540 essays, randomly selected from the CEPT writing test of Hunan University, on four writing tasks, and a double-reading approach was adopted. A five-point holistic rating scale was employed whose descriptors covered five separate domains: task, structure, punctuation and capitalization, vocabulary, and format. FACETS, an MFRM computer program, was used to analyze the data. The Partial Credit Model, in which each rater is allowed a separate understanding of the rating scale, was used to investigate raters' discrepancies in applying the scale.

The study first investigated differences among raters' severity levels and their understanding of the rating scale through primary analysis, and then examined whether raters remained stable across the four writing tasks through bias analysis. The results were as follows: (1) raters showed significant differences in their severity levels; (2) raters in general exhibited acceptable self-consistency, but low inter-rater consistency; (3) raters did not exhibit significant bias toward tasks, and most raters rated consistently; (4) after the FACETS analysis, some examinees' adjusted scores differed from their raw scores by one grade or less.

This study provides implications for enhancing rating consistency in large-scale online writing scoring.
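The rater-specific Partial Credit Model described above is commonly written (following Linacre's formulation, not a formula given in this abstract) as:

\[
\log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_{jk}
\]

where \(P_{nijk}\) is the probability of examinee \(n\) receiving category \(k\) on task \(i\) from rater \(j\), \(B_n\) is the examinee's proficiency, \(D_i\) the task's difficulty, \(C_j\) the rater's severity, and \(F_{jk}\) the difficulty of scale step \(k\) as used by rater \(j\). Allowing \(F_{jk}\) to vary by rater is what lets the model capture each rater's separate understanding of the rating scale.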
According to the model, ambiguous or rough rating-scale descriptors can be amended, and raters with low consistency can be retrained or replaced. Finally, because large-scale tests are high-stakes, examinees' raw scores can be adjusted according to the model's measures before being reported, which is the innovation of this study. Controlling scoring error in large-scale writing tests is essential, and doing so means monitoring both inter-rater and intra-rater consistency. Inconsistent raters can be identified through fit statistics and bias values in many-facet Rasch measurement and then retrained or replaced. In this way, the reliability of large-scale writing scoring can be ensured and improved.
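The fit statistics mentioned above can be illustrated with a short sketch. The function below computes the standard Rasch infit and outfit mean-square statistics for one rater from observed ratings and model-expected scores and variances; the data here are hypothetical, and in a real analysis the expectations would come from a fitted MFRM model such as the one FACETS estimates.

```python
def fit_statistics(observed, expected, variance):
    """Return (infit_ms, outfit_ms) for one rater across observations.

    observed  - ratings actually awarded
    expected  - model-expected scores for the same observations
    variance  - model variance of each observation
    """
    residuals = [x - e for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals
    # (sensitive to unexpected ratings on off-target examinees).
    outfit = sum(r * r / v for r, v in zip(residuals, variance)) / len(observed)
    # Infit: information-weighted mean square
    # (sensitive to misfit on well-targeted examinees).
    infit = sum(r * r for r in residuals) / sum(variance)
    return infit, outfit

# Hypothetical example: five ratings on a 1-5 scale.
obs = [3, 4, 2, 5, 3]
exp = [3.2, 3.8, 2.5, 4.4, 3.1]
var = [0.9, 0.8, 1.0, 0.6, 0.9]
infit, outfit = fit_statistics(obs, exp, var)
# Values near 1.0 indicate ratings that fit the model; a common
# screening range for rater fit is roughly 0.5 to 1.5.
```

Raters whose mean squares fall well outside that range, or who show large bias terms in the rater-by-task interaction analysis, are the ones flagged for retraining or replacement.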
Keywords/Search Tags: reliability, rating consistency, severity, Many-facet Rasch measurement, FACETS