
Reliability Study On Large-Scale Online Writing Scoring

Posted on: 2010-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Shi
Full Text: PDF
GTID: 2155360275481740
Subject: Foreign Linguistics and Applied Linguistics

Abstract/Summary:
Rater inconsistency is a major factor that jeopardizes scoring reliability. This study investigated the reliability of large-scale online writing scoring by examining rating consistency on the CEPT: both inter-rater and intra-rater consistency.

Data from the September 2007 scoring session of the CEPT writing test were analyzed using many-facet Rasch measurement (MFRM). Nine raters rated 540 essays, randomly selected from the CEPT writing test of Hunan University, on four writing tasks, and a double-reading approach was adopted. A five-point holistic rating scale was employed whose descriptors covered five separate domains: task, structure, punctuation and capitalization, vocabulary, and format. FACETS, an MFRM computer program, was used to analyze the data. The Partial Credit Model, in which each rater is allowed a separate understanding of the rating scale, was used to investigate raters' discrepancies in applying the scale.

The study first investigated differences among raters' severity levels and their understanding of the rating scale through primary analysis, and then examined whether raters remained stable across the four writing tasks through bias analysis. The results were as follows: (1) raters showed significant differences in their severity levels; (2) raters in general exhibited acceptable self-consistency, but low inter-rater consistency; (3) raters did not exhibit significant bias toward tasks, and most raters rated consistently; (4) after the FACETS analysis, some examinees' adjusted scores differed from their raw scores by one grade or less.

This study provides implications for enhancing rating consistency in large-scale online writing scoring.
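The rater-specific Partial Credit Model described above is commonly written (following Linacre's formulation, not a formula given in this abstract) as:

\[
\log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_{jk}
\]

where \(P_{nijk}\) is the probability of examinee \(n\) receiving category \(k\) on task \(i\) from rater \(j\), \(B_n\) is the examinee's proficiency, \(D_i\) the task's difficulty, \(C_j\) the rater's severity, and \(F_{jk}\) the difficulty of scale step \(k\) as used by rater \(j\). Allowing \(F_{jk}\) to vary by rater is what lets the model capture each rater's separate understanding of the rating scale.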
According to the model, ambiguous or rough rating-scale descriptors can be amended, and raters with low consistency can be retrained or replaced. Finally, because large-scale tests are high-stakes, examinees' raw scores can be adjusted according to the model's measures before being reported, which is the innovation of this study. Controlling scoring error in large-scale writing tests is essential, and doing so means monitoring both inter-rater and intra-rater consistency. Inconsistent raters can be identified through fit statistics and bias values in many-facet Rasch measurement and then retrained or replaced. In this way, the reliability of large-scale writing scoring can be ensured and improved.
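The fit statistics mentioned above can be illustrated with a short sketch. The function below computes the standard Rasch infit and outfit mean-square statistics for one rater from observed ratings and model-expected scores and variances; the data here are hypothetical, and in a real analysis the expectations would come from a fitted MFRM model such as the one FACETS estimates.

```python
def fit_statistics(observed, expected, variance):
    """Return (infit_ms, outfit_ms) for one rater across observations.

    observed  - ratings actually awarded
    expected  - model-expected scores for the same observations
    variance  - model variance of each observation
    """
    residuals = [x - e for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals
    # (sensitive to unexpected ratings on off-target examinees).
    outfit = sum(r * r / v for r, v in zip(residuals, variance)) / len(observed)
    # Infit: information-weighted mean square
    # (sensitive to misfit on well-targeted examinees).
    infit = sum(r * r for r in residuals) / sum(variance)
    return infit, outfit

# Hypothetical example: five ratings on a 1-5 scale.
obs = [3, 4, 2, 5, 3]
exp = [3.2, 3.8, 2.5, 4.4, 3.1]
var = [0.9, 0.8, 1.0, 0.6, 0.9]
infit, outfit = fit_statistics(obs, exp, var)
# Values near 1.0 indicate ratings that fit the model; a common
# screening range for rater fit is roughly 0.5 to 1.5.
```

Raters whose mean squares fall well outside that range, or who show large bias terms in the rater-by-task interaction analysis, are the ones flagged for retraining or replacement.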
Keywords/Search Tags: reliability, rating consistency, severity, Many-facet Rasch measurement, FACETS