
A Tentative Study Of Inter-rater Inconsistency In ESL Writing

Posted on: 2007-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Yang
GTID: 2155360185951166
Subject: Foreign Linguistics and Applied Linguistics
Abstract:
As an important form of performance testing, writing has received increasing attention in recent years. It is widely used in both large-scale and small-scale English examinations, and a writing component has become an essential part of most English tests. Writing tests offer high validity: they can assess the different levels and components of language competence, and they can indicate whether language is used appropriately. They also have strengths such as productivity. However, they lack reliability in scoring: because writing is a subjective test format with open-ended answers, scores vary from rater to rater.

The evaluation and assessment of second-language writing has been a major area of research in writing assessment, because rating and scoring remain highly controversial. Whatever the test method, the reliability of ratings is one of the central issues in performance-based assessment, and scores on subjective tests are affected by the rater. Inconsistency between raters can be attributed to three causes: different interpretations of the rating scale, different standards of severity, and reactions to elements not covered by the scale (Bachman, 1990: 208-212). The same rater may also score differently depending on mood, physical condition, and scoring time. This thesis focuses on inter-rater differences under conditions where intra-rater inconsistency has been minimized. Specifically, it explores, in an EFL writing context, how new raters differ from one another when scoring the same set of writing samples.

The study proceeded in three steps. First, nearly two hundred students took a writing test; their papers were collected, randomly sampled, and copied. Second, eight raters (six new raters and two expert raters) rated the writing samples within the same period of time. Third, the raters completed think-aloud protocols in a language lab, and all protocols were later transcribed.

The analyses of the results show that: 1) there is a significant difference between the new-rater group and the expert-rater group; 2) correlation analysis indicates that new raters are less reliable in assigning scores to writing samples, and differences among the new raters themselves are also significant; 3) the think-aloud protocols yielded verbal accounts of the thought processes and decisions of new raters while evaluating compositions. A study of how they adhere to the rating scale shows that new raters are relatively consistent in evaluating "relevance to the topic" and "coherence", but inconsistent in "clarity of meaning" and "mechanical errors". Beyond these variances, other elements distract them from the rating scale and play a role in their assessments.

The results presented here are necessarily tentative and should not be generalized beyond the population from which the data were collected. It is hoped, nevertheless, that they are instructive for language teaching and writing assessment. They suggest that new teachers should prepare by studying the rating scale before rating, and should learn to balance major and minor factors.
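The abstract does not specify the statistical tooling behind its correlation analysis; as a minimal illustrative sketch, the pairwise inter-rater correlation it describes could be computed as below. The rater names and scores are hypothetical, invented purely for illustration.

```python
# Illustrative sketch (not from the thesis): pairwise Pearson correlations
# between hypothetical rater scores, the kind of inter-rater consistency
# check the abstract describes.
from itertools import combinations
from statistics import correlation  # Pearson r; Python 3.10+

# Hypothetical scores assigned by three new raters to the same five essays.
scores = {
    "rater_A": [12, 9, 14, 8, 11],
    "rater_B": [10, 9, 13, 7, 12],
    "rater_C": [15, 8, 10, 11, 9],
}

# Low pairwise r values would signal inter-rater inconsistency.
for (name1, s1), (name2, s2) in combinations(scores.items(), 2):
    r = correlation(s1, s2)
    print(f"{name1} vs {name2}: r = {r:.2f}")
```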
Keywords: inter-rater inconsistency, EFL writing, scoring method, think-aloud protocol