
A Tentative Study Of Inter-rater Inconsistency In ESL Writing

Posted on: 2007-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Yang
GTID: 2155360185951166
Subject: Foreign Linguistics and Applied Linguistics
Abstract:
As an important form of performance testing, writing has received increasing attention in recent years. It is widely used in both large-scale and small-scale English examinations, and a writing component has become an essential part of most English tests. Writing tests offer high validity: they can assess the different levels and components of language competence, and they can indicate whether language is used appropriately. They also have strengths such as productivity. However, they lack reliability in scoring: because writing is a subjective test format with open-ended answers, scores vary from rater to rater.

The evaluation and assessment of second-language writing has been a major area of research in writing assessment, because rating and scoring remain highly controversial. Whatever the test method, the reliability of ratings is one of the central issues in performance-based assessment, and scores on subjective tests are affected by the rater. Inconsistency between raters can be attributed to three causes: different interpretations of the rating scale, different standards of severity, and reactions to elements not covered by the scale (Bachman, 1990: 208-212). The same rater may also score differently depending on mood, physical condition, and scoring time. This thesis focuses on inter-rater differences under conditions where intra-rater inconsistency has been minimized. Specifically, it explores, in an EFL writing context, how new raters differ from one another when scoring the same set of writing samples.

The study proceeded in three steps. First, nearly two hundred students took a writing test; their papers were collected, randomly sampled, and copied. Second, eight raters (six new raters and two expert raters) rated the writing samples within the same period of time. Third, the raters completed think-aloud protocols in a language lab, and all protocols were later transcribed.

The analyses of the results show that: 1) there is a significant difference between the new-rater group and the expert-rater group; 2) correlation analysis indicates that new raters are less reliable in assigning scores to writing samples, and differences among the new raters themselves are also significant; 3) the think-aloud protocols yielded verbal accounts of the thought processes and decisions of new raters while evaluating compositions. A study of how they adhere to the rating scale shows that new raters are relatively consistent in evaluating "relevance to the topic" and "coherence", but inconsistent in "clarity of meaning" and "mechanical errors". Beyond these variances, other elements distract them from the rating scale and play a role in their assessments.

The results presented here are necessarily tentative and should not be generalized beyond the population from which the data were collected. It is hoped, nevertheless, that they are instructive for language teaching and writing assessment. They suggest that new teachers should prepare by studying the rating scale before rating, and should learn to balance major and minor factors.
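The abstract does not specify the statistical tooling behind its correlation analysis; as a minimal illustrative sketch, the pairwise inter-rater correlation it describes could be computed as below. The rater names and scores are hypothetical, invented purely for illustration.

```python
# Illustrative sketch (not from the thesis): pairwise Pearson correlations
# between hypothetical rater scores, the kind of inter-rater consistency
# check the abstract describes.
from itertools import combinations
from statistics import correlation  # Pearson r; Python 3.10+

# Hypothetical scores assigned by three new raters to the same five essays.
scores = {
    "rater_A": [12, 9, 14, 8, 11],
    "rater_B": [10, 9, 13, 7, 12],
    "rater_C": [15, 8, 10, 11, 9],
}

# Low pairwise r values would signal inter-rater inconsistency.
for (name1, s1), (name2, s2) in combinations(scores.items(), 2):
    r = correlation(s1, s2)
    print(f"{name1} vs {name2}: r = {r:.2f}")
```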
Keywords: inter-rater inconsistency, EFL writing, scoring method, think-aloud protocol