
A Contrastive Study On Rater Performance Between Native And Nonnative English Raters In English Writing Assessment

Posted on: 2013-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y Hu
Full Text: PDF
GTID: 2235330374490673
Subject: Foreign Linguistics and Applied Linguistics
Abstract/Summary:
Whether nonnative English raters (NNE) and native English raters (NE) consistently assign valid scores in writing performance assessment is a concern for many professionals in the field, since rater performance plays a major role in the fairness and validity of writing tests. This study mainly explores differences in rater performance between NNE and NE raters who used the same holistic scale to assess the English writing of ESL students. Four NNE raters and four NE raters were recruited to evaluate 447 essays written simultaneously by ESL students at the International College of Hunan University.

The data were analyzed with the FACETS software, based on the many-facet Rasch model within Item Response Theory, to investigate differences between the two groups in rater severity, intra-rater consistency, inter-rater consistency, and bias interactions towards examinees at different ability levels (in logits). The results are as follows. First, the raters differed from one another in severity, and there were significant differences in the holistic ratings awarded by the two groups; intra-rater consistency was acceptable with the exception of one NE rater. Second, generally speaking, the NNE raters were far more lenient than the NE raters in awarding holistic ratings. Furthermore, the two groups differed slightly in intra-rater consistency, with the NE raters showing higher internal consistency, whereas inter-rater consistency was much greater for the NNE raters than for the NE raters. Lastly, the two groups exhibited different bias interactions towards examinees at different ability estimates. They showed no difference in bias interactions towards examinees of extremely high ability (above 2.00 logits) or extremely low ability (below -2.00 logits), and both maintained much higher consistency towards these high- and low-ability examinees. However, the NE raters tended to be more lenient than the NNE raters towards examinees with ability logits between 1.00 and 1.99, and they showed many large bias interactions towards examinees of middle ability, which indicated considerable confusion in assessing those examinees' writing.

Although the NE raters were more severe than the NNE raters, it cannot be concluded that the NNE raters were unqualified, because the NNE raters showed better reliability and fewer significant bias interactions towards examinees of different ability levels. This study has important implications for English writing testing and teaching. In addition, applying FACETS to compare the rater performance of two groups in writing assessment is a novel attempt.
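For reference, a standard formulation of the many-facet Rasch model underlying FACETS is sketched below. The facets shown (examinee ability, rater severity, and rating-category thresholds) are assumed from the abstract; the thesis may specify additional facets or a different rating-scale structure.

\log\!\left(\frac{P_{njk}}{P_{nj(k-1)}}\right) = \theta_n - \alpha_j - \tau_k

Here P_{njk} is the probability that examinee n receives category k from rater j on the holistic scale, \theta_n is the ability of examinee n (in logits), \alpha_j is the severity of rater j, and \tau_k is the threshold for being awarded category k rather than k-1. Estimates of \alpha_j ground the severity and bias-interaction comparisons reported above.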
Keywords/Search Tags: Multi-facet Rasch Model, Rater performance, Severity, Consistency, Bias analysis