
A Rasch-based Study On Rater Effects In Writing Assessment

Posted on: 2018-04-09 | Degree: Master | Type: Thesis
Country: China | Candidate: J Yuan | Full Text: PDF
GTID: 2335330542970515 | Subject: Foreign Linguistics and Applied Linguistics
Abstract/Summary:
Writing assessment, as one of the basic forms of language performance assessment, is widely used in English language testing. However, it is difficult for raters to assess a candidate's writing proficiency directly and objectively (Cho 2003; He Lianzhen & Zhang Jie 2008). Factors such as raters, tasks, and rating scales, as well as the interactions among them, introduce error into rating scores. Because scores from a subjective rating process depend heavily on decisions made by raters, variance in rater performance influences rating outcomes. McNamara (1996) pointed out that the set of variables surrounding rater behavior has effects that directly influence rating scores. Studies of rating error therefore focus not only on raters themselves but also on the interactions between raters and variables related to rater background, such as gender, linguistic background, and professional background. McNamara (1996) concluded that background variables such as gender and rating experience have a significant impact on rating outcomes. However, studies on rater effects and rater bias patterns from the perspective of rater background are limited, and their results and conclusions are inconsistent.

This study adopted the Multi-facet Rasch Model (MFRM) to probe more deeply into rater performance in writing assessment, including rater severity/leniency, rating consistency, and central tendency. Bias analysis was conducted to explore the bias interaction patterns between raters, students, and the rating scale across rater gender groups and rating experience groups. Finally, an interview was carried out to explore the actual rating process further.

The writing test was administered in the students' English course. A total of 137 essays were collected via Pigaiwang (7 were used for rater training and 130 for the actual data analysis). Seven raters were invited to rate the 130 essays. In the rating process, the raters first received rater training and then graded the essays with an analytic rating scale covering four dimensions: task achievement, lexical level, syntactic level, and coherence and cohesion. The raw scores were analyzed with the Rasch model from two perspectives: individual rater performance, and bias patterns across gender groups and rating experience groups.
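For reference, a standard three-facet rating-scale formulation of the MFRM can be sketched as follows; the notation below is conventional and is not taken from the thesis itself:

\[
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \tau_k
\]

where \(P_{nijk}\) is the probability that candidate \(n\) receives category \(k\) (rather than \(k-1\)) from rater \(j\) on criterion \(i\), \(\theta_n\) is candidate ability, \(\delta_i\) is criterion difficulty, \(\alpha_j\) is rater severity, and \(\tau_k\) is the threshold of category \(k\).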
The main results are as follows:
1) The seven raters differed significantly in severity and all showed some degree of randomness, yet inter-rater consistency remained high. All raters except one showed high self-consistency, and no significant central tendency was revealed.
2) Rating outcomes differed between gender groups. Male raters were more lenient than female raters and showed higher rating consistency, but their scores were more clustered. Female raters were more likely to show randomness and to display bias interactions with candidates and the rating scale.
3) The gender groups differed in their rater-candidate and rater-rating scale interaction patterns, but the differences were not statistically significant. Gender therefore did not introduce systematic error into the rating scores in this study.
4) Rating scores also differed across rating experience groups. Experienced and veteran raters were more lenient than novice raters and showed higher rating consistency, but were more likely to display a central tendency. Novice raters tended to rate more randomly and were inclined to show bias interactions with candidates and the rating scale.
5) The rating experience groups differed in their rater-candidate and rater-rating scale interaction patterns, and a chi-square test showed these differences to be statistically significant. Rating experience was therefore a factor that introduced systematic error into the rating scores in this study.

Based on the findings about individual raters' performance, this study can offer specific and effective suggestions for improving the reliability and quality of writing assessment. It also provides valuable feedback for rater training by investigating how a writing criterion is interpreted and applied by raters with different backgrounds. More importantly, the bias analysis between raters and the rating scale directly reveals how the scale functions and how well it applies, which can in turn support effective revision of the scale. Furthermore, the study offers implications for the teaching and learning of English writing by examining how raters attend to different dimensions of a piece of writing. Above all, the study shows that the MFRM is a promising and useful tool for monitoring and controlling rating quality and improving rating reliability in language performance assessments.
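As a minimal illustration of the group comparison described in result 5, the sketch below runs a chi-square test on counts of significant versus non-significant bias interactions for three experience groups. The counts, the grouping labels, and the use of scipy are assumptions made for illustration; they are not data or code from the thesis.

```python
# Minimal sketch of the chi-square comparison in result 5.
# The counts below are placeholders, NOT data from the thesis.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: novice, experienced, veteran raters (hypothetical grouping)
# Columns: [significant bias interactions, non-significant interactions]
counts = np.array([
    [12, 118],  # novice
    [5, 125],   # experienced
    [4, 126],   # veteran
])

chi2, p, dof, _ = chi2_contingency(counts)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
# A small p-value (e.g. p < .05) would indicate that bias-interaction
# patterns differ systematically across rating-experience groups.
```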
Keywords/Search Tags: writing assessment, rater effects, rater bias, rater background, Multi-facet Rasch Model