
Rater Effects and Behavior: Comparing Raters' Ratings in Different Rating Conditions in a Writing Assessment

Posted on: 2021-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Tan
GTID: 2415330626459495
Subject: Foreign Linguistics and Applied Linguistics

Abstract/Summary:
Argumentative essay writing has long been an effective measure of L2 writing proficiency, but the objectivity and fairness of its rating process remain controversial. Two main reasons account for this. First, because essay scoring is a complex and subjective process, most raters find it hard to avoid cognitive biases, subjective impressions, and rater effects. Second, many factors give rise to rater bias, and studies of rater effects have covered varied perspectives, including rater background, rating methods, and rating modes. With advances in technology and artificial intelligence, research on the objectivity and reliability of automated scoring has likewise diversified, but few studies have examined how rater effects change when raters score alongside a machine, or how such scoring affects rater behavior.

This study investigates the consistency of raters' ratings and the change in rater behavior under two rating conditions. By comparing the two conditions, it examines the reliability of raters' ratings when scoring with a machine and whether the rating condition is effective in minimizing the influence of the language trait on the other traits; that is, whether a halo effect appears under Condition 2.

Five raters were invited to rate undergraduate argumentative writing under the two conditions, presented in random order with the same rubric. Under Condition 1, raters scored all traits: language, organization, and content. Under Condition 2, raters scored only the organization and content traits, while iWrite scored the language trait. All rating data were analyzed with many-facet Rasch measurement (MFRM). To probe rater behavior and perception further, the author also conducted two interviews with each rater, asking them to review the whole rating process and explain changes in their rating behavior and cognition. The interview data were then transcribed, coded, and classified for analysis.

Comparing the statistical and interview data across the two conditions showed that all raters scored reliably and consistently in both conditions but differed in severity. Raters tended to be harsher under Condition 2, especially on the content trait. In addition, the content and organization traits were better differentiated, and no significant halo effect appeared under Condition 2. Overall, most raters preferred Condition 2, in which they rated only organization and content; they believed that iWrite scored the language trait more efficiently and fairly, and that the halo effect stemming from the language trait was thereby minimized.
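The abstract does not reproduce the model specification, but MFRM analyses of multi-rater, multi-trait designs like this one conventionally use Linacre's many-facet Rasch rating scale model. The sketch below is the standard three-facet formulation; the assignment of facets to writers, traits, and raters is an assumption based on the study design, not taken from the thesis:

\log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k

where P_{nijk} is the probability that essay n receives category k (rather than k-1) on trait i from rater j, B_n is the proficiency of writer n, D_i is the difficulty of trait i (language, organization, or content), C_j is the severity of rater j, and F_k is the difficulty of the step from category k-1 to k. The rater severity estimates C_j under each condition are what support the harshness comparison reported in the abstract.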
Keywords/Search Tags: Rater effects, rater behavior, halo effect, automated scoring