
Rater Effects of Human-Machine Scoring by the Many-Facet Rasch Model

Posted on: 2019-08-25
Degree: Master
Type: Thesis
Country: China
Candidate: R J Ren
GTID: 2405330569487004
Subject: Foreign Linguistics and Applied Linguistics

Abstract/Summary:
Human-machine scoring has been fully operational for scoring essays since 1999. Because some scholars doubt that Automated Essay Scoring (hereinafter AES) can truly understand essays, many methods have been used to explore the heterogeneity and homogeneity of human-machine scoring. A review of the related literature shows that studies of rater effects in human-machine scoring based on the Many-Facet Rasch Model (hereinafter MFRM) are lacking. Following the guidelines proposed by Myford and Wolfe (2004), this study uses MFRM to detect the leniency/severity effect, central tendency effect, randomness effect, halo effect, and differential leniency/severity effect in human-machine scoring. Specifically, three research questions are addressed: (1) Are there rater effects at the group level in human-machine scoring? (2) Are there rater effects at the individual level in human-machine scoring? (3) Are there differences in rater effects between human raters and iWrite?

Set in the only large-scale writing assessment in China that adopts human-machine scoring, this thesis detects rater effects of human-machine scoring in the "FLTRP Cup" English Writing Contest at both the group level and the individual level. Five trained human raters and iWrite categorized and scored the content, language, and organization of 164 argumentative essays written by 82 contestants in a preliminary round. A category rating scale adapted from previous studies and the rating rubric released by FLTRP were used. Total scores assigned by the human raters and iWrite were categorized on the basis of the standard deviation. Facets, a software program implementing MFRM, was used to analyze the rating categories of the four traits.

The major findings of this study are as follows. First, there are no rater effects at the group level. Second, there are rater effects at the individual level: Rater 5 exhibits a central tendency effect, Rater 4 exhibits a differential leniency/severity effect, and iWrite exhibits a central tendency effect. Third, there are differences in rater effects between the human raters and iWrite: their degrees of leniency/severity differ, and Rater 4 exhibits a differential leniency/severity effect whereas iWrite does not.

This study extends the range of subjects examined in rater-effect research and diversifies the research methods applied to human-machine scoring. In addition, it examines the bias of iWrite toward examinees of different writing abilities and verifies the central tendency effect introduced by iWrite's training set. The rater effects observed in human-machine scoring indicate its high reliability and imply that it can be further applied to writing instruction and to more writing assessments. Similarly, the rater effects of iWrite suggest that iWrite is highly reliable and can likewise be further applied to writing instruction and assessment, even though it still needs improvement.
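For reference, a standard three-facet formulation of MFRM (of the kind analyzed by the Facets program) is sketched below. The assignment of facets to examinees, traits, and raters is an assumption inferred from the study design described above, not the thesis's own model specification:

\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k

Here P_{nijk} is the probability that examinee n receives category k (rather than k-1) on trait i from rater j; B_n is the examinee's ability, D_i the trait's difficulty, C_j the rater's severity (human rater or iWrite), and F_k the step difficulty of category k. In this framework, rater effects of the kind listed above are diagnosed from the estimated severities C_j and the associated fit statistics: a wide spread of C_j indicates differences in leniency/severity across raters, while a rater's overuse of middle categories together with muted fit statistics indicates a central tendency effect (Myford and Wolfe, 2004).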
Keywords/Search Tags:rater effects, human-machine scoring, Many-Facet Rasch Model, "FLTRP Cup" English Writing Contest, iWrite