
Rater Effects of Human-Machine Scoring by the Many-Facet Rasch Model

Posted on: 2019-08-25
Degree: Master
Type: Thesis
Country: China
Candidate: R J Ren
GTID: 2405330569487004
Subject: Foreign Linguistics and Applied Linguistics

Abstract/Summary:
Human-machine scoring has been fully operational for scoring essays since 1999. Because some scholars doubt that Automated Essay Scoring (hereinafter AES) can truly understand essays, many methods have been used to explore the heterogeneity and homogeneity of human-machine scoring. A review of the related literature shows that studies of rater effects in human-machine scoring based on the Many-Facet Rasch Model (hereinafter MFRM) are lacking. Following the guidelines proposed by Myford and Wolfe (2004), this study uses MFRM to detect the leniency/severity effect, central tendency effect, randomness effect, halo effect, and differential leniency/severity effect in human-machine scoring. Specifically, three research questions are addressed: (1) Are there rater effects at the group level in human-machine scoring? (2) Are there rater effects at the individual level in human-machine scoring? (3) Are there differences in rater effects between human raters and iWrite?

Set in the only large-scale writing assessment in China that adopts human-machine scoring, this thesis detects rater effects of human-machine scoring in the "FLTRP Cup" English Writing Contest at both the group level and the individual level. Five trained human raters and iWrite categorized and scored the content, language, and organization of 164 argumentative essays written by 82 contestants in a preliminary round. A category rating scale adapted from previous studies and the rating rubric released by FLTRP were used. Total scores assigned by the human raters and iWrite were categorized on the basis of the standard deviation. Facets, a software program implementing MFRM, was used to analyze the rating categories of the four traits.

The major findings of this study are as follows. First, there are no rater effects at the group level. Second, there are rater effects at the individual level: Rater 5 exhibits a central tendency effect, Rater 4 exhibits a differential leniency/severity effect, and iWrite exhibits a central tendency effect. Third, there are differences in rater effects between the human raters and iWrite: their degrees of leniency/severity differ, and Rater 4 exhibits a differential leniency/severity effect whereas iWrite does not.

This study extends the range of subjects examined in rater-effect research and diversifies the research methods applied to human-machine scoring. In addition, it examines the bias of iWrite toward examinees of different writing abilities and verifies the central tendency effect introduced by iWrite's training set. The rater effects observed in human-machine scoring indicate its high reliability and imply that it can be further applied to writing instruction and to more writing assessments. Similarly, the rater effects of iWrite suggest that iWrite is highly reliable and can likewise be further applied to writing instruction and assessment, even though it still needs improvement.
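For reference, a standard three-facet formulation of MFRM (of the kind analyzed by the Facets program) is sketched below. The assignment of facets to examinees, traits, and raters is an assumption inferred from the study design described above, not the thesis's own model specification:

\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k

Here P_{nijk} is the probability that examinee n receives category k (rather than k-1) on trait i from rater j; B_n is the examinee's ability, D_i the trait's difficulty, C_j the rater's severity (human rater or iWrite), and F_k the step difficulty of category k. In this framework, rater effects of the kind listed above are diagnosed from the estimated severities C_j and the associated fit statistics: a wide spread of C_j indicates differences in leniency/severity across raters, while a rater's overuse of middle categories together with muted fit statistics indicates a central tendency effect (Myford and Wolfe, 2004).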
Keywords/Search Tags:rater effects, human-machine scoring, Many-Facet Rasch Model, "FLTRP Cup" English Writing Contest, iWrite