Rater training is an essential component of any reliable writing assessment in first or second language studies. However, little is known about the processes by which inexperienced raters learn to apply the appropriate criteria in making judgments about writing samples. This study examines this process through both quantitative and qualitative approaches. Sixteen raters, eight inexperienced and eight experienced, rated 30 essays on two topics before training and 32 additional essays on the same topics after training, using a three-part scoring rubric covering content, rhetorical control, and language. For four of these compositions, the 16 raters' verbalizations of their thought processes were recorded.

The ratings were analyzed using many-faceted Rasch measurement, which provides estimates of rater severity on a linear scale and fit statistics that reflect rater consistency. The major quantitative findings were as follows: (1) inexperienced raters tended to be more severe and less consistent than experienced raters before training but not afterwards; (2) inexperienced raters were significantly more severe on one topic than on the other before training; and (3) despite training, significant differences in severity among raters remained.

The verbal protocols from the inexperienced raters were examined for instances of three postulated training effects: clarification of rating criteria, revision of expectations of tasks and examinees, and concern for inter-rater agreement. Evidence of all three effects was found in the protocols to some degree. In addition, analysis of the protocols of all raters suggested that the differences in severity on the two topics were related to the different rhetorical strategies elicited by the two writing prompts, the concordance of these strategies with the descriptors in the scoring guide, and raters' dissatisfaction with one of the two prompts.

Practical and theoretical implications of the findings are discussed, and suggestions for further research are made.
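For readers unfamiliar with many-faceted Rasch measurement, a common formulation is Linacre's (1989) many-facet rating scale model; the study's own specification is not reproduced in the abstract, and the notation below is illustrative rather than the authors':

\[
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \tau_k
\]

where \(P_{nijk}\) is the probability that examinee \(n\), writing on topic \(i\) and judged by rater \(j\), receives rating category \(k\) rather than \(k-1\); \(\theta_n\) is the ability of examinee \(n\); \(\delta_i\) the difficulty of topic \(i\); \(\alpha_j\) the severity of rater \(j\) (the parameter estimated on the linear logit scale referred to above); and \(\tau_k\) the difficulty of the step from category \(k-1\) to \(k\). Rater consistency is then assessed through fit statistics comparing observed ratings with those the model predicts.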