
Exploring Rating Process And Rater Belief: Seeking The Internal Account For Rater Variability

Posted on: 2010-11-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Zhang
Full Text: PDF
GTID: 1117360275487207
Subject: Foreign Linguistics and Applied Linguistics
Abstract/Summary:
In the context of performance assessment (typically writing and speaking tasks), the rater variability introduced by subjective scoring has long been regarded as a challenge to reliability and validity. Rather than focusing on the consistency of the final scores awarded by different raters, the present study takes a rater-focused, process-oriented approach to investigating rater variability, with the aim of exploring the underlying factors that account for differences in raters' rating performance and the decisive internal features that characterize good raters. Specifically, the study compares raters with different levels of rating accuracy in terms of their internal rating processes and scoring-related beliefs in the context of CET4 (College English Test Band 4 in China) essay scoring. By exploring the inherent variability in raters' dynamic mental behaviors during rating and in their belief systems, it seeks to reveal the mechanism by which factors inside the raters influence their rating performance.

Three major sources of data were collected through separate procedures in the empirical study: an independent rating session, a concurrent think-aloud session, and a subsequent semi-structured focused interview session. A different approach to data analysis was used for each of these data sources. Raters' rating performance, compared against the expert norm, was first assessed using Many-Facet Rasch Measurement (MFRM). Raters were then classified into two groups with higher and lower levels of rating accuracy based on the calibrations of their rating performance.
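The grouping step described above can be sketched in miniature. The following is an illustrative simplification, not the study's actual procedure: where the study derived rater-accuracy calibrations via MFRM, this sketch substitutes a simple mean absolute deviation from the expert norm, and the rater labels, scores, and threshold are all hypothetical.

```python
# Illustrative sketch only: split raters into higher- and lower-accuracy
# groups by their agreement with an expert norm. The study used Many-Facet
# Rasch Measurement (MFRM) calibrations; mean absolute deviation (MAD)
# stands in for those calibrations here.

def classify_raters(ratings, expert_norm, threshold=0.5):
    """ratings: {rater_id: [score per essay]}; expert_norm: [expert score per essay].

    Returns {rater_id: "GOOD" | "NSG"} based on mean absolute
    deviation from the expert norm (hypothetical threshold).
    """
    groups = {}
    for rater, scores in ratings.items():
        mad = sum(abs(s - e) for s, e in zip(scores, expert_norm)) / len(expert_norm)
        groups[rater] = "GOOD" if mad <= threshold else "NSG"
    return groups

# Hypothetical scores for two raters on four essays
ratings = {
    "R1": [12, 9, 14, 8],   # tracks the norm closely
    "R2": [14, 6, 11, 11],  # deviates substantially
}
expert_norm = [12, 9, 13, 8]
print(classify_raters(ratings, expert_norm))  # {'R1': 'GOOD', 'R2': 'NSG'}
```

The point of the sketch is only the two-group split driven by distance from an expert standard; the real calibration jointly models rater severity, essay difficulty, and score-scale structure.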
Based on verbal protocol analyses of the raters' concurrent think-aloud and interview transcripts, the identified GOOD and Not-So-Good (NSG) groups were then compared in terms of their internal processes during rating (text focus and mental processes) and their scoring-related belief systems. The comparison between the two rater groups yielded considerable differences in many important respects. The main findings were that GOOD raters directed their attention to a wider range of textual features, and that the text image they constructed was more comprehensive and balanced than that of the NSG raters, incorporating well-organized information on different linguistic levels of the writers' language performance as well as different aspects of text quality in the target essay. When processing the acquired information, GOOD raters more often adopted effective error-diagnosing, summarizing, and inferring behaviors to abstract, integrate, and categorize the details and particularities in their text images into evaluations and inferences about essay quality, and they employed more self-monitoring strategies, such as weighing and assessing their own ratings, to reflect on their own rating accuracy. The rater groups also differed in their scoring-related beliefs. The major finding here is that GOOD raters tend to hold comprehensive, balanced, and well-organized beliefs about the assessment target: these beliefs cover a wide range of requirements on examinees' writing ability and performance, clearly differentiate among linguistic levels of language use and aspects of text quality, and order those levels and aspects in a well-defined hierarchy of importance for defining essay quality.
In addition, their interpretation and operationalization of the rating rubrics, as well as their perceived effective solutions to uncertainty during decision-making, were clearer and more systematic than those of their NSG counterparts. By comparing and linking the rater variability detected at the above levels, a chain of influence on the final ratings was established. It is therefore contended that raters' scoring-related beliefs serve as the internal context in which the dynamic rating processes take place, and that different belief systems naturally lead to different patterns of text focus and rating behavior in the rating process, and ultimately to the variability and different levels of accuracy in the final ratings. Furthermore, given the finding that rater beliefs lie at the root of these sources of variability among raters, the main objectives of the scoring system in large-scale performance assessment should be to explicitly establish the expected understanding of the assessment target and instrument, to communicate it effectively from the administration level to individual raters, and to gradually bring the raters into a common "judgment community" that shares similar core beliefs about scoring in the given test context.
Keywords/Search Tags:rater belief, rating process, rater variability, judgment community