| Test method characteristics can affect both rating scale content and final rating scores. It has been demonstrated that the methods used to measure language ability influence performance on language tests (Bachman & Palmer, 1981, 1982; Brutsch, 1979; Clifford, 1978, 1981; Shohamy, 1983, 1984). Test method, however, is not a unitary factor: any method may include a variety of characteristics, or facets, each of which has an effect on scores. The most comprehensive discussion of method characteristics/facets is found in Bachman's (1990) account (Bachman & Palmer, 1996). Bachman discusses variables in the testing environment, task requirements, scoring procedures and rating scales, and test stimuli, all of which may affect student or examinee performance and scores. Performance tests typically require raters to judge the quality of examinees' written or spoken language relative to a rating scale. Scores of written language may be affected by variables inherent in the specific scale development process (Turner & Upshur, 2002). Scores may also be affected by variables in the scoring process. In this study we consider three variables in empirically derived rating scales: the scale developers, the sample of performances used by the scale developers, and the scoring procedure. These variables may affect scale content and structure as well as final test scores. One hundred and fifty written compositions were drawn from the samples of a national Writing Contest in the Shanxi region of China. The compositions were then typed and divided into three subsets according to the scores given. Two of the subsets served as samples for the empirical development of a rating scale. The remaining compositions, the third subset, were reserved for later scoring. The scale developers and the raters in this study were eight postgraduates and three faculty members in EFL education from the School. 
The study used multiple regression to examine (1) the development and use of scales, with two samples of EFL student writing and two teams of rating scale developers used to construct four empirically derived scales; (2) whether different samples of compositions yield different scale descriptors when the criteria are derived empirically in an EFL context; and (3) the extent to which holistic and analytic scales derived empirically from the same or different samples contribute to student scores in an EFL context. The research aimed to analyze variables in developing empirically based rating scales and scoring procedures for EFL writing tests. The author hopes that the study will provide insight into the issues involved in developing empirically based rating scales and will be of practical local benefit in EFL planning and decision making. |