Relying on the teacher as the sole rater creates well-known problems in language testing, such as heavy teacher workloads and potential rating biases, which not only burden language educators but also distort students' evaluation results. Over the past few decades, adopting alternative assessments in EFL teaching has become an increasingly prominent research focus, yet few studies have compared all three assessment types (self-, peer-, and teacher-assessment) using the Multifaceted Rasch model. The present study makes such a comparative analysis of self-, peer-, and teacher-assessments in EFL writing tests. Rater effects (Myford & Wolfe, 2003) and learner autonomy (Holec, 1981) serve as the theoretical basis for applying alternative assessments in the study.

Eighty second-year non-English-major students and two experienced EFL teachers at Lanzhou University took part in the study. Each student assessed his or her own writing and the writings of three other students according to a fixed evaluation rubric, and the two teachers scored all the students' writings using the same rubric. The resulting self-, peer-, and teacher-assessment scores were collected and analyzed through Multifaceted Rasch Measurement with the computer program FACETS 3.22 (Linacre, 1999), the mainstream software for this method. Finally, scores from the automated essay scoring (AES) tool www.pigai.org were compared with the self-, peer-, and teacher-assessments, using Spearman's rank correlation coefficient ρ to gauge the pairwise consistency of the four score sets.

Three facets enter the Rasch analysis: writers, raters, and criterion items. FACETS places all estimates on the same linear (logit) scale, so the facets can be compared directly. The analysis yields (a) a FACETS map, (b) an ability measure and fit statistics for each writer, (c) a severity estimate and fit statistics for each rater, (d) a bias analysis of rater × writer interactions, and (e) a difficulty estimate for each assessment criterion. The FACETS map visualizes differences within each facet, such as severity/leniency differences among raters and ability differences among writers.

The study addresses five research questions:
1) To what degree do writers' ability estimates, raters' severity estimates, and assessment criteria's difficulty estimates fit the model?
2) How do self-assessors, peer-assessors, and teacher-assessors differ in the writer ability estimates they produce?
3) To what degree do self-assessors, peer-assessors, and teacher-assessors show bias toward writers' abilities, and what forms does this bias take?
4) How do self-, peer-, and teacher-assessments compare in terms of assessment criterion difficulty?
5) To what degree are self-assessors, peer-assessors, teacher-assessors, and the online assessment website www.pigai.org externally consistent with one another?
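Questions 1 to 4 all rest on the way the Multifaceted Rasch model places every facet on one logit scale. As a sketch (the notation below is illustrative rather than taken from the study's materials), the standard three-facet rating-scale form of the model is:

```latex
% Three-facet rating-scale form of the many-facet Rasch model:
% P_{njik} is the probability that writer n receives category k
% (rather than k-1) from rater j on criterion i.
\[
  \ln\!\left(\frac{P_{njik}}{P_{nji(k-1)}}\right) = B_n - C_j - D_i - F_k
\]
```

Here B_n is writer n's ability, C_j is rater j's severity, D_i is criterion i's difficulty, and F_k is the difficulty of scale step k. Because all four parameters share one scale, writer ability, rater severity, and criterion difficulty can be read off a single FACETS map and compared directly.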
All five questions are answered through the FACETS output and Spearman's rank correlation coefficient ρ. The results show that student raters spanned a wider range of severity/leniency than teacher raters. All three rater types revealed bias in the rater × writer interaction analysis, each with its own distinct bias pattern: low-ability students were more prone than high-ability students to rate their own and their peers' writing in a biased way, whereas teacher raters tended to rate low-ability students more leniently than high-ability ones. Regarding assessment criterion difficulty, teacher-assessments showed the widest spread of difficulty estimates, while peer-assessments showed the narrowest; across raters, the criterion "content" was scored most harshly and the criterion "mechanics" most leniently. As for the online assessment tool, its scores were not externally consistent with the self- and teacher-assessments in the study. The last part briefly offers implications and suggestions concerning alternative assessment, rater training, assessment quality improvement, and EFL teaching.
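As a methodological aside, the pairwise external consistency examined in question 5 can be computed in a few lines of Python. The sketch below uses scipy.stats.spearmanr; the four score vectors are hypothetical placeholders, not data from the study.

```python
# Minimal sketch: pairwise Spearman's rho between four rating sources.
# The score lists are hypothetical stand-ins for the study's data.
from itertools import combinations

from scipy.stats import spearmanr

scores = {
    "self":    [78, 85, 62, 90, 71],   # one total score per writer
    "peer":    [75, 88, 65, 86, 70],
    "teacher": [72, 84, 60, 88, 68],
    "pigai":   [80, 82, 70, 85, 74],
}

# spearmanr returns the correlation and its two-sided p-value.
for a, b in combinations(scores, 2):
    rho, p = spearmanr(scores[a], scores[b])
    print(f"{a} vs {b}: rho = {rho:.2f}, p = {p:.3f}")
```

Because Spearman's ρ compares rank orders rather than raw scores, it tolerates the different scales and severities of the four rating sources; two sources are externally consistent to the degree that they rank the writers in the same order.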