Writing is an essential and indispensable component of both classroom-based and large-scale standardized assessments, as it can examine students' integrated language ability efficiently. However, the accuracy and fairness of writing assessment have long been problematic owing to the subjectivity of rating: different raters may assign markedly different scores to the same essay, which threatens the reliability, validity, and ultimately the fairness of the assessment. Eckes (2008) pointed out that rater variability is one of the biggest challenges faced by language assessment researchers, and existing studies have identified rater background as one of the most important sources of variance, causing writing scores to deviate from the "true score".

Within the framework of generalizability theory, this study examined the impact of rater background on the reliability and validity of writing assessment in a college English final examination. The six raters engaged in the study were classified into groups according to gender, educational background, and writing rating experience; each background facet comprised two groups of three raters. The writing scores assigned by the different rater groups were then compared to determine whether significant differences existed. The writing samples were 60 compositions collected from the English final test in June 2017, all written by non-English-major freshmen. Each rater scored the compositions independently and holistically on a 15-point scale (1-15), and the final scores were analyzed in EXCEL and GENOVA.

Data analyses show that, under the current six-rater design, the overall reliability and validity of the writing assessment were relatively low. Pairwise comparisons suggest that male raters were less consistent than female raters during rating, although the two groups did not differ noticeably in rating reliability, convergent validity, or discriminant validity. Raters without an educational background in language testing showed much lower consistency, reliability, convergent validity, and discriminant validity than raters with such a background, and inexperienced raters were less consistent than experienced raters, with lower reliability, convergent validity, and discriminant validity. Finally, two raters were selected for semi-structured interviews; the results indicate that an educational background in language testing and rating experience shape raters' rating beliefs and behaviors to a certain extent, thereby influencing the final writing scores.

The findings reveal that raters' gender had no influence on the quality of writing rating in the final test, whereas their language testing educational background and rating experience did. Examining the sources of rater bias is therefore essential to investigating reliability and validity in writing assessment: once the underlying sources of rater bias are understood, effective measures can be taken to improve the reliability, validity, and ultimately the fairness of a writing assessment.
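As an illustration only, the sketch below shows how the variance components and the relative (Eρ²) and absolute (Φ) coefficients for a fully crossed persons × raters (p × r) G-study, the design described above, can be estimated. The study's actual analyses were run in EXCEL and GENOVA; this Python function, its name, and the fabricated demo scores are assumptions for illustration, not a reproduction of the study's computations.

```python
import numpy as np

def g_study_p_x_r(scores, n_raters_d=None):
    """Estimate variance components and G/Phi coefficients for a
    persons x raters (p x r) crossed design with one score per cell.

    scores     : 2-D array, shape (n_persons, n_raters)
    n_raters_d : number of raters assumed in the D-study
                 (defaults to the number of raters actually used)
    """
    X = np.asarray(scores, dtype=float)
    n_p, n_r = X.shape
    n_raters_d = n_raters_d or n_r

    grand = X.mean()
    person_means = X.mean(axis=1)
    rater_means = X.mean(axis=0)

    # Sums of squares for the two-way ANOVA without replication
    ss_p = n_r * np.sum((person_means - grand) ** 2)
    ss_r = n_p * np.sum((rater_means - grand) ** 2)
    ss_total = np.sum((X - grand) ** 2)
    ss_pr = ss_total - ss_p - ss_r      # residual: p x r interaction + error

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))

    # Expected-mean-square solutions for the variance components
    var_pr = ms_pr
    var_p = max((ms_p - ms_pr) / n_r, 0.0)
    var_r = max((ms_r - ms_pr) / n_p, 0.0)

    # D-study coefficients for n_raters_d raters
    g_coef = var_p / (var_p + var_pr / n_raters_d)              # relative (norm-referenced)
    phi_coef = var_p / (var_p + (var_r + var_pr) / n_raters_d)  # absolute (criterion-referenced)
    return {"var_p": var_p, "var_r": var_r, "var_pr_e": var_pr,
            "E_rho2": g_coef, "Phi": phi_coef}

# Hypothetical example: 5 essays rated by 3 raters on a 1-15 scale
demo = [[12, 11, 13], [8, 9, 7], [10, 10, 11], [14, 12, 13], [6, 7, 6]]
print(g_study_p_x_r(demo, n_raters_d=6))
```

In such a sketch, a large rater and residual variance relative to the person variance would translate into the low generalizability (reliability) coefficients reported for the six-rater design above.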