
On The Scoring Validity Of Writing Assessment In PETS3

Posted on: 2008-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Zhao
Full Text: PDF
GTID: 2155360215464192
Subject: English Language and Literature
Abstract/Summary:
Writing assessment is an indispensable part of language testing: it appears in almost all large-scale tests, such as CET, TEM, and TOEFL, and PETS is no exception. Yet its scoring reliability, or scoring validity, is difficult to ensure, because it is threatened by many factors, among which rating is one of the most important. As a large-scale test, PETS has attracted, and will continue to attract, considerable attention, but to date there has been no study of the scoring validity of its writing assessment.

This thesis investigates the scoring validity of the essays in PETS3. The general research question is: to what extent are the essays in PETS3 valid in terms of scoring validity? To address this question fully, two sub-questions are investigated: (1) To what extent are the original raters consistent with the standardized raters in the assessment of the PETS3 essays of March 2004 and those of March 2005? (2) To what extent do the original PETS3 essay scores of March 2004 and those of March 2005 correlate with each other?

In this study, essays were systematically sampled from the PETS corpus. The sampled essays were re-rated to obtain standardized scores, which were then compared with the original scores. The data were analyzed using correlation and t-test procedures.

Statistical analysis of the data yields the following findings:

1) The original ratings and the standardized ratings of the March 2004 PETS3 essays are significantly correlated (r = 0.869, p < 0.01), and there is no significant difference between their mean scores. However, the mean score of the original ratings is higher than that of the standardized ratings, which suggests that although the original raters were consistent with the standardized raters, they were slightly lenient when rating the essays.

2) The original ratings and the standardized ratings of the March 2005 PETS3 essays are also significantly correlated (r = 0.798, p < 0.01), but their mean scores differ significantly at the 0.01 level. In addition, the mean score of the original ratings is again the higher of the two. This suggests that although the original raters were consistent with the standardized raters in rank ordering, they were markedly more lenient than the standardized raters.

3) The mean score and the standard deviation of the original ratings of the March 2004 essays are lower than those of March 2005. The two years' scores are not significantly correlated, and their mean scores differ significantly, which suggests that ratings of PETS3 essays may not be equivalent across years.

The qualitative data indicate that the March 2005 PETS3 test prompts are not specific about the writing procedures and scoring methods that test takers should follow, so some test takers could not respond accurately. In addition, the raters are given no guidance on how to weight the scores.

This study suggests that the PETS Testing Center should not only keep its item banks stable but also maintain a qualified, stable, and fair rating team, so that rating scores are stable and reliable. Drawing on the research findings of CET and TEM, the PETS Testing Center should also strengthen item development and ensure that items are designed in more scientific and standardized ways.
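The abstract describes the analysis only in outline. As an illustration of the correlation and paired t-test procedures mentioned above, the following is a minimal sketch in Python using SciPy; the score arrays are hypothetical placeholders, not data from the study.

import numpy as np
from scipy import stats

# Hypothetical original and standardized scores for the same sampled essays;
# the study's real data come from re-rated PETS3 scripts.
original = np.array([8, 7, 9, 6, 8, 7, 10, 9, 6, 8], dtype=float)
standardized = np.array([7, 7, 8, 6, 8, 6, 9, 9, 5, 8], dtype=float)

# Pearson correlation: are the two sets of raters consistent in rank ordering?
r, p_r = stats.pearsonr(original, standardized)
print(f"Pearson r = {r:.3f}, p = {p_r:.3f}")

# Paired-samples t-test: do the mean scores of the two ratings differ significantly?
t, p_t = stats.ttest_rel(original, standardized)
print(f"t = {t:.3f}, p = {p_t:.3f}")

# A positive mean difference would indicate leniency in the original ratings.
print(f"mean difference (original - standardized) = {(original - standardized).mean():.2f}")

For the inter-year comparison in finding 3, where the 2004 and 2005 essays are different samples, an independent-samples t-test (stats.ttest_ind) would presumably be the analogous procedure, though the thesis does not specify this.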
Keywords/Search Tags: writing assessment, scoring validity, PETS3