
Detecting And Measuring Rater Effects In A Pragmatics Test

Posted on: 2014-07-12
Degree: Master
Type: Thesis
Country: China
Candidate: L J Xie
GTID: 2255330422955943
Subject: Foreign language teaching techniques and evaluation

Abstract/Summary:
The requirement that ESL learners apply their learned skills in real-life situations has driven the field of language testing toward performance-based testing. This requirement has yielded various measures for testing ESL learners' interlanguage pragmatic knowledge, among which the Written Discourse Completion Task (WDCT) is frequently used for data collection and testing purposes. Such assessments are carried out by human raters according to rubrics; human raters, however, may introduce errors into the final scores. Rater effect, one type of rating error, has been the focus of much previous research. This study explores what common patterns of rater effect may exist in a WDCT pragmatics test by applying a many-facet Rasch measurement (MFRM) approach. The author uses qualitative and quantitative methods to analyze the data and to probe the raters' decision-making processes, with the aim of identifying the factors that account for their rating behaviors and providing useful suggestions for rater training.

The thesis first reviews research on communicative competence and performance tests. It then gives a brief introduction to the rater effects proposed by Myford and Wolfe (2003) and surveys the main methods for examining rater effects in performance tests. The MFRM approach is adopted for this research.

In this study, 6 university teachers (4 Chinese and 2 foreign) were invited to rate a WDCT test administered to 38 Chinese EFL university students (15 male, 23 female) aged 19 to 21. The raters rated the test independently, and the scores were analyzed with a many-facet Rasch model. Afterwards, recall interviews were carried out with each of the 6 raters in order to analyze the rating results qualitatively. The study examined four facets: the items, the examinees, the raters, and the traits.
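The four-facet analysis described above rests on the many-facet Rasch model, in which the log-odds of an examinee receiving score k rather than k−1 equal examinee ability minus item difficulty, rater severity, and the step (threshold) difficulty. The sketch below is illustrative only: it is not the thesis's actual analysis (MFRM studies of this kind are typically run with dedicated software), and the function name and all parameter values are made up for demonstration.

```python
import math

def mfrm_category_probs(theta, item_diff, rater_sev, steps):
    """Category probabilities under a many-facet Rasch (rating-scale) model.

    theta     : examinee ability, in logits
    item_diff : item difficulty
    rater_sev : rater severity (higher = harsher rater)
    steps     : step difficulties tau_1..tau_K (tau_0 = 0 is implied)

    Returns a list of probabilities for scores 0..K that sums to 1.
    """
    # Log-numerator for each score category is the running sum of
    # (theta - item_diff - rater_sev - tau_h) over the steps passed.
    log_numers = [0.0]          # score 0 is the baseline category
    running = 0.0
    for tau in steps:
        running += theta - item_diff - rater_sev - tau
        log_numers.append(running)

    # Softmax with max-subtraction for numerical stability.
    m = max(log_numers)
    expd = [math.exp(x - m) for x in log_numers]
    z = sum(expd)
    return [e / z for e in expd]
```

Because rater severity enters the model with a negative sign, a harsher rater lowers every examinee's expected score by the same amount on the logit scale; it is departures from this pattern (rater-by-trait or rater-by-examinee interactions) that the bias analysis in the results flags as significant.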
The results indicated that the items differed significantly in difficulty. Among the four traits, Speech Act was the easiest on which to score high, while Appropriateness and Expressions were the most difficult. The 6 raters showed significant differences in rating severity; Rater A, a foreign teacher, was the most severe. Most raters were found to exhibit bias across both traits and examinees in their ratings, and all significant bias patterns could be grouped into four categories. The study also offers eight implications for rater training and language teaching.
Keywords/Search Tags: rater effects, many-facet Rasch model, WDCT pragmatics test