
Rater's Reliability Analysis Of CEPT Speaking Test By Using Multi-facet Model Of GT

Posted on: 2011-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Su
GTID: 2155360308468779
Subject: Foreign Linguistics and Applied Linguistics
Abstract/Summary:
As a way of eliciting a test taker's true communicative ability, the speaking test has advantages that other kinds of tests cannot match. Any scientific, well-designed language test should therefore include a speaking component. Because of the nature of speaking tests and the subjectivity of their rating, however, rating them faces many problems and challenges. Measurement error arising from many facets can lower reliability.

Generalizability Theory (GT), a modern measurement theory, offers a new approach to studying the reliability of speaking tests. First, within the whole universe of observations, GT identifies the object of measurement, the facets of measurement, and the interactions among them. On this basis, the researcher can choose a fully crossed, nested, or mixed design. Second, the researcher processes the experimental data and, using the chosen design and analysis of variance (ANOVA), estimates how much each variance component contributes to the total variance. This procedure is the G-study. In the D-study stage, the researcher can pursue specific purposes by adjusting the measurement facets, sample sizes, and measurement structure.

This paper examines rater reliability in the College English Placement Test (CEPT) at Hunan University using GT. The analysis yields the following results:

1. The overall reliability of the raters is very high, but the raters differ clearly in severity. Rater 3 is the least severe. Raters 5 and 6 rate with great consistency. In addition, the difficulty of the speaking test as a whole is acceptable, and the discrimination index is good.
2. By analyzing the item facet under different conditions in the D-study, the study estimates that the G coefficient reaches a higher level once the number of items reaches six.
3. In the D-study, the study estimates the reliability index for one to ten raters, which shows how many raters are suitable for a given number of items.
4. By adjusting and estimating different measurement facets, the study arrives at an optimal design. For instance, with six items and four raters, the design reaches a high level of reliability compared with the default design.

This research is significant in that it is an original study of the CEPT speaking test and its rater reliability. It also provides empirical evidence for the test's further development and improvement. The results point out existing problems in the speaking test and offer some suggestions. They also reveal differences in consistency among raters, which will help in selecting qualified raters. In addition, the use of GT to analyze and check reliability can shed light on research into the reliability of oral tests.

Finally, this paper discusses the limitations of the research and directions for further research in this field.
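
The D-study step described in the abstract can be illustrated with a small calculation. The sketch below assumes a fully crossed person x item x rater design, which the abstract does not confirm, and uses the standard relative G coefficient for that design. The variance components in `HYPOTHETICAL` are made-up placeholders, not the thesis's estimates, and the names `VarianceComponents`, `g_coefficient`, and `d_study_table` are introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarianceComponents:
    """G-study variance components for an assumed crossed p x i x r design."""
    person: float        # sigma^2(p): true differences among test takers
    person_item: float   # sigma^2(pi): person-by-item interaction
    person_rater: float  # sigma^2(pr): person-by-rater interaction
    residual: float      # sigma^2(pir,e): three-way interaction plus error


def g_coefficient(vc: VarianceComponents, n_items: int, n_raters: int) -> float:
    """Relative G coefficient for a D-study with n_items and n_raters."""
    if n_items < 1 or n_raters < 1:
        raise ValueError("n_items and n_raters must be at least 1")
    # Relative error variance: each interaction term shrinks as the
    # corresponding facet is sampled more often.
    relative_error = (
        vc.person_item / n_items
        + vc.person_rater / n_raters
        + vc.residual / (n_items * n_raters)
    )
    return vc.person / (vc.person + relative_error)


def d_study_table(vc, item_counts, rater_counts):
    """Map every (n_items, n_raters) pair to its G coefficient."""
    return {
        (n_i, n_r): g_coefficient(vc, n_i, n_r)
        for n_i in item_counts
        for n_r in rater_counts
    }


# Hypothetical values for illustration only; NOT taken from the thesis.
HYPOTHETICAL = VarianceComponents(
    person=0.50, person_item=0.20, person_rater=0.10, residual=0.30
)

if __name__ == "__main__":
    # Items from one to six, raters from one to ten, as in the abstract.
    table = d_study_table(HYPOTHETICAL, range(1, 7), range(1, 11))
    for (n_i, n_r), g in sorted(table.items()):
        print(f"items={n_i} raters={n_r} G={g:.3f}")
```

With real G-study estimates in place of the placeholders, a table like this shows how the G coefficient changes as items or raters are added, which is the kind of comparison behind results 2 to 4.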
Keywords/Search Tags: CEPT speaking test, rater's reliability, Generalizability Theory