A Validity Study on Listening Comprehension Tasks: From the Perspective of Assessment Use Argument

Posted on: 2011-05-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K Z Peng
GTID: 1115330332959107
Subject: English Language and Literature
Abstract/Summary:
In recent years, the field of language testing has developed a set of principles and procedures for linking test scores and score-based inferences to test use and the consequences of test use. Bachman (2005) developed the Assessment Use Argument, which links assessment performance to use (decisions). It continues and extends Messick's validity theory (Messick, 1989), which held that test validation is an ongoing process, and it proceeds by articulating claims and counterclaims in an assessment use argument. The present dissertation is such a validation study of the listening comprehension section of TEM4, focusing on test task characteristics and test scores. Insight into validity can be gained by exploring the characteristics that affect the difficulty of test items, probing the relationship between those characteristics and the test construct, and interpreting test performance on the basis of cut-off scores.

First, construct validation was conducted using confirmatory factor analysis. With a random sample of 10 per cent of the 2005-2008 listening comprehension test data, the analysis found that the constructs are satisfactorily comparable: three genres, namely conversation, passage and news, load well on the construct of listening comprehension, with factor loadings between .95 and 1. Test performances on these genres are closely related, with correlation coefficients between .82 and .99.

Then, factors that affect item difficulty were explored through correlation and regression analyses. Nineteen task characteristics were significantly correlated with item difficulty: 7 text variables, 4 item variables and 8 text/item overlap variables.
A regression analysis yielded 5 significant predictors of the difficulty of listening comprehension items: inference item, lexical overlap between key words of incorrect options and key words of the text, lexical density, words in the correct option, and number of adverbial clauses. The multiple R is .663 and the R-squared is .439, so the model accounts for 43.9% of the variance in item difficulty. The best predictors are inference item and lexical overlap between key words of incorrect options and key words of the text. A hierarchical regression analysis confirmed that the powerful predictors are the text and text/item overlap variables, and that the text/item overlap variables are the most powerful. Predictors of item difficulty were also identified for each genre. Conversation items had only one significant predictor, inference item, which accounts for 45.8% of the item difficulty variance. Passage items had two significant predictors, lexical overlap between key words of incorrect options and the text, and inference item, which together account for 39.1% of the item difficulty variance. News items had three significant predictors, inference item, syllables of necessary information, and lexical overlap between key words of the question stem and the text, which together account for 50.7% of the item difficulty variance. Moreover, along the genre continuum from conversation through passage to news, the more complex the genre, the less influential the item variables and the more influential the text/item overlap variables.

One notable feature of the dissertation is its exploration of the factor structure of task characteristics and the relationship between that factor structure and the test construct.
By employing exploratory factor analysis, the study revealed that 13 task characteristics form a three-factor structure: text (6 variables), item (3 variables) and cognitive processes (4 variables). A regression model built with EQS 6.1 software showed that the most powerful predictor of listening comprehension item difficulty is the cognitive process factor, followed by the item factor and the text factor. The R-squared is high, which indicates that the factor structure is a powerful predictor of item difficulty.

Then, based on the factor structure, the test construct factor of listening comprehension was derived. The test construct factor accounts for 100% of the variance of the cognitive process factor; in contrast, it accounts for no more than 1% of the variance of the text factor and about 8.4% of the variance of the item factor. This finding shows that the text factor alone contributes little to the test construct of listening comprehension, whereas the cognitive processes elicited by the interaction between text and item matter a great deal. Question preview may also contaminate the construct of listening comprehension slightly. Regression analysis showed that the construct of TEM4 listening comprehension, which is composed of cognitive process, text and item factors, has a very strong correlation with item difficulty.

After justifying the test validity, the dissertation examined the fairness of cut-off-score-based decision making from the perspective of criterion-referenced language measurement. In terms of the requirements of the teaching syllabus and the test syllabus, a cut-off score of 70% yields a very desirable dependability index. The master group has a very good command of listening comprehension ability.
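
The abstract does not say which dependability index was computed or report its value. As a rough illustration only, the sketch below assumes Brennan's short-cut estimate of the phi(lambda) index, which is commonly used in criterion-referenced testing to gauge how dependable decisions are at a given cut-off score. The function name `phi_lambda`, the five proportion-correct scores and the item count of 30 are all hypothetical and are not taken from the dissertation.

```python
from statistics import mean, pvariance


def phi_lambda(proportion_scores, n_items, cut_score):
    """Short-cut estimate of the phi(lambda) dependability index.

    proportion_scores: each examinee's proportion of items answered correctly
    n_items: number of items in the test (must be greater than 1)
    cut_score: cut-off expressed as a proportion, e.g. 0.70 for 70%
    """
    m = mean(proportion_scores)        # mean proportion correct
    s2 = pvariance(proportion_scores)  # population variance of proportion scores
    # Error term: shrinks as the cut-off moves away from the group mean.
    error_term = (m * (1 - m) - s2) / ((m - cut_score) ** 2 + s2)
    return 1 - error_term / (n_items - 1)


# Hypothetical data: five examinees on an assumed 30-item test.
scores = [0.50, 0.60, 0.70, 0.80, 0.90]
index_at_70 = phi_lambda(scores, n_items=30, cut_score=0.70)
index_at_60 = phi_lambda(scores, n_items=30, cut_score=0.60)
print(round(index_at_70, 3), round(index_at_60, 3))
```

Under this formula the index is lowest when the cut-off equals the group mean and rises as the cut-off moves away from it, so a value reported for a 70% cut-off should be read alongside the score distribution of the examinees.
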
In terms of listening proficiency, upper-level students (80% or more correct responses), in addition to having the proficiency of middle-level students, can integrate all kinds of information, have a good command of implicit pragmatic information, can understand words in specific contexts (e.g., tour-guide commentary), and can grasp the main idea of international news. Middle-level students (60-70% correct responses), in addition to having the proficiency of lower-level students, can understand daily-life conversations (such as campus life and job interviews) and passages (such as academic and cultural topics), can understand implicit information and grasp the topic of domestic news, and can understand the explicit information of specific texts (such as tour-guide commentary). Lower-level students can basically understand and integrate explicit information, can make inferences to a limited extent, and can roughly grasp the topic of international news and non-academic subjects.
Keywords/Search Tags: Assessment Use Argument, task characteristics, TEM4 listening comprehension, criterion-referenced language assessment, behavioral anchoring