| The 20th century has seen an attempt to weigh, gauge, and count not just clear and noticeable physical objects but also unseen forces and abstract concepts. The flourishing of modern scientific language testing has been one side of this effort to measure a facet of human ability. Since the days of World War I, psychometric principles and practice have “come to dominate the testing [of] foreign language proficiency” (Spolsky 1), and this movement has spread from the USA to the rest of the world. Initially, testing was confined to helping students learn or to determining the qualifications of individuals seeking employment. As it advanced, however, testing also came to be exploited as a method of control and power: a way to select, to motivate, even to punish. Advances in measurement methodology and in linguistic and psychological theory have been the driving force behind the development of language teaching and language testing. We see the Audio-lingual Method as the result of applying structural linguistics and Skinnerian learning theory. We interpret the cognitive approaches as the natural consequence of the theoretical revolution brought about by transformational-generative grammar. We regard the notional-functional syllabus as related to theories of pragmatics and communicative competence. On this basis, Spolsky (1999) proposed three periods of language testing: the traditional period, the modern or psychometric-structuralist period, and the post-modern period, known academically as the psycholinguistic-sociolinguistic period. Each shift from one period to the next was accompanied by a transformation of test items, mechanics and techniques. Language testing and language teaching go hand in hand, and testing accompanies teaching along its whole course. Tests serve pedagogical functions. 
They are used in teaching as a means to ensure effective teaching, to improve teaching quality, to obtain feedback on students’ learning progress and to evaluate teaching programs. Tests can also perform research functions. They are a common measurement instrument in research studies, for example as a means of gathering data on language acquisition and language proficiency for research and administrative decisions. In senior high schools, testing is of no small consequence as a method of checking and evaluating the quality of English teaching, a major subject in the curriculum. This is especially so at Lhasa Senior Middle School in Tibet, which treats test scores as almost the only standard of a good teacher or a successful learner. However, testing and teaching practice there work against each other. On the one hand, English testing plays a sole and crucial part in the evaluation of teaching and learning; on the other hand, well-constructed tests are in short supply. The main reason for this dilemma lies in the school teachers’ limited knowledge of language testing theory. As my questionnaire shows later, most of the teachers there know little about the history and current trends of language testing, about validity, reliability and backwash, or about the distinctions between various kinds of tests. All in all, few teachers know how to construct appropriate test items, which is basic and vitally important to administering a valid and reliable achievement test after a full semester of industrious teaching and learning by teachers and students. As for Lhasa Senior Middle School, which serves as a showcase school for education in modern Tibet, its program to popularize and promote the new curriculum is enduring the hardships of a pioneering road. 
There, the achievement test (called the Module Level Test, or xueduan kaoshi in pinyin) is nearly the only form of summative evaluation; it has replaced the former mid-term and final examinations since 2011, when the new curriculum entered Tibetan senior schools. However, the quality of this important test is poor, and to some extent the reason lies in the low quality of test item design and test paper construction. The formalistic secrecy of the “cross test design” method and the merely nominal scrutiny of the testing materials by supposedly expert colleagues add to the embarrassing situation at Lhasa Senior Middle School. In daily life the terms testing, evaluation, and examination are often used synonymously, and in this thesis the writer uses the terms test, testing (ceshi or ceyan in pinyin) and examination (kaoshi in pinyin) in the same sense. Nevertheless, a clear distinction among them is of vital importance to the development and use of language tests. According to Bachman (1999), a “test” is defined as “a procedure designed to draw certain behavior from which one can make inferences about certain characteristics of an individual” (Bachman 20). Thus we know that a test is a measurement instrument used to obtain a specific sample of an individual’s performance and to quantify individual characteristics according to set procedures. A language test naturally measures the behavior of language use, i.e., the language ability of the testees. Secondly, “evaluation” was defined (Weiss, 1972, in Bachman, 1999) as “the complete gathering of information for the purpose of making decision[s]” (Bachman 22). In language testing, too, test results inform educational decisions. The probability of making a correct decision in a given situation depends not only on the skill of the decision maker but also on the quality of the data on which the decision is based. 
Everything else being equal, the more reliable and relevant the data (here mainly the test scores), the better the decision is likely to be. Finally, Wang Zhenya (2008) holds that, in a narrow sense, an “examination” is an assessment “more often connected with syllabus or certain testing bodies’ formal practice, as well as restricted to objective and discrete-point testings” (Wang Zhenya 73-74). A clear and explicit definition of language ability is essential to all language test development and use. The model of language ability proposed by Bachman (1999) is fundamental and enjoys wide acceptance. It defines language ability as involving two components: language competence, or what we call language knowledge, and strategic competence, which we will describe as a set of metacognitive strategies (Bachman & Palmer 67). Language knowledge, which is information specific to language use that is stored in memory, includes both organizational knowledge and pragmatic knowledge. The former, which includes grammatical knowledge and textual knowledge, enables language users to create utterances or sentences that are grammatically accurate and to combine them to form texts. Pragmatic knowledge, which includes functional knowledge and sociolinguistic knowledge, enables language users to relate words, utterances, and texts to concepts, communicative goals, and the features of the language use setting. Strategic competence enables language users to engage in goal setting, assessment, and planning. It is this combination of language knowledge and metacognitive strategies that provides language users with the ability, or capacity, to create and interpret discourse, either in responding to tasks on language tests or in non-test language use. From testing theory, we know that the quality of a test is determined by the essential components of validity, reliability and usefulness. 
Test constructors must also take into consideration the circumstances of the students, the backwash, and the difficulty of the test items. They must know the distinctions among the various kinds of tests. They must know how to write good examinations. Above all, they must study language testing theory so as to be aware of the history of language testing and of the definition, use and implications of the notions of validity, reliability, usefulness, backwash and the facility value of a given test. Writing easy achievement tests is of vital importance for Tibetan students, who have low self-esteem and motivation and little success in foreign language acquisition. The first modern language test is less than a hundred years old. Before the 1940s, the first stage of language testing reflected the traditional teaching approach and could be classified as pre-scientific testing, in which language was regarded as knowledge of grammar, vocabulary and phonetics, and testing those three components was the major task of the period. The second stage is psychometric-structuralist testing, which is the direct result of combining behaviorist psychology with structural linguistic theory; dating back to the 1940s, it is more concerned with language skills. 
In the third phase, influenced by the trend of communicative language teaching, the testing of communicative language ability is emphasized, and the testing of this period, from about the 1980s up to now, has accordingly acquired its rather long name: psycholinguistic-sociolinguistic testing. According to Zou Shen, tests, though varied, can be roughly classified along several dimensions: by purpose, into proficiency, achievement, placement and diagnostic tests; by construction, into direct and indirect tests; by format, into discrete-point and integrative tests; by scoring method, into objective and subjective tests; and by score interpretation, into norm-referenced and criterion-referenced tests (Zou Shen 30-34). Lyle F. Bachman (1999) points out that the “content” of language tests can be based either on a theory of language proficiency or on a specific domain of content (generally a course syllabus) (Bachman 71). We can refer to theory-based tests as proficiency tests, while syllabus-based tests are generally achievement tests. Placement and diagnostic tests are aimed at admission decisions. When test scores are interpreted in relation to the performance of a particular group of individuals, we speak of a norm-referenced interpretation. If, however, they are interpreted with respect to a certain level or specific ability, we speak of a criterion- or domain-referenced interpretation. “Subjective” tests are distinguished from “objective” tests entirely in terms of scoring procedure. In an objective test, the correctness of the test taker’s response is determined entirely by predetermined criteria, so that no judgment is required on the part of scorers. In a subjective test, on the other hand, the scorer must judge the correctness of the testee’s response based on his or her own subjective understanding of the scoring criteria. 
In Arthur Hughes’ view (2000), testing is “direct” when it requires the test takers to perform precisely the skills that we wish to measure, while “indirect testing” tries to measure the abilities that underlie the skills in which we are interested. “Discrete point testing” refers to testing one element of the candidate’s abilities at a time, item by item. “Integrative testing”, on the other hand, requires the testees to combine many language elements and learning or communication strategies in completing a single task (Hughes 14-16). While the above features for classifying language tests are distinct, some areas overlap. No categorization can classify or cover all tests with full agreement and without exceptions. As the main concern of this paper, more weight will be given to the achievement test, which also takes up far more space in Arthur Hughes’ authoritative book Testing for Language Teachers (2000). Achievement tests are directly related to language courses or syllabuses, their purpose being to establish how successful individual students, groups of students, or the courses themselves have been in achieving their stated objectives. They are of two kinds: final achievement tests and progress (or midterm) achievement tests. The former are those administered at the end of a course, so they are also called end-of-term examinations. These tests must relate to the courses with which they are concerned. “Progress achievement” tests are designed to measure the progress the learners are making. Besides midterm examinations, monthly examinations are among the examples. These examinations require careful planning, so teachers should set their own test papers instead of “borrowing” them from online or other sources. 
The most important thing about achievement tests is that their content should be strictly based on a course syllabus. Reliability and validity are two critical qualities for all kinds of tests. Reliability is often defined as “consistency of measurement” (Bachman & Palmer 19). A reliable test score is consistent across different testing dates and testing conditions. Validity is regarded as the extent to which a test measures what it is intended to measure. Hughes (2000) further discusses the categories of content, criterion-related, construct, and face validity. Of the four, he claims that a test has content validity if its content constitutes a representative sample of the language skills or structures concerned; criterion-related validity if its results agree with those of an independent and highly reliable assessment; construct validity if it measures just the ability it is supposed to measure; and “face validity” if, at first glance, it appears to an expert or experienced test user to measure what it is supposed to measure. He notes that although face validity is not really a scientific concept, a test that lacks it may not be accepted by test takers, test users or education authorities, and may be rejected outright (Hughes 22-23). Reliability is the other of the two essential qualities of measurement. Reliability is a quality of test scores and pertains to the extent to which scores are free from measurement error, while validity concerns how meaningful, appropriate, and useful test results are for a particular purpose. Every effort should be made to build and assure the validity of tests. To be valid, a test must consistently provide accurate measurements. However, a perfectly reliable test may not be valid at all. In our attempt to make tests reliable, we may be reducing their validity. An emphasis on one may mean the loss or neglect of the other. 
There always seems to be some tension between validity and reliability. Testers and test developers have to strike a compromise between them. Facility value, or difficulty, refers to the degree to which a test or test item falls within the ability range of an individual or a group of candidates (Wang Zhenya 58). In the case of achievement testing, test developers must make sure that the majority of their test items are easy enough for most students of Lhasa Senior Middle School to pass and to gain comparatively high marks, so as to protect their confidence and their fragile interest in continuing to learn English. As the table in the appendix of this paper indicates, the scores of the Tibetan students are so low that no test items are easy for them, for their abilities are still weak. Every test paper writer must pay close attention to this point. Washback, or backwash, means the impact a test may exert on teaching or learning. Backwash can be either harmful or beneficial, and the choice is left to the test writers. When achievement tests are regarded as important, preparation for them can come to dominate all subsequent teaching and learning activities. An instance of this would be Lhasa Senior Middle School, where the students follow an English course that spends much learning time, necessary at their present low level of English proficiency, on the training of language skills, but where the achievement test they have to take seldom or never tests those skills directly, for example because the test items have been downloaded from high-level item banks in the advanced eastern regions of China. The backwash of this test must be harmful. 
The learners will be discouraged and the teachers’ teaching will be found fault with, because students in western China set great store by test results. Davies (1968) has said that “the good test is an obedient servant since it follows and apes the teaching”. At Lhasa Senior Middle School the main task types in the achievement tests are multiple-choice items on listening, grammar, cloze and reading, and written items of sentence or passage proof-reading and essay writing, the same format as the National Matriculation English Test II. In the odd-numbered module tests (namely the former midterm examinations), because of the shorter 90-minute time allocation and the 100-point total score, the 30-point listening section, the 10-point proof-reading items, and 10 points of the reading section are removed and left for the final-term test. Whatever the situation, we believe that writing the test scientifically and professionally is crucial. The organization of test development can be roughly divided into three stages: design, operationalization and collation. In the design stage, we must describe in detail the characteristics of the test takers and observe the test specifications issued by the relevant school office. 
In the second stage, we should select and specify from the available resources and plan for their allocation and management. In the last but not least stage, we should employ a strict and complete procedure of collation and verification by at least three peers to guarantee the quality of the test papers, so that they are as valid and reliable as possible, have a proper facility value and, moreover, are free of the mistakes, slips and inaccuracies that come from the carelessness of the paper constructors. This thesis is divided into seven parts. Besides the introduction and the conclusion, Chapter One outlines the past and present development of language tests, Chapter Two concerns the classification and quality evaluation of language tests, Chapter Three discusses the norms and rules for constructing language test items, Chapter Four focuses on the present situation of achievement testing practice at Lhasa Senior Middle School, and Chapter Five offers some personal suggestions on possible solutions to the problems troubling some Tibetan senior middle schools in the design of achievement tests. |
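
The facility value discussed above is commonly computed, for items scored right or wrong, as the proportion of test takers who answer an item correctly. The following is a minimal sketch of that calculation, not a procedure taken from this thesis. The function names, the sample responses and the 0.3 cut-off for "too hard" are assumptions made only for illustration; a school would choose its own threshold.

```python
from typing import List


def facility_values(responses: List[List[int]]) -> List[float]:
    """Return the facility value of each item.

    `responses` holds one row per test taker and one column per item,
    with 1 for a correct answer and 0 for an incorrect one.
    The facility value of an item is the proportion of correct answers.
    """
    if not responses:
        return []
    n_takers = len(responses)
    n_items = len(responses[0])
    return [
        sum(row[i] for row in responses) / n_takers
        for i in range(n_items)
    ]


def too_hard_items(values: List[float], threshold: float = 0.3) -> List[int]:
    """Return the indices of items whose facility value is below `threshold`.

    The default threshold of 0.3 is an illustrative assumption only.
    """
    return [i for i, v in enumerate(values) if v < threshold]


# Hypothetical data: four students, three items.
sample = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]

values = facility_values(sample)
print(values)                  # [0.75, 0.25, 0.25]
print(too_hard_items(values))  # [1, 2]
```

Run on the scores of a past module test, a check of this kind would show a paper writer which items fall outside the ability range of most students before similar items are reused.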