
A Preliminary Study On Automated Scoring System Of Chinese L2 Essays

Posted on: 2014-12-21
Degree: Master
Type: Thesis
Country: China
Candidate: D Chen
GTID: 2295330473959426
Subject: Chinese international education
Abstract/Summary:
Studies in second language acquisition have argued that large-scale writing assessment must incorporate advances in technology, and that the move towards automated essay scoring seems inevitable. Several automated essay scoring systems have been developed, and some commercialized, outside China over the years, whereas within China automated scoring of Chinese essays has hardly been explored. Borrowing an existing system from abroad is clearly impractical because of the significant differences between Chinese and English. The growing number of learners of Chinese around the world and the increasing popularity of the HSK test call for an automated Chinese essay scoring system. Such a system, if properly designed on the basis of experiments with enough data, could greatly relieve the heavy burden on Chinese essay raters and potentially produce reliable and valid essay scores. It could also be adapted to operate online as an interactive system for language learners around the world.

This study explores the feasibility of constructing a statistical model of Chinese argumentative essays for large-scale writing assessment, mainly for measuring the quality of language. Constructing such a complex model involves not only theoretical study and conceptual model design but also the techniques employed throughout the modeling process, in particular the extraction of effective features from essays.

Given limits on research time, accessible resources, funding, and professional knowledge in relevant areas such as computer technology, this study does not construct a model in which every aspect of writing quality can be examined both individually and as an integrated whole.
Instead, it outlines how the model would be constructed and what its major components are, and focuses on exploring and refining effective, feasible measures of the language quality of Chinese essays written by foreign learners.

In Chapter 2, theories and concepts from second language acquisition and writing assessment research are drawn on to provide a theoretical basis for the study, and four representative systems are briefly reviewed: Project Essay Grade (PEG), Intelligent Essay Assessor (IEA), Electronic Essay Rater (E-rater) and IntelliMetric. These systems judge essay quality on the basis of form (language use), on the basis of content, or by a mixed approach that measures both. The available literature displays an enormous variety of assessment criteria for EFL writing. This study reviews the related literature under three main aspects of essay quality: language quality, content quality and organization quality.

For language fluency, four measures (essay length in words, fourth root of essay length, number of word types and number of sentences) have proved effective in various studies. For language appropriateness, the numbers of prepositions and articles are often used to predict the language quality of English essays. Measures of language complexity fall into two groups: lexical complexity and syntactic complexity. For lexical complexity, a variety of studies report that measures such as type-token ratio, number of common and uncommon words, average word length, standard deviation of word length, and number of nominalizations are predictive. For syntactic complexity, average sentence length, percentage of long and short sentences, percentage of simple sentences, readability indices, and T-units are reported to be effective in different studies.
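Several of the fluency and complexity measures listed above are simple counts, and a minimal sketch may make them concrete. The function name `surface_measures` and the tokenization rule are assumptions made for illustration, not the thesis's own procedure: words are taken to be runs of word characters, which suits space-delimited text; unsegmented Chinese text would first need a word segmenter, which is outside this sketch.

```python
import re
import statistics


def surface_measures(text):
    """Compute a few surface measures of fluency and complexity.

    Assumption: a token is a run of word characters (regex \\w+).
    Unsegmented Chinese text would need a word segmenter first.
    """
    tokens = re.findall(r"\w+", text.lower())
    # Split on Western and Chinese sentence-final punctuation.
    sentences = [s for s in re.split(r"[.!?。！？]+", text) if s.strip()]
    length = len(tokens)
    if length == 0 or not sentences:
        return {}
    word_lengths = [len(t) for t in tokens]
    return {
        "essay_length": length,                        # fluency
        "fourth_root_length": length ** 0.25,          # fluency
        "word_types": len(set(tokens)),                # fluency
        "sentences": len(sentences),                   # fluency
        "type_token_ratio": len(set(tokens)) / length,  # lexical complexity
        "avg_word_length": statistics.mean(word_lengths),
        "sd_word_length": statistics.pstdev(word_lengths),
        "avg_sentence_length": length / len(sentences),  # syntactic complexity
    }


if __name__ == "__main__":
    for name, value in surface_measures("The cat sat. The dog sat down!").items():
        print(name, value)
```

Measures such as T-units, nominalizations or readability indices need syntactic analysis or word lists and are deliberately left out here.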
Most measures of writing quality found in the literature focus only on the quality of language, so measures of content and organization are far fewer than measures of language quality. Number of paragraphs, pronouns and connectives, as well as singular value decomposition (SVD) similarity measures, are widely acknowledged to be predictive.

The literature review also covers several domestic studies on automated scoring of Chinese or English essays. Cao Yiwei and Yang Chen (2007) were the first to apply latent semantic analysis to automated Chinese essay scoring. Another Chinese AES study, by Leanne (2006), used essays from the Chinese Proficiency Test Level 3 for Chinese minorities as samples; it examined which indicators should be extracted from Chinese test essays and obtained eight regression equations. Zhang Jinjun and Ren Jie also studied automated Chinese essay scoring and obtained a regression equation with five variables.

In Chapter 3, after reviewing and comparing these systems, the study suggests that an automated Chinese essay scoring system should adopt the framework of E-rater. The language analysis module in that framework can draw on the language quality analysis methods and techniques of PEG, while the content analysis module, the weak point of E-rater, can draw on the experience of IEA by employing latent semantic analysis. At the same time, natural language processing technology should be combined with latent semantic analysis so that both language quality and content quality are measured. When China's artificial intelligence develops to a higher level, techniques from IntelliMetric can also be adopted. For selecting text features, English essay scoring research offers guidance.
On the one hand, some text feature indicators from that research, such as the T-unit and the fourth root of the total number of words, can be applied to Chinese essay scoring; on the other hand, indicators unique to Chinese text can be added.

In the first part of Chapter 4, detailed text feature indicators for four aspects of language quality (fluency, accuracy, complexity and diversity), as well as indicators of content and structure quality, are explored further for use in the empirical research. The study recommends that 128 text feature indicators be extracted and used in modelling an automated scoring system for Chinese L2 essays, and puts forward some innovative indicators.

The second part of Chapter 4 is the empirical research, comprising data processing and analysis. A total of 128 measures are extracted from 120 HSK essays. The study uses multiple regression to construct a language module. The regression analysis is performed with two methods (Forward and Stepwise), yielding two statistical models, whose validity is then compared. Six indicators enter the stepwise regression equation: fourth root of essay length, ratio of Level 1 characters to type characters, ratio of Level 3&4 characters to type characters, ratio of adverbs to total words in the essay, ratio of nouns to total words in the essay, and ratio of error-free adverbs to total adverbs. The other regression equation includes five of these six indicators plus four more: number of Level 2&3 characters, number of T-units, number of nouns and number of clauses. Both regression equations have high validity and could be expected to perform well in predicting the quality of Chinese essays.
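The idea of letting indicators enter a regression equation one at a time can be sketched as follows. This is a simplified illustration, not the procedure used in the study: it adds the indicator that most increases R² and stops when the gain falls below a hypothetical threshold `min_gain`, whereas statistical packages typically use F-test or p-value entry and removal criteria, and true stepwise selection also drops indicators again. All function names are hypothetical, and the solver assumes the chosen indicators are not collinear.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y, cols):
    """Least-squares fit of y on an intercept plus the chosen columns.

    Returns the coefficients (intercept first) and the residual sum of squares.
    """
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    preds = [sum(b * v for b, v in zip(beta, d)) for d in design]
    rss = sum((t - p) ** 2 for t, p in zip(y, preds))
    return beta, rss


def forward_select(rows, y, min_gain=0.01):
    """Greedy forward selection: add the column that most raises R^2."""
    mean_y = sum(y) / len(y)
    tss = sum((t - mean_y) ** 2 for t in y)
    chosen, best_r2 = [], 0.0
    remaining = list(range(len(rows[0])))
    while remaining:
        scored = [(1 - fit_ols(rows, y, chosen + [c])[1] / tss, c) for c in remaining]
        r2, col = max(scored)
        if r2 - best_r2 < min_gain:  # assumed stopping rule, not an F-test
            break
        chosen.append(col)
        remaining.remove(col)
        best_r2 = r2
    return chosen, best_r2


if __name__ == "__main__":
    # Toy data: the score depends on indicators 0 and 2 only.
    x0 = [1, 2, 3, 4, 5, 6, 7, 8]
    x1 = [1, 1, -1, -1, -1, -1, 1, 1]
    x2 = [1, -1, -1, 1, 1, -1, -1, 1]
    rows = [list(map(float, r)) for r in zip(x0, x1, x2)]
    y = [1 + 2 * a + 3 * c for a, _, c in rows]
    print(forward_select(rows, y))
```

With 128 candidate indicators and 120 essays, as in the study, a real analysis would also have to guard against overfitting, for example by validating the selected equation on held-out essays.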
Compared with other regression equations, those in this study are high in validity; however, more relevant samples and more comprehensive studies are needed to validate the proposed model.

At the end of the study, several suggestions are made on research directions for automated Chinese essay scoring. The study has some drawbacks owing to the narrow range of samples and the imperfect means of variable extraction. Nevertheless, it is significant because it brings forth innovative ideas about which indicators of Chinese essay quality can be put into a statistics-based model for automated Chinese essay scoring, and provides new theoretical insights into this field. The research is a useful and bold first attempt at exploring automated Chinese essay scoring.
Keywords/Search Tags:Automated Scoring, Chinese L2 Essays, Text Indicator, Multiple Regression Analysis