
The Research On Automated Essay Scoring Method

Posted on: 2016-08-10   Degree: Master   Type: Thesis
Country: China   Candidate: S M Ni
GTID: 2348330503977883   Subject: Computer Science and Technology
Abstract/Summary:
In English study, the subjective test is an effective way to measure students' language performance, and it is used in all kinds of English examinations. As science and technology develop, computer technology is becoming increasingly intelligent, and the research and development of automated essay scoring systems reflects this trend. Automated Essay Scoring (AES) uses computer technology to identify, analyze and score essays, and it overcomes many of the disadvantages of manual essay scoring. AES works quickly and saves a great deal of manpower and money; it is also objective, which helps ensure fairness. In addition, an AES system can count grammar errors, spelling mistakes and similar problems, providing data that helps teachers improve their teaching and gives students guidance on improving their writing.

In this paper, we extract multiple features from two aspects of an essay, language and content, and use an improved classification and prediction algorithm to score the essay.

The content of an essay is measured with Latent Semantic Analysis (LSA), a theory and method for extracting and representing the contextual-usage meaning of words. LSA is concerned with how context constitutes meaning across a semantic range rather than with the mere appearance of words in a sentence, which brings it closer to the way humans read and understand text. Its key idea is to map both text vectors and word vectors into a low-dimensional space by singular value decomposition (SVD), so that related compositions receive similar vector representations even when they share few words. LSA therefore yields the degree of internal correlation of an essay. The chi-square test and other statistical methods are used to identify the content words that best represent the context of the essay, and these words are then extracted as the essay's feature items.
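The SVD step described above can be illustrated with a small NumPy sketch. It assumes a plain term-by-document count matrix (the abstract does not say whether counts are weighted, e.g. by tf-idf or log-entropy), an arbitrarily chosen dimension `k`, and document coordinates taken as V_k scaled by the singular values S_k; the function names `lsa_embed` and `cosine` are hypothetical and not taken from the thesis.

```python
import numpy as np


def lsa_embed(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Map the documents (columns of a terms x documents matrix) into k-dim LSA space."""
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Truncate to the k largest singular values; row j is document j scaled by S_k.
    return vt[:k].T * s[:k]


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


if __name__ == "__main__":
    # Rows are terms, columns are three toy documents.
    X = np.array([
        [1, 0, 0],  # car
        [0, 1, 0],  # automobile
        [1, 1, 0],  # engine
        [0, 0, 1],  # flower
        [0, 0, 1],  # petal
    ], dtype=float)
    docs = lsa_embed(X, k=2)
    print(round(cosine(X[:, 0], X[:, 1]), 3))  # raw counts: 0.5
    print(round(cosine(docs[0], docs[1]), 3))  # 2-D LSA space: 1.0
```

The "car engine" and "automobile engine" documents share only one word, so their raw cosine is 0.5; in the rank-2 space they coincide, because the dimension that separates them (the smallest singular value) is discarded. How the thesis turns such similarities into a score is not stated in the abstract.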
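The chi-square selection of feature words can be sketched with the standard 2x2 term/class statistic, chi2 = N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)). This is only an assumption about the thesis's procedure: it treats score bands as classes, represents each essay as a set of words, and ranks terms for one target band; the helper names `chi_square` and `top_terms` are hypothetical.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: essays in the class that contain the term
    b: essays outside the class that contain the term
    c: essays in the class without the term
    d: essays outside the class without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def top_terms(docs: list[set[str]], labels: list[str], target: str, m: int) -> list[str]:
    """Rank terms by chi-square association with the target class and keep the top m."""
    in_class = [label == target for label in labels]
    n_in = sum(in_class)
    n_out = len(docs) - n_in
    vocab = set().union(*docs)
    scores = {}
    for term in vocab:
        a = sum(1 for doc, inc in zip(docs, in_class) if inc and term in doc)
        b = sum(1 for doc, inc in zip(docs, in_class) if not inc and term in doc)
        scores[term] = chi_square(a, b, n_in - a, n_out - b)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:m]


if __name__ == "__main__":
    essays = [{"pollution", "city", "car"}, {"pollution", "factory"},
              {"holiday", "beach"}, {"holiday", "city"}]
    bands = ["high", "high", "low", "low"]
    print(top_terms(essays, bands, "high", 2))  # ['holiday', 'pollution']
```

Note that chi-square rewards strong negative association as well: "holiday" never appears in the "high" essays and scores as highly as "pollution", while "city", spread evenly across both bands, scores 0.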
The experimental results show that LSA with feature-word extraction is slightly more accurate than LSA without feature words.

For the quality of expression, the paper draws feature items from three aspects, vocabulary, grammar and sentence structure, to measure the quality of the language. Natural language processing technology makes the essay score more reasonable. The diversity of syntactic structure is obtained with the Stanford Parser, which not only generates a syntax tree for each sentence but also identifies its phrase elements, such as the subject, predicate and clauses. The different phrase elements counted from the syntax tree form part of the essay's feature items. Syntax errors are detected with an incorrect-syntax pattern matching method: first, rules for common English grammar errors are defined in an XML file; these rules are then matched against sentences whose phrase elements have been identified by the Stanford Parser, yielding feature items that represent the syntax errors in the essay. All of these feature items together form the feature vector of the essay.

Because the K nearest neighbor (KNN) algorithm has some shortcomings, this paper makes two improvements: information gain is used as a weighting function, and nearer neighbors are given higher weight. The experimental results show that KNN weighted by both information gain and distance, combined with feature-selected LSA, has the lowest error rate relative to teachers' scores. Compared with unimproved KNN combined with LSA, the error rate fell from 4.51 to 2.85, the best result obtained.
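Counting phrase elements from a syntax tree can be sketched as follows, assuming the parser output is a Penn Treebank style bracketed string (the tree in the example is written by hand, not produced by running the Stanford Parser). Which labels the thesis actually counts is not given in the abstract; the function `phrase_counts` is hypothetical and counts every phrasal label except the ROOT wrapper.

```python
import re
from collections import Counter


def phrase_counts(tree: str) -> Counter:
    """Count phrasal constituents (S, NP, VP, SBAR, ...) in a bracketed parse.

    A label directly followed by another "(" is a phrase node; a label followed
    by a word is a part-of-speech tag and is skipped. ROOT is ignored.
    """
    # Tokens are "(", ")", or any run of non-space, non-bracket characters.
    tokens = re.findall(r"\(|\)|[^\s()]+", tree)
    counts = Counter()
    for i in range(len(tokens) - 2):
        if tokens[i] == "(" and tokens[i + 2] == "(" and tokens[i + 1] != "ROOT":
            counts[tokens[i + 1]] += 1
    return counts


if __name__ == "__main__":
    tree = ("(ROOT (S (NP (PRP I)) (VP (VBP think) (SBAR (IN that) "
            "(S (NP (PRP it)) (VP (VBZ works))))) (. .)))")
    print(dict(phrase_counts(tree)))  # {'S': 2, 'NP': 2, 'VP': 2, 'SBAR': 1}
```

Counts such as the number of SBAR (subordinate clause) nodes per sentence are one plausible way to express the "diversity of syntactic structure" as numeric feature items.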
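The XML rule matching for syntax errors might look roughly like the sketch below. The rule schema (`<rule>` elements containing `<token word=... tag=...>` constraints), the rule ids, and the helpers `load_rules` and `match_rules` are all hypothetical; the thesis's actual XML format is not described. The sketch also simplifies by matching flat POS-tag sequences rather than the phrase elements identified by the Stanford Parser, and the tagged sentence is supplied by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file: each <rule> is a sequence of <token> constraints;
# a constraint may fix the lower-cased word, the POS tag, or both.
RULES_XML = """
<rules>
  <rule id="plural-subject-singular-verb">
    <token tag="NNS"/>
    <token tag="VBZ"/>
  </rule>
  <rule id="a-before-plural-noun">
    <token word="a"/>
    <token tag="NNS"/>
  </rule>
</rules>
"""


def load_rules(xml_text):
    """Parse the XML into (rule_id, [(word, tag), ...]) pairs; None means 'any'."""
    root = ET.fromstring(xml_text.strip())
    return [
        (rule.get("id"), [(tok.get("word"), tok.get("tag")) for tok in rule.findall("token")])
        for rule in root.findall("rule")
    ]


def match_rules(tagged, rules):
    """Slide each rule pattern over a POS-tagged sentence and report every hit."""
    hits = []
    for rule_id, pattern in rules:
        for i in range(len(tagged) - len(pattern) + 1):
            window = tagged[i:i + len(pattern)]
            if all((w is None or w == word.lower()) and (t is None or t == tag)
                   for (w, t), (word, tag) in zip(pattern, window)):
                hits.append((rule_id, i))
    return hits


if __name__ == "__main__":
    sentence = [("The", "DT"), ("students", "NNS"), ("goes", "VBZ"),
                ("to", "TO"), ("a", "DT"), ("libraries", "NNS"), (".", ".")]
    errors = match_rules(sentence, load_rules(RULES_XML))
    print(errors)       # [('plural-subject-singular-verb', 1), ('a-before-plural-noun', 4)]
    print(len(errors))  # 2
```

The number of hits per essay, or per rule, is one possible form for the syntax-error feature items; naive patterns like these will also produce false positives that a phrase-aware matcher could avoid.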
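The two KNN improvements can be sketched together: information gain (IG = H(Y) - H(Y|X)) of each feature weights its term in the distance, and each of the k nearest neighbors votes with weight 1/d. These exact formulas are assumptions, since the abstract names the ideas but not the equations; IG is computed here on discrete feature values, so continuous features would first need binning (also an assumption). The names `information_gain` and `weighted_knn` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG of one discrete feature column with respect to the class labels."""
    n = len(labels)
    groups = defaultdict(list)
    for value, label in zip(column, labels):
        groups[value].append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def weighted_knn(train_x, train_y, query, weights, k=3, eps=1e-9):
    """Predict a score class using IG-weighted distance and 1/d neighbor votes."""
    dists = []
    for x, y in zip(train_x, train_y):
        # Each squared feature difference is scaled by that feature's weight.
        d = math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, query)))
        dists.append((d, y))
    dists.sort(key=lambda pair: pair[0])
    votes = defaultdict(float)
    for d, y in dists[:k]:
        votes[y] += 1.0 / (d + eps)  # nearer neighbors get larger votes
    return max(votes, key=votes.get)


if __name__ == "__main__":
    train_x = [[0, 0], [0, 1], [1, 0], [1, 1]]
    train_y = ["high", "high", "low", "low"]
    ig = [information_gain([row[j] for row in train_x], train_y) for j in range(2)]
    print(ig)                                          # [1.0, 0.0]
    print(weighted_knn(train_x, train_y, [0.8, 0], ig))  # low
```

In this toy data the first feature fully determines the class (IG = 1 bit) and the second is noise (IG = 0), so the weighted distance ignores the noisy feature entirely.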
Keywords: Information gain, Latent Semantic Analysis, K nearest neighbor, WordNet