
Research On Readability Of English Text And Development Of IRMS System Based On Information Computation

Posted on: 2008-10-29
Degree: Master
Type: Thesis
Country: China
Candidate: F K Xing
Full Text: PDF
GTID: 2155360215476673
Subject: Education Technology
Abstract/Summary:
Readability has been studied for a long time, and the results of readability research play an increasingly important role in fields such as education, the military and publishing. However, readability studies have also been seriously criticized for inaccurate measurements, limited measuring ranges and the absence of some important measuring factors. In this thesis, the author systematically reviews previous readability studies and introduces information theory, which serves as the framework of this research, into the study of readability. Within the framework of information computation, the author identifies the key factors related to readability measurement and builds new readability models from training data. He also develops an automatic measuring system called IRMS (Information-based Readability Measuring System), implemented in VB.Net 2005 with a relational database.

Because of this new framework, the study overcomes several weaknesses of previous research and addresses some difficult problems that earlier readability researchers could not solve. These include:

(1) The variables of the readability model in this research are word entropy, sentence entropy and text information, instead of the word length and sentence length commonly used in earlier research. An n-gram (n = 1, 2, 3) language model is built to measure these information variables quantitatively. With the n-gram language model, the author avoids treating words in isolation and neglecting the linear relations between words, which is the main problem of traditional methodologies.
The n-gram language model also allows a more precise description of language than before.

(2) To take the reading environment into account, the author divides reading environments into two classes, controlled and uncontrolled, according to whether reading time is limited. With this classification, the reading environment can be treated as a factor of readability and can be controlled by measuring text information. This addresses the absence of the reading environment in previous studies.

(3) Since different corpora reflect the characteristics of different languages, language models built on different corpora can also reflect the characteristics of different reader groups. This addresses the absence of the reader factor in previous studies.

The framework of this research:

1. Introduction: The author briefly introduces the main content and significance of readability study and outlines the framework of this research.

2. Literature Review: The author summarizes the significance of readability studies and reviews previous readability studies at home and abroad. He then analyzes the problems in traditional research and argues that defects in the traditional research methodology are their essential cause. He also examines the reading process in detail from the perspectives of cognitive theory and communication theory, and concludes that the essential factor influencing text readability is the information conveyed by the text rather than the text itself.

3. Information Theory and Computation of Natural Language Information: The author briefly defines the concept of information and explains how it is measured.
He then defines in detail the information variables of natural language, such as text information, word entropy and sentence entropy, and introduces the methods for measuring them.

4. Hypothesis and Methodology: The author puts forward four hypotheses about the relations between readability and related factors such as the semantic factor, the syntactic factor, reading environments and readers. He then describes the research procedure and methodology, which include building the language model, selecting training data and test data, testing the hypotheses, building the measuring model and conducting contrastive analysis.

5. Report of Experiment Results: The author tests the four hypotheses and obtains the variables related to readability. Measuring models are built with the least squares algorithm, followed by a goodness-of-fit test of the linear models and a significance-of-regression test (F test). To test the superiority of the new model, the author also compares it with the ARI (Automated Readability Index) formula, a representative traditional formula.

6. Discussion: The author analyzes the strengths and weaknesses of this research and makes recommendations for future work.

7. Introduction of IRMS: The author introduces the installation, user interface and operation of the IRMS system.

8. Summary and Planning: The author summarizes the contributions and value of this research and briefly plans future work.

The readability model of this research has been tested on many test texts, and the results are quite satisfactory. The test results show that the model has higher accuracy, a broader measuring range and more comprehensive measuring factors than earlier models.
This study is a breakthrough in readability studies and will help to improve the instructional process, integrate instructional resources and increase instructional effectiveness and efficiency. It can also advance readability studies of languages other than English, because it provides a methodology that can be applied to the study of other languages.
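
As an illustration of how information variables can be computed from an n-gram language model, the following is a minimal sketch and not the thesis's own implementation. It assumes a bigram model (n = 2) with add-one smoothing, takes the information of a word to be its surprisal, -log2 P(word | previous word), and takes sentence and text information to be sums of these values. The abstract does not give the exact definitions of word entropy, sentence entropy and text information, so these choices, along with all function names and the `<s>` / `</s>` boundary markers, are assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical sentence boundary markers (not taken from the thesis).
START, END = "<s>", "</s>"


def train_bigram(sentences):
    """Count bigrams and their left contexts over tokenized sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for tokens in sentences:
        padded = [START] + list(tokens) + [END]
        contexts.update(padded[:-1])             # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))  # adjacent token pairs
        vocab.update(padded[1:])                 # every token that can follow another
    # One extra vocabulary slot stands for any word unseen in training.
    return contexts, bigrams, len(vocab) + 1


def word_information(model, prev, word):
    """Surprisal in bits, -log2 P(word | prev), with add-one smoothing."""
    contexts, bigrams, vocab_size = model
    p = (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
    return -math.log2(p)


def sentence_information(model, tokens):
    """Sum of word surprisals over one sentence, including the end marker."""
    padded = [START] + list(tokens) + [END]
    return sum(word_information(model, a, b) for a, b in zip(padded, padded[1:]))


def text_information(model, sentences):
    """Sum of sentence information over a whole text."""
    return sum(sentence_information(model, s) for s in sentences)
```

For example, after `model = train_bigram(corpus)` on a corpus of tokenized sentences, `sentence_information(model, tokens)` gives a lower value for a word order that is frequent in the corpus than for a rare one. This is how the sketch captures the linear relations between words that word-length and sentence-length measures ignore. Training the same functions on a different corpus gives different values for the same text, which corresponds to the abstract's point that language models built on different corpora can reflect different reader groups.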
Keywords/Search Tags: Readability, Information Theory, Entropy, Language Model, Corpus, IRMS