
Text Readability Evaluation of Chinese as a Foreign Language Using Multi-Dimensional Features and Random Forest

Posted on: 2020-09-08
Degree: Master
Type: Thesis
Country: China
Candidate: W D Yang
GTID: 2415330578973898
Subject: Computer application technology
Abstract:
With the vigorous international promotion of the Chinese language, the number of people learning Chinese as a second language is growing day by day, and research on the subject has grown with it. Evaluating text readability for learners of Chinese as a foreign language is an important topic in this research. Readability refers to the degree to which a text is easy to read and understand. It is important to give second-language learners reading texts whose readability matches their proficiency in the language. Texts that are too difficult leave learners stuck and struggling, while texts that are too easy make them lose interest in reading and fail to teach them the new language knowledge they need to improve their reading level. However, evaluating the readability of Chinese texts for foreign learners by hand is time-consuming and laborious for both teachers and students, and the results are often highly subjective. This thesis addresses this problem and makes the following contributions:

(1) It presents an extensive review of the development, current status and results of related work. It first summarizes the three stages of readability research outside China: readability formulas, readability research based on cognitive theory, and readability analysis based on machine learning. It then summarizes the two stages of research on text readability for Chinese as a foreign language: readability formulas based on traditional text features, and text readability assessment based on machine learning.

(2) It proposes a method called "text readability evaluation of Chinese as a foreign language based on multi-dimensional features and random forest" (hereafter Multi-D RF-CFLE). Random forest is an ensemble learning algorithm built on decision trees. Because of its simple structure and strong generalization ability, it is widely used in many fields and has performed outstandingly in data mining competitions, so this thesis applies it to assessing the text readability of Chinese as a foreign language. First, drawing on the readability evaluation indicators for second-language texts used in China and abroad, a total of 86 features are extracted from four dimensions: basic features, part-of-speech features, hierarchical features and grammatical features. After feature selection, a random forest classifier is trained on the training set, and the validity of the method is evaluated on the test set.

(3) It presents comparative experiments between the proposed method and a method based on the widely used SVM algorithm (hereafter Multi-D SVM-CFLE). The two experiments differ mainly in their feature selection and classifier training modules: Multi-D RF-CFLE uses a filter feature selection method and trains its classifier with the random forest algorithm, whereas Multi-D SVM-CFLE uses both filter and wrapper feature selection methods and trains its classifier with a support vector machine. Analysis of the results of these comparative experiments shows that Multi-D RF-CFLE outperforms Multi-D SVM-CFLE on the evaluation metrics for individual classes as well as on accuracy and adjacent accuracy.
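The abstract states only that Multi-D RF-CFLE uses a filter feature selection method; it does not name the scoring criterion. The sketch below illustrates what a filter step over a text-by-feature matrix might look like, assuming (hypothetically) that each feature is scored by the absolute Pearson correlation between its values and the gold readability level, and that the top `k` features are kept. The function names `pearson` and `filter_select` and the toy matrix are illustrative and are not taken from the thesis.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def filter_select(rows, levels, k):
    """Score each feature column on its own (no classifier involved) and return
    the indices of the k highest-scoring columns, best first.

    Assumption: the score is |Pearson r| with the gold level; the thesis may use
    a different filter criterion."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, levels)), j))
    # Highest score first; ties broken by the lower column index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


# Hypothetical toy matrix: 4 texts x 3 features
# (e.g. character count, a constant column, an uninformative column).
rows = [[120, 1.0, 5], [180, 1.0, 3], [260, 1.0, 4], [340, 1.0, 5]]
levels = [1, 2, 3, 4]
print(filter_select(rows, levels, 2))  # → [0, 2]
```

A filter scores each feature once, independently of the classifier, which keeps it cheap even with 86 features; a wrapper method, by contrast, evaluates candidate feature subsets by retraining the classifier on each one.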
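To make the random forest idea concrete, here is a minimal from-scratch sketch in pure Python. It assumes the textbook formulation: each tree is grown on a bootstrap sample of the training texts, each split considers only a random subset of the features (here the square root of the feature count, a common default), splits are chosen by Gini impurity, and the forest predicts by majority vote. This is a toy illustration under those assumptions, not the thesis's implementation. The tree depth, the number of trees, the function names and the two-feature toy data are all hypothetical, and a real experiment would more likely rely on a library implementation such as scikit-learn's `RandomForestClassifier`.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) pair with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(labels)
    for j in feature_ids:
        for t in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best


def build_tree(rows, labels, depth, n_sub, rng):
    """Grow a tree; leaves are labels, internal nodes are (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    # A fresh random feature subset at every split.
    feature_ids = rng.sample(range(len(rows[0])), n_sub)
    split = best_split(rows, labels, feature_ids)
    if split is None:
        return majority
    j, t = split
    left = [i for i, row in enumerate(rows) if row[j] <= t]
    right = [i for i, row in enumerate(rows) if row[j] > t]
    return (
        j,
        t,
        build_tree([rows[i] for i in left], [labels[i] for i in left], depth - 1, n_sub, rng),
        build_tree([rows[i] for i in right], [labels[i] for i in right], depth - 1, n_sub, rng),
    )


def predict_tree(node, row):
    """Follow thresholds down to a leaf label."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node


def train_forest(rows, labels, n_trees=25, max_depth=4, seed=0):
    """Bagging: each tree is grown on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    n_sub = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(
            build_tree([rows[i] for i in idx], [labels[i] for i in idx], max_depth, n_sub, rng)
        )
    return forest


def predict_forest(forest, row):
    """Majority vote over the trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


# Hypothetical toy data: two features that both increase with the readability level.
TOY_ROWS = [[95, 1.9], [100, 2.0], [105, 2.1],
            [195, 3.9], [200, 4.0], [205, 4.1],
            [295, 5.9], [300, 6.0], [305, 6.1]]
TOY_LEVELS = [1, 1, 1, 2, 2, 2, 3, 3, 3]

forest = train_forest(TOY_ROWS, TOY_LEVELS)
print([predict_forest(forest, row) for row in ([95, 1.9], [195, 3.9], [295, 5.9])])
```

Bootstrap sampling and per-split feature subsampling are what decorrelate the trees; averaging their votes is the source of the generalization ability the abstract credits to random forest.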
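The comparison in (3) reports accuracy, adjacent accuracy and metrics for individual classes, but the abstract does not define them. The sketch below assumes the definition of adjacent accuracy common in readability research, where a prediction also counts as correct if it is within one level of the gold level. It also reads the per-class metrics as precision, recall and F1 for a single level. Both readings, the function names and the example labels are assumptions for illustration.

```python
def accuracy(gold, pred):
    """Share of texts whose predicted level equals the gold level."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def adjacent_accuracy(gold, pred, tolerance=1):
    """Share of texts whose predicted level is within `tolerance` levels of the gold level.

    Assumption: tolerance=1 (off by at most one level) is the usual convention."""
    return sum(abs(g - p) <= tolerance for g, p in zip(gold, pred)) / len(gold)


def per_class_prf(gold, pred, label):
    """Precision, recall and F1 for a single readability level."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted levels for five texts.
gold = [1, 2, 3, 4, 2]
pred = [1, 3, 3, 2, 2]
print(accuracy(gold, pred))           # → 0.6
print(adjacent_accuracy(gold, pred))  # → 0.8
print(per_class_prf(gold, pred, 2))   # → (0.5, 0.5, 0.5)
```

Adjacent accuracy is always at least as high as exact accuracy; the gap between the two shows how many errors are near misses between neighbouring levels rather than gross misjudgements.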
Keywords: Chinese as a foreign language, readability, multi-dimensional features, feature selection, support vector machine, random forest