Statistical Analysis On Word Entropy Of Foreign Students' Compositions In HSK Dynamic Composition Corpus

Posted on: 2020-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: C Yi
Full Text: PDF
GTID: 2415330590462909
Subject: Chinese international education
Abstract/Summary:
Saussure pointed out that language is a social phenomenon and a symbolic system for expressing ideas. In teaching Chinese as a second language (CSL), we often find that lower-level learners have a limited vocabulary and use fewer words in their writing, while intermediate and advanced learners are not restricted to general vocabulary and tend to use a wider range of words. The use of symbols in a linguistic text is related to the amount of information it carries, and information entropy can be calculated for any type of frequency distribution.

This thesis is based on the "HSK Dynamic Composition Corpus" of Beijing Language and Culture University. Starting from the informational properties of language, and guided by the theories and viewpoints of quantitative linguistics, information theory, and synergetic linguistics, it applies the principle and calculation method of word entropy for written Chinese to analyze word frequency and word entropy in the word-segmented and annotated interlanguage corpus. The study examines commonalities and differences in vocabulary use across the "country" and "genre" of foreign students' compositions from the perspective of entropy. On this basis, it further describes the students' vocabulary use through lexical measures: lexical diversity, lexical repetition rate, lexical uniqueness, and high-frequency words.

The statistical results show that word entropy differs significantly between some countries and genres but not between others. Specifically: (1) there is no significant difference in word entropy between compositions by Japanese and Korean students, while word entropy differs significantly among the remaining nationalities; (2) there is no significant difference in word entropy between narratives and essays, while literary-style texts differ significantly from both. The other lexical measures further show that where word entropy differs significantly between two groups of texts, the richness of their vocabulary use also differs, and where word entropy does not differ significantly, their vocabulary use is also similar.

The significance of this study is that, for the first time, it carries out a statistical analysis of a second language acquisition corpus from the perspective of information entropy, grouped by the author's country and writing genre. We conclude that the level of vocabulary use indicated by word entropy, together with the commonalities and differences in word entropy, is also reflected in how second language learners use vocabulary. That is to say, text research based on word entropy in second language acquisition is feasible, which also provides a new perspective and new ideas for the study of Chinese vocabulary acquisition.
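
As a rough illustration of the word-entropy calculation the abstract refers to, the sketch below computes the Shannon entropy H = -Σ p_i · log2(p_i) of a word-frequency distribution, where p_i is the relative frequency of the i-th distinct word. It also computes a simple type-token ratio as one possible lexical diversity measure. The function names and the sample tokens are hypothetical, the input is assumed to be already word-segmented, and the thesis may define or normalize its measures differently.

```python
import math
from collections import Counter


def word_entropy(tokens):
    """Shannon entropy, in bits per word, of the word-frequency distribution.

    `tokens` is assumed to be a list of already-segmented words.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = -sum(p_i * log2(p_i)) over all distinct words
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def type_token_ratio(tokens):
    """Distinct words divided by total words (one simple diversity measure)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    # Invented sample compositions, not taken from the corpus.
    varied = ["我", "喜欢", "学习", "汉语", "他", "去", "北京"]
    repetitive = ["我", "喜欢", "我", "喜欢", "我", "喜欢", "我"]
    for name, tokens in [("varied", varied), ("repetitive", repetitive)]:
        print(name, round(word_entropy(tokens), 3), round(type_token_ratio(tokens), 3))
```

A text that repeats a few words has lower entropy than one of the same length that spreads its tokens over many distinct words, which is the sense in which entropy tracks the richness of vocabulary use.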
Keywords/Search Tags:Information theory, Word entropy, Country, Genre, HSK dynamic composition corpus