
Research On Word Distribution Of General Words And Relations Among Characters Words And Phrases Based On Dynamic Circulating Corpus

Posted on: 2008-06-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X J Han
GTID: 1115360215981080
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
Words and Chinese characters are central to teaching Chinese as a foreign language. Choosing words that are genuinely useful for students' communication, and that help students study and remember new vocabulary, is difficult in practice. Hence, qualitative and quantitative study of Chinese characters, words, and phrases is necessary. The main task of this dissertation is to study Chinese characters, words, and phrases separately, together with the relations among them. The materials are drawn from 10,000 general words calculated from the Dynamic Circulating Corpus (DCC). The author hopes that, after comparison with the many existing Chinese character lists and word lists, this research can provide a macro-level reference for teaching Chinese as a foreign language.

Chinese characters function as the writing system for linguistic units. They should also reflect, to some degree, how those linguistic units are used. This dissertation puts forward a new concept, "character usage": the general usage of the monosyllabic or multisyllabic words recorded by a Chinese character can indicate that character's coverage. This new feature can serve as an index for ordering Chinese characters.

Researchers in information processing currently hope to break through the bottleneck in part-of-speech (POS) precision with the help of a clear relation between characters and words. The results of this dissertation may offer some reference for machine learning in this respect.

Main work in the dissertation:

(1) According to the general-usage parameter, the author grades the 10,000 general words drawn from DCC into 5 levels and compares them with the word frequency list produced from mainstream newspapers in 2005 and with the HSK vocabulary outline.

(2) A character list of general words, containing 2,249 characters, is derived from the general word list, and a database of graded characters of general words is built. The list is also compared with the HSK vocabulary outline, the common characters list of modern Chinese, and the general characters list of modern Chinese. On the basis of this research, the concept of character usage is put forward.

(3) The relationship database of general characters and words, which has 18,798 records, serves as a platform for observing the relationship between characters and words. On this platform, single characters and linguistic units are studied through their correspondences.

(4) Four further sub-databases are derived from the relationship database of general characters and words: a database of multi-POS words, a graded database of the word-forming capacity of single characters, a database of positional statistics for single characters, and a database of attribute statistics for single characters.

Innovations in the dissertation:

(1) This research gives a clear concept of the characters of general words and phrases, and studies in depth a newspaper corpus of 1,113,300,000 words spanning 5 years (from 2002 to 2005). Unlike earlier studies, it attends to words in order to analyze characters: words are both the starting point and the end point of the character research. Some conclusions on the relation between them are also drawn, which can provide a macro-level reference for teaching Chinese as a foreign language and for machine learning.

(2) Character usage is put forward for the first time as a feature for measuring the real-world application of Chinese characters. The index integrates practicability with usage, so that the usage attribute of Chinese characters can be measured more scientifically. In short, it is a step forward in the quantitative measurement of Chinese characters.

(3) The dynamic databases, namely the database of graded characters of general words and the relationship database of general characters and words, can be used directly for outline revision and textbook compilation in teaching Chinese as a foreign language. They can serve as supplementary resources for the dynamic updating of teaching Chinese as a foreign language.

(4) The method applied to mainstream newspapers can be extended to other media such as radio broadcasting, television, and the internet.

Although the corpus studied here is massive and covers five years, the progress based on it is only a single tree in the forest. As the range of corpora expands, the results can be tested further. The next tasks are therefore to trace changes in Chinese characters, words, and phrases with the help of more corpora, to carry out both synchronic and diachronic comparison, and to make the databases dynamic. In this way, future results can be more representative.
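
To make the idea of a character-word relationship database more concrete, the following is a minimal Python sketch of how such records, and two of the derived statistics (word-forming capacity and positional statistics of single characters), might be computed from a word list. It is an illustration only: the abstract does not describe the dissertation's actual schema or procedure, so the record fields, the function names (`build_records`, `word_forming_capacity`, `position_statistics`), the position labels, and the five sample words are all assumptions introduced here.

```python
from collections import Counter, defaultdict

# Hypothetical sample of (word, part of speech); NOT data from the dissertation.
WORDS = [
    ("学生", "n"),
    ("学习", "v"),
    ("大学", "n"),
    ("生活", "n"),
    ("学", "v"),
]


def position_of(index, length):
    """Label where a character sits inside a word (assumed label set)."""
    if length == 1:
        return "single"
    if index == 0:
        return "initial"
    if index == length - 1:
        return "final"
    return "medial"


def build_records(words):
    """Emit one record per (character, word) occurrence."""
    records = []
    for word, pos in words:
        for i, ch in enumerate(word):
            records.append({
                "char": ch,
                "word": word,
                "pos": pos,
                "position": position_of(i, len(word)),
            })
    return records


def word_forming_capacity(records):
    """Count the distinct words in which each character appears."""
    words_by_char = defaultdict(set)
    for rec in records:
        words_by_char[rec["char"]].add(rec["word"])
    return {ch: len(ws) for ch, ws in words_by_char.items()}


def position_statistics(records):
    """Count, per character, how often it occurs in each position."""
    stats = defaultdict(Counter)
    for rec in records:
        stats[rec["char"]][rec["position"]] += 1
    return stats


if __name__ == "__main__":
    records = build_records(WORDS)
    print(len(records), "records")
    print(word_forming_capacity(records))
    print({ch: dict(c) for ch, c in position_statistics(records).items()})
```

In this sketch each word contributes one record per character, which is one plausible reason a list of 10,000 words could yield a larger number of relationship records; whether the dissertation's 18,798 records were built this way is not stated in the abstract.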
Keywords/Search Tags: DCC; relations among Chinese characters, words, and phrases; Chinese character usage; teaching Chinese as a foreign language