A Research On The Unit Of Chinese Idioms: Based On The Dynamic Circulating Corpus

Posted on: 2006-08-26  Degree: Doctor  Type: Dissertation
Country: China  Candidate: J G Yang  Full Text: PDF
GTID: 1115360152988963  Subject: Linguistics and Applied Linguistics
Abstract/Summary:
Lexicography and Chinese information processing tend to emphasize "words" and neglect "idioms". Starting from this observation, this paper reflects on existing research into phrases and idioms and puts forward the concept of the Idiom Unit (IU). The IU functions as the structural unit of idioms and conforms to cognitive rules. It is a fixed expression that is often used as a single word. Three principles are used to identify an IU: whether it is stable in usage and compact in combination; whether it conforms to people's cognitive rules (the normal length of an IU is 7±2); and whether its degree of circulation reaches a set value. Theoretically, IU includes all idiom-like word combinations. In this paper, the IU under study includes 1) three-character idioms, as well as structural units such as "差不多, 靠不住, 来不及" which lie between words and phrases; 2) four-character idioms and new fixed phrases; 3) abbreviations, lettered words and phrases, and so on.

This research is based on texts from People's Daily (2001-2003), totaling about 80 million characters. Under the guidance of Dynamic Language Knowledge Renewing Theory, and on the basis of Degree of Circulation Theory, the annual average set value of the degree of circulation (0.5) is adopted as the main criterion. A primary selection of IU is made by combining rules and statistics, and the noise environment of some selected IU is analyzed quantitatively and qualitatively. The strategies and basic steps in selecting IU are as follows:

1. Shorten the texts by cutting them at punctuation marks and at high-frequency words such as "的, 是, 在, 和, 了, 有"; character strings interrupted by a cut are discarded, and the others are retained.
2. Transform the data format: convert the "complete" 2-8 character strings obtained into database format.
3. Calculate the frequency, degree of distribution and degree of circulation of the 3-5 character strings.
4. Select character strings using the annual average set value of their degree of circulation.
5. Segment the strings containing no fewer than 5 syllables into words and tag their parts of speech; select 3- and 4-character strings and adjacent character strings (bi-grams) that conform to grammatical combination rules such as "N+N", "N+V", "V+V" and so on; then repeat steps 3 and 4 on the selected character strings.
6. Eliminate noise from the selected character strings, re-segment them and re-tag their parts of speech, then filter them using static rule templates (30 rules in all).
7. Select IU directly with the help of supplementary approaches.
8. Obtain a table of 3- to 5-character IU.

This paper also classifies and analyzes, with examples, part of the selected 3- and 4-character strings. The emphasis is on IU-like phrases. For the 3-character strings, the discussion focuses on those with a syllabic pattern of "1+2" and a structural pattern of "V+N/NP", and on those with a syllabic pattern of "2+1" and a structural pattern of "V/VP+N". Feng Shengli's claim concerning 3-syllable combinations is confirmed: those with a syllabic pattern of "1+2" are phrases, and those with a syllabic pattern of "2+1" are prosodic words.

For the 4-character strings, "N+V" and "V+N" are the main focus. Between N and V there are complicated grammatical, semantic and syllabic constraints. In the "N+V" pattern, 4-character strings in an attributive + head relation bear the strongest likeness to IU, followed by those in an adverbial + head relation; those in a subject + predicate relation come last, with a certain looseness between N and V. In the "V+N" pattern, it is observed that: 1) if 4-character "V+N" strings denote general names, they are usually, or tend to be, noun phrases; 2) if the nouns in 4-character "V+N" strings are abstract two-syllable nouns, then the 4-character noun phrases bear a relatively stronger likeness to IU; 3) if the verbs in 4-character "V+N" strings are two-syllable verb + object verbs, then the strings bear a strong likeness to IU. Researches are als...
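
The first three selection steps described in the abstract (cutting the text, collecting 3-5 character strings, counting frequency and distribution) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the punctuation set, the function names, and the use of "number of documents containing the string" as a stand-in for the degree of distribution. The cut is applied to every occurrence of the six listed characters, which is a simplification. The abstract does not give the formula for the degree of circulation, so the sketch does not compute it.

```python
import re
from collections import Counter, defaultdict

# High-frequency cut characters listed in the abstract (step 1).
HIGH_FREQ_CHARS = "的是在和了有"
# Assumed punctuation set; the abstract does not list the marks used.
PUNCTUATION = "，。、；：？！“”‘’（）《》,.;:?!()"
CUT_RE = re.compile("[" + re.escape(PUNCTUATION + HIGH_FREQ_CHARS) + r"\s]+")


def split_segments(text):
    """Cut text at punctuation and high-frequency characters (step 1)."""
    return [seg for seg in CUT_RE.split(text) if seg]


def ngrams(segment, n_min=3, n_max=5):
    """Yield every contiguous string of n_min..n_max characters."""
    for n in range(n_min, n_max + 1):
        for i in range(len(segment) - n + 1):
            yield segment[i:i + n]


def count_strings(documents):
    """Return (frequency, distribution) for 3-5 character strings.

    frequency: total occurrences of each string.
    distribution: number of documents containing the string
    (a hypothetical proxy, not the dissertation's definition).
    """
    frequency = Counter()
    doc_ids = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for segment in split_segments(text):
            for gram in ngrams(segment):
                frequency[gram] += 1
                doc_ids[gram].add(doc_id)
    distribution = {gram: len(ids) for gram, ids in doc_ids.items()}
    return frequency, distribution


if __name__ == "__main__":
    docs = ["他来不及了，我也来不及", "来不及的事情"]
    freq, dist = count_strings(docs)
    print(freq["来不及"], dist["来不及"])
```

A later selection step would compare a circulation score derived from such counts against the set value (0.5 in the abstract); that score is deliberately left out because its definition is not stated here.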
Keywords/Search Tags: idiom unit, the Dynamic Circulating Corpus, idiom, degree of circulation, fixed phrase