This study analyzes the prosody of Chinese spontaneous speech. The corpus consists of telephone conversations about hotel reservations, in which syllables, initials and finals, break indices, stress, and sentence type were manually annotated by trained annotators.

As in English, the prosodic structure of a Chinese utterance is hierarchical. Based on the characteristics of the corpus, we distinguished three levels of prosodic unit: intonational phrase (IP), intermediate phrase, and prosodic word (PW). We first annotated these units perceptually and then based our analysis on the annotation. The grouping of words into prosodic constituents is affected by prosodic, syntactic, semantic, and pragmatic constraints. The optimal length of a PW is two or three syllables, and prosodic structure tends to be balanced. A monosyllabic word joins an adjacent monosyllabic or disyllabic word to form a PW if the two are immediate constituents syntactically, and a function word must attach to a content word. If two short words naturally form a unit of meaning when joined, they tend to form a PW. A focused word tends to start a new prosodic constituent.

The pitch ranges and registers of different IPs vary greatly, depending on the length of the IP, its place in the discourse, and the information it conveys. The pitch range of an IP tends to be wide if the IP is long, and its pitch register tends to be high if it starts a new topic. The more informative an IP is, the wider its pitch range and the higher its register. As in English, Chinese utterances show pitch declination and final lowering, and the declination rate is about 3 st/second. If the focus is at the end of the utterance, the declination rate is small; if the utterance is front-focused, the declination rate is great. New information has a wider pitch range than old information.
Within an utterance, the pitch range of a pronoun tends to be smaller and its register lower.

The duration of a neutral-tone syllable is shorter than average (about 0.67 times that of full-tone syllables), and the last syllable in an IP is longer than average (about 1.34 times the average). The average duration per syllable is greatest for words at the end of an IP and smallest for words in the middle of an IP. The longer a PW is, the smaller its average duration per syllable. The duration of a focused word tends to be greater.

Chinese is a tone language, but syllables in an utterance differ in strength, i.e. some syllables are more stressed than others. The prominent syllables that bear focus are pragmatically determined, whereas the comparatively more prominent syllables in rhythmic units are prosodically determined. In an intonational phrase, the first prosodic word tends to be more prominent perceptually, while the last one is less prominent. Syllables near the end are quite long but are not perceptually prominent, which means that prominence is more closely correlated with pitch than with duration.
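
The declination rate above is given in semitones (st) per second. As a rough illustration of how such a figure might be obtained, the sketch below converts F0 values from Hz to semitones and fits a least-squares line over time. The function names, the 100 Hz reference, and the synthetic F0 contour are assumptions made for illustration only; they are not the study's data or its measurement procedure.

```python
import math

def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert F0 in Hz to semitones relative to ref_hz (reference is an assumption)."""
    return 12.0 * math.log2(f0_hz / ref_hz)

def declination_rate(times_s, f0_hz):
    """Least-squares slope of F0 (in semitones) against time, in st/second.

    A falling contour yields a negative slope; its magnitude is the rate.
    """
    st = [hz_to_semitones(f) for f in f0_hz]
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_st = sum(st) / n
    num = sum((t - mean_t) * (s - mean_st) for t, s in zip(times_s, st))
    den = sum((t - mean_t) ** 2 for t in times_s)
    return num / den

# Hypothetical contour: starts at 200 Hz and falls by exactly 3 st every second.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
f0 = [200.0 * 2 ** (-3.0 * t / 12.0) for t in times]

rate = declination_rate(times, f0)
print(f"declination: {abs(rate):.2f} st/second")
```

Working in semitones rather than Hz makes pitch movements comparable across speakers with different registers, which is one common reason for reporting declination in st/second.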