Research On The Mongolian Speech Synthesis Based On Prosody

Posted on: 2013-01-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Ao
Full Text: PDF
GTID: 1115330374470716
Subject: Chinese Ethnic Language and Literature
Abstract/Summary:
Based on a large-scale speech corpus and phonetic experiments, this dissertation examines prosody in Mongolian speech synthesis. It consists of three parts. The first part covers resource construction: a large-scale Mongolian speech corpus for speech synthesis and an electronic dictionary. The second part examines syllabic structure changes in Mongolian discourse, such as segment insertion, deletion, and re-organization, and explores the syllabic correspondence between spoken and written Mongolian as well as the relation between segment insertion and deletion and the prosodic structure of discourse. The third part studies the prosodic structure of read Mongolian. By examining basic acoustic parameters such as pitch and duration, it gives a comprehensive account of prosodic words and prosodic phrases and proposes that the pitch contour plays an important role in delimiting prosodic phrases. The main conclusions are as follows:

A. We propose a set of phonetic transcription symbols for Mongolian speech synthesis, comprising 50 vowels (long, short, and compound) in word-initial, medial, and final positions and 27 consonants (basic and borrowed), which are described and differentiated by phonetic contrast features. Synthesis results indicate that these descriptions and distinctions are effective and improve the intelligibility of synthesized speech.

B. In discourse, the multiple pronunciations of a word differ in syntactic, grammatical, and pragmatic function. In a specific context, however, a polyphone has only one pronunciation, and this can be used to disambiguate polyphones. Some polyphones that show neither meaning contrast nor grammatical or pragmatic distinctions are a matter of pronunciation normalization and should be merged.

C. At the word level, the syllabic structures of monosyllabic words in spoken and written Mongolian are almost the same. There are 12 syllabic structure-changing rules for disyllabic words between spoken and written Mongolian. In polysyllabic spoken Mongolian words, syllabic structure changes proceed from the final syllable toward the initial syllable and follow the same rules as in disyllabic words. In spoken Mongolian, the structure of syllables with a short vowel is variable, whereas syllables with a long vowel or diphthong are stable, as are word-initial syllables. Based on these findings, the syllables of spoken Mongolian can be divided into stable and variable syllables. In Mongolian synthesis, the transcription of words containing variable syllables is vital.

D. In continuous speech, the primary factors causing syllabic structure change are noun supplements and affixed function words, which cannot constitute independent syllables in spoken Mongolian. The syllabic re-organizing rules are the same for sentences and for words. When the syllabic type of the affixed element is V, C, or VLC, the consonant of the preceding syllable constitutes an independent syllable. CVL is very stable and can be a word-final syllable. Segment deletion and insertion in spoken Mongolian, syllabic re-organization, and the prosodic structure of discourse are related. Syllabic re-organization and the insertion and deletion of segments all take place within prosodic words. Affixed noun elements are useful phonetic cues for predicting prosodic word boundaries. The prosodic domains of four function words differ: for the function word "uAE(?)u", it is the prosodic phrase; for the function words "(?)" and "(?)", it is the intonation phrase; for the function word "(?)", it is the prosodic word within the sentence.

E. In declarative discourse at normal reading speed, every prosodic phrase has a complete pitch contour and a pitch peak. The pitch contour rises before the peak and falls after it, forming an L-H-L pitch pattern that begins at the start of the prosodic phrase and ends at its end. The dissertation therefore concludes that when a sentence has neither a punctuation mark nor an evident pause, the prosodic phrase boundary lies at the intersection of two pitch contours. Statistics show that the syllable preceding a prosodic phrase boundary is lengthened to some extent. In addition, a word-final schwa also serves as a stress cue for predicting prosodic phrase boundaries.

F. A prosodic word boundary has neither an evident pause nor lengthening. Within prosodic words, syllable duration is related to syllable position: duration of final syllable > duration of initial syllable > duration of medial syllable. Prosodic words at a prosodic phrase boundary are slightly longer than those in the middle of a prosodic phrase. The position of a prosodic word within the prosodic phrase affects its pitch pattern. Based on the statistical data, there are four types of prosodic words:
1) Grammatical words of one to five syllables.
2) Two parallel monosyllabic grammatical words.
3) Grammatical words of one to four syllables combined with monosyllabic function words.
4) Monosyllabic grammatical words or function words at a prosodic phrase boundary.

G. Speech synthesis results show that segmentation cues for prosodic phrases and prosodic words can improve the naturalness of synthesized speech to some extent. However, because the prosodically transcribed speech corpus is small, the improvement in naturalness is limited. We believe that, as Mongolian prosody research advances and more speech corpora with prosodic transcription become available, high-quality and highly natural synthesized Mongolian speech can be achieved.
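
Conclusion E places a prosodic phrase boundary where two rise-fall (L-H-L) pitch contours meet. The following is a minimal sketch of that idea and not the dissertation's actual procedure. It assumes the input is a list of per-syllable mean F0 values in Hz with unvoiced gaps already interpolated. The function name `find_phrase_boundaries`, the helper `_nearest_peak`, and the 10 Hz threshold `min_rise_hz` are hypothetical choices made for illustration.

```python
from typing import List


def _nearest_peak(f0: List[float], start: int, step: int) -> float:
    """Climb from `start` in direction `step` (+1 or -1) while F0 does not fall;
    return the F0 value at the point where the climb stops."""
    i = start
    while 0 <= i + step < len(f0) and f0[i + step] >= f0[i]:
        i += step
    return f0[i]


def find_phrase_boundaries(f0: List[float], min_rise_hz: float = 10.0) -> List[int]:
    """Return indices of pitch valleys that separate two rise-fall contours.

    A valley counts as a candidate boundary only if the nearest peak on each
    side is at least `min_rise_hz` above it (an assumed, untuned threshold
    that filters out small dips inside a single contour).
    """
    boundaries = []
    for i in range(1, len(f0) - 1):
        # Local minimum: lower than the left neighbour, not higher than the right.
        if not (f0[i] < f0[i - 1] and f0[i] <= f0[i + 1]):
            continue
        left_peak = _nearest_peak(f0, i, -1)
        right_peak = _nearest_peak(f0, i, +1)
        if left_peak - f0[i] >= min_rise_hz and right_peak - f0[i] >= min_rise_hz:
            boundaries.append(i)
    return boundaries


if __name__ == "__main__":
    # Invented F0 values (Hz), one per syllable: two rise-fall contours in a row.
    contour = [100, 120, 150, 130, 105, 125, 155, 135, 100]
    print(find_phrase_boundaries(contour))
```

In practice such a detector would be combined with the duration cues reported in conclusions E and F (pre-boundary lengthening), since pitch valleys alone can be ambiguous.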
Keywords/Search Tags: MONGOLIAN, SPEECH SYNTHESIS, PROSODY, SYLLABLE RE-STRUCTURE