| China's vast territory and large population have given rise to many Chinese dialects. As speech synthesis technology has matured, a growing body of research has focused on dialect speech synthesis. Compared with Mandarin, Chinese dialects differ not only in the pronunciation and tone of syllables but also in some of their vocabulary. This thesis takes the Xingtai dialect as its research object and focuses on synthesizing Xingtai dialect speech that reflects these characteristics. The thesis compares the pronunciation differences between the Xingtai dialect and Mandarin. A text analysis method is proposed for mixed Xingtai dialect and Mandarin text; it converts input Chinese text into context-dependent labels for speech synthesis that reflect the special pronunciation and special vocabulary of the Xingtai dialect. A hidden Markov model (HMM)-based statistical parametric speech synthesis framework, which employs speaker adaptive training and speaker adaptation transformation to obtain Mandarin or Xingtai dialect acoustic models, is used to synthesize Mandarin or Xingtai dialect speech. The main work and original contributions of the thesis are as follows: Firstly, the thesis designs a machine-readable phonetic scheme named SAMPA-XT and a special vocabulary conversion dictionary for Mandarin/Xingtai dialect speech synthesis. SAMPA-XT, which is derived by comparing the pronunciation of finals and tones in the Xingtai dialect and Mandarin, is used to convert input Chinese text into pronunciation labels. In addition, a special vocabulary conversion dictionary is built from the special vocabulary of the Xingtai dialect to convert Chinese words into the corresponding Xingtai words. Secondly, the thesis builds a Xingtai dialect/Mandarin bilingual speech corpus for bilingual speech synthesis.
A set of 300 text sentences is designed based on the pronunciation characteristics of the Xingtai dialect to cover all of its initials, finals, tones and special vocabulary. A parallel Mandarin/Xingtai dialect speech corpus is also recorded in a studio. Thirdly, a text analysis procedure for the Xingtai dialect is implemented to convert input Chinese text into context-dependent labels for statistical parametric speech synthesis. The input Chinese text is first converted to Mandarin pinyin. The pinyin sequence is then converted into SAMPA-XT-based labels according to the pronunciation rules of the Xingtai dialect and the special vocabulary conversion dictionary. Context information, including syllable, prosodic structure and sentence information, is then used to generate context-dependent labels for speech synthesis. Finally, the thesis implements Mandarin/Xingtai dialect cross-lingual speech synthesis. Mandarin or Xingtai dialect acoustic models are obtained from a multi-speaker Mandarin training corpus and a small Xingtai dialect training corpus by applying speaker adaptive training and speaker adaptation transformation, and are used to synthesize Mandarin or Xingtai dialect speech. Subjective tests show that, compared with traditional methods, the proposed method can synthesize high-quality Xingtai dialect speech even with a small Xingtai training set. |
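
The three text analysis steps described above (Chinese text to Mandarin pinyin, pinyin to SAMPA-XT-based labels, and labels plus context information to context-dependent labels) can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names, the label layout, and every dialect entry are assumptions made for illustration: the final rule, the tone rule and the special vocabulary entry are placeholders and do not describe real Xingtai pronunciation or the real SAMPA-XT symbol set. The input is assumed to be already segmented into words, and the context is reduced to neighbouring syllables and positions, leaving out prosodic structure.

```python
import re
from typing import Dict, List, Tuple

# Toy Mandarin lexicon: word -> pinyin syllables with tone digits.
PINYIN_LEXICON: Dict[str, List[str]] = {
    "今天": ["jin1", "tian1"],
    "很": ["hen3"],
    "好": ["hao3"],
}

# PLACEHOLDER special vocabulary dictionary: Mandarin word -> labels of the
# dialect word that replaces it. Invented entry, not real Xingtai data.
SPECIAL_VOCAB: Dict[str, List[str]] = {"今天": ["XTWORD_1", "XTWORD_2"]}

# PLACEHOLDER pronunciation rules (Mandarin final/tone -> dialect symbol).
# Invented values, not the real SAMPA-XT inventory.
FINAL_RULES: Dict[str, str] = {"ao": "AO"}
TONE_RULES: Dict[str, str] = {"3": "9"}

# Split a pinyin syllable into optional initial, final and tone digit.
SYLLABLE = re.compile(r"^(zh|ch|sh|[bpmfdtnlgkhjqxrzcsyw])?([a-z]+)([1-5])$")


def pinyin_to_sampa_xt(syllable: str) -> str:
    """Apply the placeholder final and tone rules to one pinyin syllable."""
    match = SYLLABLE.match(syllable)
    if match is None:
        raise ValueError(f"not a toned pinyin syllable: {syllable!r}")
    initial = match.group(1) or ""
    final = FINAL_RULES.get(match.group(2), match.group(2))
    tone = TONE_RULES.get(match.group(3), match.group(3))
    return f"{initial}{final}{tone}"


def text_to_labels(words: List[str]) -> List[str]:
    """Convert segmented text into simplified context-dependent labels."""
    units: List[Tuple[str, int, int]] = []  # (label, position in word, word length)
    for word in words:
        pinyin = PINYIN_LEXICON[word]            # step 1: text -> pinyin
        if word in SPECIAL_VOCAB:                # step 2a: special vocabulary
            syllables = SPECIAL_VOCAB[word]
        else:                                    # step 2b: pronunciation rules
            syllables = [pinyin_to_sampa_xt(s) for s in pinyin]
        for pos, label in enumerate(syllables, start=1):
            units.append((label, pos, len(syllables)))

    total = len(units)
    labels = []
    for k, (cur, pos, word_len) in enumerate(units):  # step 3: add context
        prev = units[k - 1][0] if k > 0 else "sil"
        nxt = units[k + 1][0] if k + 1 < total else "sil"
        labels.append(f"{prev}^{cur}+{nxt}@{pos}_{word_len}#{k + 1}_{total}")
    return labels


if __name__ == "__main__":
    for line in text_to_labels(["今天", "很", "好"]):
        print(line)
```

Each output line places the current syllable between its left and right neighbours, followed by its position in the word and in the sentence. A full-context label for HMM-based synthesis would carry many more fields, such as phone-level and prosodic phrase information.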