| Over the past few decades, advances in computer science and related disciplines have driven substantial progress in speech synthesis. Text-to-speech (TTS) is a technique that converts input text into speech output. Generally speaking, a TTS system consists of three modules: text analysis, prosody processing, and speech synthesis. These modules are not independent, however, and the quality of the output speech is strongly affected by each of them. Output speech can be evaluated in many respects, but chiefly by its intelligibility and naturalness. At present, the intelligibility of TTS has reached a high level, but naturalness still needs improvement. Research on prosody processing covers four areas: prosody prediction, prosody rules, prosody description, and prosody modeling. This paper mainly studies prosodic structure prediction, with the aim of improving that module and thereby the naturalness of synthesized speech. Prosody prediction is closely related to text analysis. Determining pronunciation from the text alone is far from sufficient, because the input to a TTS system is unrestricted text. To improve the naturalness of speech, more prosodic information must be extracted from the text, including prosodic structure, accent, and intonation. Studies have shown that prosodic structure can significantly improve the quality of synthesized speech, especially its naturalness. This paper therefore focuses on how to improve prosodic structure prediction. It analyzes the relationships among Chinese prosodic features, pauses, accent, and prosodic boundaries, analyzes and compares descriptions of the Chinese prosodic hierarchy, and considers the acoustic characteristics of prosodic boundaries. 
The paper reviews and compares traditional prosodic structure prediction methods, points out their advantages and disadvantages, and then focuses on prosodic structure prediction based on statistical machine learning, in particular the conditional random field (CRF) and maximum entropy (ME) models. For the CRF-based prosodic structure prediction system, the paper describes the definition of CRFs and their parameter estimation, then concentrates on the CRF feature template, discussing the selection of the feature window and of combined features. For the maximum entropy-based prosodic structure prediction system, the paper describes the definition of the ME model and its parameter estimation, then concentrates on the ME feature template, discussing the selection of the feature window and of dynamic features. In addition, the paper proposes a multi-pass prosodic structure prediction system based on maximum entropy and compares it with the CRF-based prediction system. In prosodic phrase prediction, the maximum entropy-based multi-pass system performs better than the CRF-based system. |
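
To make the notion of a feature window and a combined feature concrete, the following is a minimal sketch of how such features might be extracted for each candidate prosodic boundary. It is an illustration under stated assumptions, not the feature template used in the paper: the token representation (word plus part-of-speech tag), the feature names, the padding symbol, the default window size, and the functions `window_features` and `sentence_features` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical token representation: (word, part-of-speech tag).
Token = Tuple[str, str]

PAD = "<PAD>"  # assumed placeholder for positions outside the sentence


def window_features(tokens: List[Token], i: int, window: int = 2) -> Dict[str, str]:
    """Collect features for the candidate boundary after tokens[i].

    Features are taken from every position in [i - window, i + window].
    """
    feats: Dict[str, str] = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            word, pos = tokens[j]
            length = str(len(word))
        else:
            word, pos, length = PAD, PAD, "0"
        feats[f"w[{offset}]"] = word
        feats[f"pos[{offset}]"] = pos
        feats[f"len[{offset}]"] = length
    # Combined feature: POS tags on both sides of the candidate boundary.
    # It needs position +1, so it is only emitted when window >= 1.
    if window >= 1:
        feats["pos[0]|pos[1]"] = feats["pos[0]"] + "|" + feats["pos[1]"]
    return feats


def sentence_features(tokens: List[Token], window: int = 2) -> List[Dict[str, str]]:
    """Build one feature dictionary per token position in the sentence."""
    return [window_features(tokens, i, window) for i in range(len(tokens))]


if __name__ == "__main__":
    # Toy segmented and tagged sentence; the tags are illustrative only.
    sample = [("今天", "t"), ("天气", "n"), ("很", "d"), ("好", "a")]
    for feats in sentence_features(sample, window=1):
        print(feats)
```

Feature dictionaries of this kind can be passed to a CRF or maximum entropy toolkit that accepts string-valued features. Widening the window adds context at the cost of sparser features, which is the trade-off behind feature window selection.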