Over the past few decades, advances in computing and related fields have driven considerable progress in speech synthesis. Current speech synthesis research focuses on Text-To-Speech (TTS), the conversion of input text into spoken output. A TTS system generally consists of four modules: Text Analysis, Prosody Control, Speech Synthesis and the Unit Database. These modules are not independent, and each of them strongly affects the quality of the output speech.

Output speech is evaluated along many dimensions, chiefly clarity, intelligibility and naturalness. The clarity and intelligibility of existing TTS systems are now satisfactory, but their overall naturalness still needs improvement. This thesis therefore studies two modules, Prosody Control and Speech Synthesis, with the aim of making the output speech more natural.

The Prosody Control module has a major influence on the naturalness of the output speech. Of the many research topics within Prosody Control, we focus on prosody modeling. A prosody model predicts quantitative acoustic parameters from high-level, qualitative prosodic information. We design and implement a predictor of the pitch contour, duration and pause of Chinese syllables. Experimental results show that the model predicts these parameters with sufficient accuracy.

The Speech Synthesis module generates the final output speech, generally by waveform concatenation. After the optimal units have been selected, the module also modifies the waveform to make the speech sound more natural. This thesis describes in detail an optimal unit selection algorithm and a Fourier-based spectral modification algorithm for speech.
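The abstract does not spell out its optimal unit selection algorithm. Unit selection is commonly posed as a minimum-cost path search: each target position has several candidate units from the Unit Database, a target cost scores how well a candidate matches the predicted prosody, and a concatenation cost scores the join between consecutive units. The sketch below assumes that standard formulation and solves it with the Viterbi algorithm; `select_units`, the cost callables and the unit labels are hypothetical and are not the thesis's own interface.

```python
from typing import Callable, List, Sequence, Tuple


def select_units(
    candidates: Sequence[Sequence[str]],
    target_cost: Callable[[int, str], float],
    concat_cost: Callable[[str, str], float],
) -> Tuple[List[str], float]:
    """Pick one candidate per target position, minimizing total cost (Viterbi).

    Hypothetical interface: target_cost(i, unit) scores a unit against target i,
    concat_cost(left, right) scores the join between two adjacent units.
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every target position needs at least one candidate")

    # best[j]: cheapest accumulated cost of a path ending at candidate j
    # of the current position.
    best = [target_cost(0, u) for u in candidates[0]]
    # back[i - 1][j]: index of the best predecessor of candidate j at position i.
    back: List[List[int]] = []

    for i in range(1, len(candidates)):
        prev = candidates[i - 1]
        new_best: List[float] = []
        pointers: List[int] = []
        for u in candidates[i]:
            # Cost of reaching u from each predecessor: path cost + join cost.
            joins = [best[j] + concat_cost(prev[j], u) for j in range(len(prev))]
            k = min(range(len(prev)), key=joins.__getitem__)
            new_best.append(joins[k] + target_cost(i, u))
            pointers.append(k)
        best = new_best
        back.append(pointers)

    # Trace back from the cheapest candidate at the final position.
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [candidates[-1][j]]
    for i in range(len(candidates) - 1, 0, -1):
        j = back[i - 1][j]
        path.append(candidates[i - 1][j])
    path.reverse()
    return path, total


# Toy example with made-up costs: two target syllables, two candidates each.
targets = {(0, "a1"): 1.0, (0, "a2"): 0.0, (1, "b1"): 0.0, (1, "b2"): 0.5}
path, total = select_units(
    [["a1", "a2"], ["b1", "b2"]],
    target_cost=lambda i, u: targets[(i, u)],
    concat_cost=lambda left, right: 5.0 if (left, right) == ("a2", "b1") else 0.0,
)
print(path, total)  # ['a2', 'b2'] 0.5
```

Dynamic programming keeps the search linear in the number of target positions (quadratic only in the candidates per position), instead of enumerating every possible unit sequence.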
The Fourier-based modification algorithm not only smooths the speech spectrum but also avoids the degradation of synthesized speech quality caused by traditional algorithms.

To evaluate these algorithms, we build a simple TTS system that incorporates all of them. A listening test indicates that its output speech is somewhat more natural than that of the previous system.
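The abstract also leaves the Fourier-based spectral modification unspecified. One plausible reading, sketched here purely as an assumption, is spectral smoothing at a concatenation point: each frame near the join keeps its own phase, while its DFT magnitude is linearly interpolated between the outermost frames on either side. `morph_frame` and `smooth_join` are hypothetical names; the naive DFT keeps the example dependency-free, and a practical system would add windowing and overlap-add, which this sketch omits. It does not claim to reproduce the thesis's method or its advantage over traditional algorithms.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[float]:
    """Inverse DFT; keeps the real part, assuming a (near) conjugate-symmetric input."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def morph_frame(frame: Sequence[float], left_ref: Sequence[float],
                right_ref: Sequence[float], alpha: float) -> List[float]:
    """Give `frame` a magnitude spectrum interpolated between two reference
    frames (alpha=0 -> left_ref, alpha=1 -> right_ref), keeping its own phase."""
    if not (len(frame) == len(left_ref) == len(right_ref)):
        raise ValueError("frames must have equal length")
    spec = dft(frame)
    left_mag = [abs(c) for c in dft(left_ref)]
    right_mag = [abs(c) for c in dft(right_ref)]
    morphed = [cmath.rect((1 - alpha) * lm + alpha * rm, cmath.phase(c))
               for c, lm, rm in zip(spec, left_mag, right_mag)]
    return idft(morphed)


def smooth_join(left_tail: Sequence[Sequence[float]],
                right_head: Sequence[Sequence[float]]) -> List[List[float]]:
    """Ramp the magnitude spectrum linearly across the frames around a join.
    The outermost frames come back unchanged (up to rounding)."""
    frames = list(left_tail) + list(right_head)
    if len(frames) < 2:
        raise ValueError("need at least two frames around the join")
    first, last = frames[0], frames[-1]
    steps = len(frames) - 1
    return [morph_frame(f, first, last, i / steps) for i, f in enumerate(frames)]


# Toy example: two 4-sample frames on each side of a join.
left = [[1.0, 0.5, -0.25, 0.0], [0.9, 0.4, -0.2, 0.1]]
right = [[0.2, -0.1, 0.3, 0.0], [0.1, 0.0, 0.2, -0.1]]
smoothed = smooth_join(left, right)
print(len(smoothed))  # 4
```

Interpolating only the magnitude, and leaving each frame's phase alone, is what keeps the sketch from smearing the waveform's fine time structure; whether the thesis makes the same choice is not stated in the abstract.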