
Chinese Speech Synthesis System Improvements And Implementation

Posted on: 2013-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Jia
Full Text: PDF
GTID: 2218330371960235
Subject: Control Engineering
Abstract/Summary:
Voice is becoming increasingly important in human-computer interaction. Speech recognition and speech synthesis are the two halves of voice interaction. Speech synthesis is commonly known as TTS (text-to-speech), which converts text to speech according to predefined rules.

A TTS system generally has three modules: a text processing module, a prosody processing module, and a speech processing module. The text processing module is the front end of a TTS system; its main tasks are extracting words from text, normalizing non-standard words, and converting words into phonemes. The prosody processing module obtains prosodic information by extracting the prosodic structure and marking accent and intonation. The speech processing module is the back end of a TTS system, where speech is generated, modified, and output.

To improve the intelligibility and naturalness of Chinese speech synthesized by TTS, this thesis focuses on the text processing module and the prosody processing module. The work is summarized as follows.

(1) The overall framework of a typical TTS system was analysed to understand the function and working principle of each module. Given the importance of PSOLA in speech processing, its variants and implementation procedure were studied in detail, and PSOLA was applied in an original text-to-speech system.

(2) The text processing module was studied, and the method for resolving the ambiguity caused by polyphones when converting Chinese characters into phonemes was improved. Two methods were implemented. The static method matches the word containing the polyphone against the words in a thesaurus. The dynamic method relies on the POS (part of speech) of the polyphone and of the words around it, and was implemented with the C4.5 decision tree algorithm.

(3) The prosody processing module was studied, and the method for predicting the prosodic structure was improved. In the improved method, HTK was used to train on annotated sequences of POS and word length, which produced the required HMMs. These HMMs were then used to predict the prosodic structure. During training, Good-Turing smoothing was applied to the HMM parameters.

(4) An original text-to-speech system was built and tested. The data collected from the experiments showed that the work in this thesis improved the intelligibility and naturalness of the speech synthesized by TTS.
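The static method in item (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the word list, the default readings, and the function name `to_pinyin` are hypothetical and chosen only for illustration. The idea shown is that a polyphone is read according to the word that contains it when that word is found in the word list, and otherwise falls back to a default reading (the point at which a dynamic, POS-based classifier would take over).

```python
# Hypothetical mini word list: multi-character words mapped to their pinyin.
# These entries are illustrative, not taken from the thesis.
WORD_LEXICON = {
    "银行": ["yin2", "hang2"],   # 行 read as "hang2"
    "行走": ["xing2", "zou3"],   # 行 read as "xing2"
    "音乐": ["yin1", "yue4"],    # 乐 read as "yue4"
    "快乐": ["kuai4", "le4"],    # 乐 read as "le4"
}

# Hypothetical default reading for single characters, used when no
# containing word is found in the word list.
DEFAULT_READING = {
    "行": "xing2", "乐": "le4", "很": "hen3",
}


def to_pinyin(words):
    """Convert a pre-segmented sentence (a list of words) to pinyin syllables."""
    syllables = []
    for word in words:
        if word in WORD_LEXICON:
            # Static method: the containing word fixes the polyphone's reading.
            syllables.extend(WORD_LEXICON[word])
        else:
            # Fallback: per-character default reading, "?" if unknown.
            for ch in word:
                syllables.append(DEFAULT_READING.get(ch, "?"))
    return syllables


print(to_pinyin(["银行"]))
print(to_pinyin(["很", "快乐"]))
```

The sketch assumes the input is already segmented into words, which matches the abstract's description of word extraction as an earlier step in the text processing module.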
Keywords/Search Tags: Polyphone, Prosodic Structure Prediction, Decision Tree, Hidden Markov Model