Key Technologies For Text-to-speech Systems

Posted on: 2014-01-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y X Wang
Full Text: PDF
GTID: 1228330452953574
Subject: Computer Science and Technology
Abstract/Summary:
Improving the naturalness of synthesized speech is currently the main problem for Text-to-Speech (TTS) systems, and prosodic structure prediction and intonation generation are the key technologies for solving it. This dissertation studies prosodic structure prediction and acoustic parameter generation for natural prosody, and proposes a parameterized representation of intonation and an intonation control model. The main work in the dissertation is:

1. A prosodic structure prediction algorithm based on decision trees is proposed. Using a large-scale text corpus manually labeled with prosodic structure, the relation between grammar, semantics, and prosodic structure is analyzed. The decision-tree-based prediction algorithm is designed around a selected feature set that considers grammar and phonology. Experimental results show that the proposed method achieves good prediction accuracy.

2. A parameterized intonation representation is proposed that uses base pitch and declination index as its parameters. Based on a statistical analysis of sentence pitch and pitch variation in a large-scale corpus of declarative sentences, the base pitch and the declination index are defined as the parameters that describe sentence intonation. The representation is built on these parameters, which are trained on the same corpus. A tone normalization index is calculated to normalize the pitch of different lexical tones, so that the intonation contour can be generated for any sentence in the corpus independently of the tone distribution in that sentence.

3. An intonation control method for parameterized speech synthesis systems is proposed. With the proposed intonation representation, average intonation parameters are trained on the speech corpus used by the synthesis system. According to the requirements of the TTS system, intonation parameters are generated and the acoustic features are updated to synthesize various declination intonations. A difference model is proposed for the difference between interrogative and declarative intonation. MSDHMM is used to train the pitch difference contour, and the MSDHMM parameters are clustered with context information. Interrogative sentence synthesis is realized by applying the model in an HMM-based speech synthesis system.

4. Based on a physiological articulatory organ simulator, the impact of physiological articulatory control on acoustic parameters is analyzed using an analysis-by-synthesis method. The acoustic parameters of cold-anger and hot-anger speech are extracted, and the analysis shows that the two anger emotions have different control mechanisms in the high-frequency region of the speech spectrum. Using the simulator, the parameters of the vocal tract and the voice source are controlled to simulate these speech sounds, and the results show that the vocal tract and the voice source affect the speech spectrum differently.

5. An HMM-based speech synthesis system with intonation control is constructed on the basis of the proposed intonation control model. Both the control of declarative intonation and the synthesis of interrogative intonation are realized in the system.
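The abstract does not give the formula behind the base pitch and declination index representation, so the following is only a minimal sketch of how such a parameterization could work, under stated assumptions: the declination is taken to be linear in semitones per syllable, lexical tones are numbered 1 to 4, and the per-tone offsets stand in for the tone normalization index. All names and numeric values (`IntonationParams`, `TONE_OFFSET_ST`, `syllable_targets`) are hypothetical and are not taken from the dissertation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IntonationParams:
    # Hypothetical sentence-level parameters (not the dissertation's definitions).
    base_pitch_hz: float                # pitch of the baseline at the first syllable
    declination_st_per_syllable: float  # baseline slope in semitones; negative = falling


# Hypothetical per-tone offsets in semitones, standing in for a tone
# normalization index. The values are illustrative, not measured.
TONE_OFFSET_ST = {1: 3.0, 2: 0.0, 3: -3.0, 4: 1.0}


def semitones_to_ratio(semitones: float) -> float:
    """Convert a pitch interval in semitones to a frequency ratio."""
    return 2.0 ** (semitones / 12.0)


def intonation_baseline(params: IntonationParams, n_syllables: int) -> List[float]:
    """Tone-independent baseline: base pitch with a linear semitone declination."""
    return [
        params.base_pitch_hz
        * semitones_to_ratio(params.declination_st_per_syllable * i)
        for i in range(n_syllables)
    ]


def syllable_targets(params: IntonationParams, tones: List[int]) -> List[float]:
    """Per-syllable pitch targets: baseline shifted by each tone's offset."""
    baseline = intonation_baseline(params, len(tones))
    return [
        b * semitones_to_ratio(TONE_OFFSET_ST[t]) for b, t in zip(baseline, tones)
    ]


if __name__ == "__main__":
    params = IntonationParams(base_pitch_hz=200.0, declination_st_per_syllable=-0.5)
    for hz in syllable_targets(params, [1, 4, 2, 3]):
        print(round(hz, 1))
```

Keeping the baseline separate from the tone offsets mirrors the idea in the abstract that the intonation contour should not depend on the tone distribution of a sentence: changing `declination_st_per_syllable` reshapes the sentence contour without touching the per-tone offsets.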
Keywords/Search Tags: prosody, intonation, HMM-based speech synthesis, base pitch, declination