Key Technologies For Text-to-speech Systems

Posted on:2014-01-01

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Y X Wang

Full Text:PDF

GTID:1228330452953574

Subject:Computer Science and Technology

Abstract/Summary:

Improving the naturalness of synthesized speech is currently the main problem for Text-to-Speech (TTS) systems, and prosodic structure prediction and intonation generation are the key technologies for solving it. This dissertation focuses on prosodic structure prediction and acoustic parameter generation for natural prosody, and proposes a parameterized representation for intonation and an intonation control model. The main contributions of the dissertation are:

1. A prosodic structure prediction algorithm based on decision trees is proposed. On a large-scale text corpus manually labeled with prosodic structure, the relation between grammar, semantics, and prosodic structure is analyzed. The proposed decision-tree-based prosodic prediction algorithm is designed using a selected feature set that considers grammar and phonology. Experimental results show that the proposed method achieves good prediction accuracy.

2. A parameterized intonation representation method is proposed that uses base pitch and declination index as its parameters. Based on a statistical analysis of sentence pitch and pitch variation on a large-scale declarative sentence corpus, the base pitch and declination index are defined as parameters to describe sentence intonation. A parameterized intonation representation based on these parameters is proposed, and the parameters are trained on the large-scale declarative sentence corpus. A tone normalization index is calculated to normalize the pitch of different lexical tones, which enables the intonation contour to be generated for any sentence in the corpus independent of the tone distribution in the sentence.

3. An intonation control method for parameterized speech synthesis systems is proposed. With the proposed parameterized intonation representation, average intonation parameters are trained on the speech corpus used in the synthesis system. According to the requirements of the TTS system, intonation parameters are generated and the acoustic features are updated to synthesize various declination intonations. A difference model is proposed to capture the difference between interrogative and declarative intonation. An MSD-HMM is used to train the pitch difference contour, and the MSD-HMM parameters are clustered with context information. Interrogative sentence synthesis is realized using this model in an HMM-based speech synthesis system.

4. Based on a physiological articulatory organ simulator, the impact of physiological articulatory control on acoustic parameters is analyzed using an analysis-by-synthesis method. The acoustic parameters of cold anger and hot anger speech are extracted, and the analysis shows that the two anger emotions differ in their control mechanisms in the high-frequency region of the speech spectrum. Using the physiological articulatory organ simulator, parameters of the vocal tract and voice source are controlled to simulate these speech sounds, and the results show that the vocal tract and voice source affect the speech spectrum differently.

5. An HMM-based speech synthesis system capable of intonation control is constructed based on the proposed intonation control model. Both the control of declarative intonation and the synthesis of interrogative intonation are realized in the system.
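The parameterized intonation representation in item 2 can be illustrated with a small sketch. The abstract gives no formulas, so everything below is an assumption made for illustration only: the exponential (log-linear) declination shape, the per-tone offsets in TONE_OFFSET, and the names IntonationParams, intonation_contour, normalize_tone, and fit_params are hypothetical and are not taken from the dissertation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class IntonationParams:
    base_pitch_hz: float      # assumed: sentence-initial reference pitch in Hz
    declination_index: float  # assumed: decline of log-F0 per second


# Hypothetical log-F0 offsets for the four Mandarin lexical tones.
# These numbers are placeholders, not values from the dissertation.
TONE_OFFSET = {1: 0.15, 2: 0.0, 3: -0.20, 4: 0.05}


def intonation_contour(params: IntonationParams, times_s: Sequence[float]) -> List[float]:
    """Sentence-level pitch (Hz) at each time, under the assumed exponential declination."""
    return [params.base_pitch_hz * math.exp(-params.declination_index * t) for t in times_s]


def normalize_tone(f0_hz: float, tone: int) -> float:
    """Remove the assumed lexical-tone offset so syllables with different tones are comparable."""
    return f0_hz / math.exp(TONE_OFFSET[tone])


def fit_params(times_s: Sequence[float], f0_hz: Sequence[float], tones: Sequence[int]) -> IntonationParams:
    """Least-squares line fit of tone-normalized log-F0 against time."""
    y = [math.log(normalize_tone(f, tone)) for f, tone in zip(f0_hz, tones)]
    mean_t = sum(times_s) / len(times_s)
    mean_y = sum(y) / len(y)
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    sxy = sum((t - mean_t) * (v - mean_y) for t, v in zip(times_s, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    # Intercept gives log base pitch; the negated slope is the declination index.
    return IntonationParams(base_pitch_hz=math.exp(intercept), declination_index=-slope)


if __name__ == "__main__":
    # Synthetic syllable-level data: one pitch value per syllable with its tone label.
    true_params = IntonationParams(base_pitch_hz=200.0, declination_index=0.1)
    times = [0.0, 0.5, 1.0, 1.5]
    tones = [1, 2, 3, 4]
    observed = [f * math.exp(TONE_OFFSET[tone])
                for f, tone in zip(intonation_contour(true_params, times), tones)]
    print(fit_params(times, observed, tones))
```

Under these assumptions, normalizing each syllable by its tone offset before fitting is what lets the sentence-level parameters be estimated independently of the tone distribution, which is the role the abstract assigns to the tone normalization index.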

Keywords/Search Tags:

prosody, intonation, HMM based speech synthesis, base pitch, declination

Related items

1	The Research Of Speech Synthesis And Prosody Control In Wu-Dialect Text-to-Speech
2	An Improved Speech Synthesis Method
3	Research On Chinese Speech Synthesis Based On Pitch Synchronization Superposition Method
4	Research On 3D Visible Speech Animation Driven By Prosody Text
5	The relationship between pitch discrimination skills and speech prosody decoding skills
6	Expressive Text-to-speech System On Mandarin
7	The Study Of Pitch Shifting Algorithms And The Application In Speech Synthesis
8	The Research On Dai Prosody Prediction Module Of Speech Synthesis
9	A Method Of Speech Synthesis Based On Exciting VTFR In Chinese
10	Pitch Detection Algorithm And Its Application In Speech Synthesis