
Research Of Chinese Emotional Speech Synthesis Based On HMM

Posted on: 2015-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhang
GTID: 2308330473456981
Subject: Electronic and communication engineering
Abstract/Summary:
Speech is the most direct and effective means of human communication. With the development of intelligent computing and affective computing, demand for speech processing technologies is growing. This thesis proposes an HMM-based speech synthesis method for diverse speech and implements a system that is trained and constructed automatically. On this basis, we study emotional speech classification and emotional characteristic analysis by examining the frequency, duration, energy and context of emotional speech, and we summarize the relationship between emotional speech and neutral sentences. To synthesize high-quality emotional sentences, the thesis introduces the PAD (Pleasure-Arousal-Dominance) three-dimensional emotion model, which makes emotional speech computable and quantifiable. Finally, we synthesize the target emotional speech with a speech synthesizer.

In this thesis, the PAD emotional state model is used to analyze the emotional characteristics of speech, which provides a theoretical basis for future research on emotional speech processing. By using the Boosting-GMM algorithm for predictive modeling, our research lays an experimental and analytical foundation for subsequent emotional speech conversion. The main research work and results are as follows:

1. We propose a speech synthesis method based on a statistical model and establish an integrated, trainable framework for a speech synthesis system. The system uses input voice data to model acoustic parameters, so the corresponding synthesis system can be constructed from the statistical model obtained in training. This meets the current demand for highly expressive and diverse speech synthesis.

2. We use the PAD emotional state model to quantitatively analyze emotional feature parameters and derive the mapping between different emotional states and the three dimensions of PAD. The proposed method provides a theoretical foundation for subsequent synthesis of target emotional speech and improves the quality of the synthesized speech.

3. We use the Boosting-GMM algorithm for emotion prediction modeling. We establish four kinds of weak predictive models for the four target emotions. Each weak predictive model consists of a basic prediction model and several assistant prediction models. We compared emotional acoustic feature prediction models based on GMM and on Boosting-GMM. Because Boosting-GMM implements a re-sampling process that increases the proportion of samples with large prediction errors in the training set, it performs better than GMM. Finally, we synthesize the target emotional speech from the parameters obtained from the prediction model, using the STRAIGHT algorithm. Experimental results show that the method achieves good voice quality and naturalness.
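
The re-sampling step described in item 3 can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not state how samples are weighted, so drawing each sample with probability proportional to its absolute prediction error is an assumption made here, and the function name `resample_by_error` is hypothetical.

```python
import random


def resample_by_error(samples, errors, rng=None):
    """Draw a new training set of the same size, favoring high-error samples.

    Assumption (not from the thesis): each sample is drawn with probability
    proportional to the absolute prediction error of the previous model.
    """
    if len(samples) != len(errors):
        raise ValueError("samples and errors must have the same length")
    if not samples:
        return []
    rng = rng or random.Random()
    # A tiny floor keeps the weights valid even if every error is zero.
    weights = [abs(e) + 1e-12 for e in errors]
    # Sampling with replacement: poorly predicted samples may appear several times.
    return rng.choices(samples, weights=weights, k=len(samples))
```

In a boosting loop, the next weak predictor would be trained on the set returned by this function, so that it concentrates on the samples the previous predictor handled worst.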
Keywords/Search Tags: Speech Synthesis, HMM Model, PAD Emotional State Model, Boosting-GMM Algorithm, Emotional Speech Synthesis