
Research on Burmese Emotional Speech Synthesis

Posted on: 2023-07-24
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Liu
Full Text: PDF
GTID: 2545306617977009
Subject: Electronic and communication engineering
Abstract/Summary:
With the rapid development of artificial intelligence, speech synthesis technology has been widely applied in scenarios such as in-car voice navigation, e-book reading, and AI virtual anchors, promoting the further development of the intelligent voice industry. In recent years, speech synthesis technology has progressed rapidly, and the naturalness and intelligibility of synthesized speech have improved greatly. With deeper study of speech synthesis, fruitful results have been achieved in robust speech synthesis, mixed-language speech synthesis, multi-speaker speech synthesis, and expressive emotional speech synthesis. Emotional speech synthesis makes machine expression more human-like and better supports human-computer interaction. Compared with widely studied languages such as Chinese and English, research on Burmese speech synthesis lags behind, and the synthesized speech lacks emotion. This thesis studies Burmese emotional speech synthesis under low-resource conditions and explores fine-grained control and adjustment methods for emotional speech to enrich the emotional expression of synthesized speech. The main work of this thesis includes:

(1) As the basis of emotional speech synthesis, a small-scale Burmese emotional speech database is constructed, covering four emotion types: calm, happy, sad, and angry. The acoustic feature parameters of speech in different emotions are analyzed, and their behavior under different emotional states is summarized, providing a reference and basis for the subsequent adjustment of the emotional state of synthesized speech.

(2) Using a medium-scale Burmese calm speech database, a Burmese speech synthesis system is trained with an HMM (Hidden Markov Model) based method. On this basis, using the small-scale emotional speech database and adaptive training of the acoustic models, a Burmese emotional speech synthesis baseline system capable of synthesizing the four emotion types is constructed, and an average voice model is introduced to improve the quality of the synthesized happy, sad, and angry speech.

(3) Adopting an HMM-DNN based speech synthesis method, the medium-scale calm speech database, the small-scale emotional speech data, and adaptive training of the acoustic model are used to build a Burmese emotional speech synthesis system based on a DNN acoustic model, further improving the quality of the synthesized speech.

(4) Building on the emotional speech synthesis baseline system, the transformation matrices that affect the probability distributions of the acoustic feature parameters are adjusted by linear interpolation, so that the emotional state of the synthesized speech can transition between the calm emotion and the target adapted emotion. Guided by the behavior of the acoustic feature parameters observed in the emotional speech analysis, parameters that are strongly related to the emotional state, such as fundamental frequency and duration, are further adjusted to control the intensity of the emotional expression (a small interpolation sketch follows below). Finally, prosodic information is introduced to make the synthesized emotional speech more natural, thereby improving the Burmese emotional speech synthesis system.
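To make the interpolation idea in (4) concrete, the following is a minimal sketch, not the thesis's actual HMM-based implementation: it assumes the calm and emotion-adapted acoustic models can be summarized by per-state Gaussian mean vectors (the function and array names are illustrative), and blends them with a weight alpha to move the synthesized voice between the calm state and the target emotional state.

```python
import numpy as np

def interpolate_state_means(calm_means, emotion_means, alpha):
    """Linearly interpolate between calm and emotion-adapted Gaussian
    mean vectors of the acoustic model states.

    alpha = 0.0 reproduces the calm voice, alpha = 1.0 the fully
    adapted target emotion; values in between give intermediate
    emotional states.
    """
    calm_means = np.asarray(calm_means, dtype=float)
    emotion_means = np.asarray(emotion_means, dtype=float)
    if calm_means.shape != emotion_means.shape:
        raise ValueError("calm and emotion parameter sets must have the same shape")
    alpha = float(np.clip(alpha, 0.0, 1.0))
    return (1.0 - alpha) * calm_means + alpha * emotion_means

# Example: hypothetical per-state mean vectors of spectral + F0 features.
calm = np.array([[4.2, 0.8, 120.0],
                 [4.0, 0.7, 118.0]])
angry = np.array([[4.6, 1.1, 165.0],
                  [4.5, 1.0, 160.0]])

half_angry = interpolate_state_means(calm, angry, alpha=0.5)
print(half_angry)
```

In a statistical parametric system the same weight would typically be applied to the adaptation transforms rather than to raw parameter vectors, which is why intermediate alpha values yield a gradual transition between emotional states.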
The experimental results show that the proposed method can realize Burmese emotional speech synthesis under low-resource conditions, that the synthesized speech is emotionally distinguishable, and that the fine-grained adjustment and control method adopted in this thesis can effectively change and enhance the emotional expression of the synthesized speech, demonstrating the effectiveness of the proposed method.
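Similarly, the rule-based intensity control described in (4) can be pictured as scaling fundamental-frequency and duration parameters toward a target emotion. The sketch below is only illustrative; the per-emotion scale and shift values are placeholder assumptions, not figures from the thesis's acoustic analysis.

```python
import numpy as np

# Hypothetical per-emotion adjustment rules of the kind an acoustic
# analysis like work item (1) might suggest: F0 scale/shift and a
# global duration (speaking-rate) factor. Values are placeholders.
EMOTION_RULES = {
    "happy": {"f0_scale": 1.15, "f0_shift": 10.0, "dur_scale": 0.95},
    "sad":   {"f0_scale": 0.90, "f0_shift": -8.0, "dur_scale": 1.10},
    "angry": {"f0_scale": 1.20, "f0_shift": 15.0, "dur_scale": 0.92},
}

def adjust_emotion_intensity(f0_contour, durations, emotion, intensity=1.0):
    """Scale an F0 contour (Hz) and phone durations (frames) toward the
    target emotion. intensity = 0 leaves the calm parameters unchanged;
    intensity = 1 applies the full rule; values in between weaken it.
    """
    rule = EMOTION_RULES[emotion]
    f0 = np.asarray(f0_contour, dtype=float)
    dur = np.asarray(durations, dtype=float)

    f0_scale = 1.0 + intensity * (rule["f0_scale"] - 1.0)
    f0_shift = intensity * rule["f0_shift"]
    dur_scale = 1.0 + intensity * (rule["dur_scale"] - 1.0)

    voiced = f0 > 0  # keep unvoiced frames (F0 == 0) untouched
    f0_out = np.where(voiced, f0 * f0_scale + f0_shift, f0)
    dur_out = np.maximum(1.0, dur * dur_scale)
    return f0_out, dur_out

f0, dur = adjust_emotion_intensity(
    f0_contour=[0.0, 110.0, 115.0, 120.0, 0.0],
    durations=[12, 9, 14],
    emotion="angry",
    intensity=0.6,
)
print(f0, dur)
```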
Keywords/Search Tags: Burmese emotional speech synthesis, deep neural network, adaptive training, fine-grained adjustment of emotional speech features