Mongolian Voice Conversion System Based On Deep Learning

Posted on: 2020-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Q Q G Su
GTID: 2428330596992279
Subject: Computer technology
Abstract/Summary:
With the maturing of personalized speech synthesis technology and the diversification of human-computer interaction, voice conversion has attracted attention from researchers at home and abroad. Voice conversion is a specialized form of speech synthesis that converts the source speaker's voice into the target speaker's voice while keeping the spoken content unchanged. It can be used in the back end of a speech synthesis system to produce a variety of personalized synthesis effects. In recent years, great progress has been made in voice conversion for mainstream languages such as Chinese and English, but there are no published research results on voice conversion for Mongolian. This thesis uses deep learning to study Mongolian voice conversion.

First, drawing on the basic principles and model architectures of voice conversion for mainstream languages such as Chinese and English, this thesis studies a Mongolian voice conversion model built on an Encoder-Decoder structure with an attention mechanism. The model realizes end-to-end Mongolian voice conversion by mapping the acoustic parameters of the source voice directly to the acoustic parameters of the target voice. To verify the validity of the model, a Mongolian voice conversion model based on a deep bidirectional long short-term memory (DBLSTM) network is constructed for comparison, and the models are evaluated both objectively and subjectively. The objective evaluation shows that the Encoder-Decoder model fits the acoustic parameters of the real target voice more closely. The subjective evaluation shows that the Encoder-Decoder model achieves a higher mean opinion score (MOS), with better naturalness and continuity.

In addition, this thesis builds a Mongolian voice conversion system on the best-performing Mongolian voice conversion model obtained in the experiments. The main function of the system is to convert the voice of an adult female source speaker into the target voice of a girl. The system also provides auxiliary functions: reading the source voice, downloading the converted voice, adjusting the playback speed, adjusting the volume, and pausing playback. Finally, functional tests and stress tests are carried out on the Mongolian voice conversion system, and the test results are in line with expectations.
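The attention step in an Encoder-Decoder model of the kind described above can be illustrated with a small sketch. The abstract does not say which attention scoring function the thesis uses, so dot-product scoring is an assumption here, and the names `softmax` and `attend` as well as the toy vectors are hypothetical stand-ins for real acoustic feature frames. The sketch shows one decoder step: each encoded source frame is scored against the current decoder state, the scores are normalized into weights, and the weighted sum forms the context vector that the decoder would use to predict the next target frame.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """One decoder step of dot-product attention (assumed scoring function).

    decoder_state:  list of floats, the current decoder hidden state
    encoder_states: list of equally sized lists, one per source frame
    Returns (weights, context).
    """
    # Score each source frame by its dot product with the decoder state.
    scores = [
        sum(d * h for d, h in zip(decoder_state, frame))
        for frame in encoder_states
    ]
    weights = softmax(scores)
    # The context vector is the attention-weighted sum of the source frames.
    dim = len(encoder_states[0])
    context = [
        sum(w * frame[i] for w, frame in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three encoded source frames with two dimensions each.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
print(weights)
print(context)
```

In this toy input the first and third source frames align best with the decoder state, so they receive the largest weights and dominate the context vector. A trained model would learn the encoder and decoder states from paired source and target speech rather than use fixed vectors.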
Keywords/Search Tags: Voice Conversion, Mongolian, Neural Network, Encoder-Decoder, Attention Mechanism