Mongolian Voice Conversion System Based On Deep Learning

Posted on:2020-09-15

Degree:Master

Type:Thesis

Country:China

Candidate:Y L Q Q G Su

Full Text:PDF

GTID:2428330596992279

Subject:Computer technology

Abstract/Summary:

With the maturing of personalized speech synthesis and the diversification of human-computer interaction, voice conversion has attracted the attention of researchers in China and abroad. Voice conversion is a specialized form of speech synthesis that converts a source speaker's voice into a target speaker's voice while keeping the spoken content unchanged. It can be used at the back end of a speech synthesis system to produce a variety of personalized synthesis effects. In recent years, great progress has been made in voice conversion for mainstream languages such as Chinese and English, but there are no published research results on voice conversion for Mongolian. This thesis uses deep learning to study Mongolian voice conversion. First, building on the basic principles and model architectures of voice conversion for mainstream languages such as Chinese and English, the thesis studies a Mongolian voice conversion model based on an Encoder-Decoder structure with an attention mechanism, and realizes end-to-end Mongolian voice conversion, mapping the acoustic parameters of the source voice directly to the acoustic parameters of the target voice. To verify the validity of this model, a Mongolian voice conversion model based on a deep bidirectional long short-term memory network (DBLSTM) is also constructed, and the models are evaluated both objectively and subjectively. The objective evaluation shows that the Encoder-Decoder model fits the acoustic parameters of the real target voice better. The subjective evaluation shows that the Encoder-Decoder model achieves a higher mean opinion score (MOS), with better naturalness and continuity. In addition, the thesis builds a Mongolian voice conversion system on the best-performing Mongolian voice conversion model obtained in the experiments. The main function of the system is to convert an adult female source speaker's voice into a girl's target voice; its auxiliary functions are reading the source voice, downloading the converted voice, adjusting the speech rate, adjusting the volume, and pausing playback. Finally, functional tests and stress tests are carried out on the Mongolian voice conversion system, and the test results are in line with expectations.
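The abstract does not say which attention variant the Encoder-Decoder model uses. As a rough illustration only, the sketch below assumes scaled dot-product attention: for one decoder step, it weights the encoded source frames and combines them into a context vector from which a target acoustic frame could be predicted. The function names, array shapes, and random inputs are hypothetical and are not taken from the thesis.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the maximum for numerical stability before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def attention_context(decoder_state, encoder_states):
    """Scaled dot-product attention for a single decoder step (assumed variant).

    decoder_state:  shape (d,)   current decoder hidden state
    encoder_states: shape (T, d) one encoded vector per source frame
    Returns the attention weights, shape (T,), and the context vector, shape (d,).
    """
    d = encoder_states.shape[1]
    scores = encoder_states @ decoder_state / np.sqrt(d)  # similarity per source frame
    weights = softmax(scores)                             # non-negative, sums to 1
    context = weights @ encoder_states                    # weighted sum of source frames
    return weights, context

# Hypothetical toy input: 5 encoded source frames with 8 dimensions each.
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 8))
decoder_state = rng.normal(size=8)

weights, context = attention_context(decoder_state, encoder_states)
print(weights.shape, context.shape)
```

In a full model the encoder states would come from a recurrent or convolutional encoder over the source acoustic parameters, and the context vector would feed the decoder that emits the target acoustic parameters; both parts are omitted here.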

Keywords/Search Tags:

Voice Conversion, Mongolian, Neural Network, Encoder-Decoder, Attention Mechanism

Related items

1	A Study On Mongolian-Chinese Machine Translation Based On Neural Network
2	Mongolian Speech Conversion System Based On RBF-GMM
3	Research And Application Of Self-attention Mechanism In Semantic And Sentiment Analysis
4	Design Of Mathematical Formula Recognition System Based On Convolutional Neural Network And Attention Mechanism
5	Visual Data Understanding Based On Deep Encoder-Decoder Framework
6	Research On Scene-based Image Semantic Description Generation Technology
7	Research On Encoder-Decoder Model For Complex Structure Text Recognition
8	Research On Text Title Generation Method Based On Eye Movement Attention Mechanism
9	Research On Time Series Classification Based On Spectrum Attention Mechanism And Encoder-Decoder Model
10	Grammar Constrained Double-Layer Encoder Decoder For Neural Semantic Parsing