Research On End-to-End Mongolian Speech Synthesis Method

Posted on:2020-02-28

Degree:Master

Type:Thesis

Country:China

Candidate:Z N Liu

Full Text:PDF

GTID:2428330596492640

Subject:Computer Science and Technology

Abstract/Summary:

In recent years, as researchers have studied Mongolian intelligent information processing in depth, Mongolian speech synthesis, an important part of that field, has developed considerably. However, compared with widely studied languages such as Chinese and English, Mongolian speech synthesis is still not mature, and further research is needed before the quality of the synthesized speech meets practical needs.

First, this thesis studies the Mongolian grapheme-to-phoneme (G2P) module in the front-end processing part of Mongolian speech synthesis. For the first time, an Encoder-Decoder+Attention deep neural network model is applied to the Mongolian G2P problem. Because a purely statistical G2P method cannot correctly convert every word in the vocabulary, and because Mongolian word formation and pronunciation are highly variable, this thesis adds rule processing on top of the Encoder-Decoder+Attention model and proposes a hybrid Mongolian G2P method. In comparative experiments, the hybrid method reduces the word error rate (WER) by 12.1% and the phoneme error rate (PER) by 2.8% compared with the traditional Mongolian G2P method based on the joint-sequence model.

Second, speech synthesized by the existing end-to-end Mongolian speech synthesis method suffers from misreadings, omissions, and sound quality that differs markedly from the original audio. This thesis improves both the front-end processing part and the part that converts the Mel-spectrum into a speech waveform, and proposes an improved end-to-end Mongolian speech synthesis method. In the front end, the hybrid Mongolian G2P module proposed above is added, so that the original character sequence is converted into the corresponding phoneme sequence, which serves as the input to the Mel-spectrum prediction model. In the Mel-spectrum-to-waveform part, the original model that predicts spectral magnitude from the Mel-spectrum, together with the Griffin-Lim algorithm, is replaced with a WaveNet vocoder. According to the experimental results, the improved end-to-end Mongolian speech synthesis method achieves a Mean Opinion Score (MOS) of 4.26 and meets practical requirements.
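The abstract reports G2P quality as WER and PER but does not say how they were computed. The sketch below shows one common way to compute the two metrics, assuming that WER is the fraction of words whose predicted phoneme sequence differs from the reference, and that PER is the total phoneme-level edit distance divided by the number of reference phonemes. The function names and the toy phoneme strings are hypothetical placeholders, not taken from the thesis or from real Mongolian transcriptions.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def g2p_error_rates(
    pairs: List[Tuple[List[str], List[str]]]
) -> Tuple[float, float]:
    """Return (WER, PER) for (reference, predicted) phoneme lists, one pair per word."""
    # A word counts as wrong if any phoneme in its prediction differs.
    wrong_words = sum(1 for ref, hyp in pairs if ref != hyp)
    phoneme_errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total_phonemes = sum(len(ref) for ref, _ in pairs)
    return wrong_words / len(pairs), phoneme_errors / total_phonemes


# Hypothetical toy data: three words, two of them predicted incorrectly.
pairs = [
    (["s", "a", "i", "n"], ["s", "a", "i", "n"]),
    (["b", "a", "i", "n", "a"], ["b", "a", "n", "a"]),
    (["n", "o", "m"], ["n", "u", "m"]),
]
wer, per = g2p_error_rates(pairs)
print(f"WER = {wer:.3f}, PER = {per:.3f}")
```

Under these definitions a single wrong phoneme makes the whole word count as an error, so WER is usually much higher than PER on the same test set.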

Keywords/Search Tags:

Speech synthesis, Mongolian, Grapheme to Phoneme, End-to-End, WaveNet vocoder

Related items

1	Research On Speech Synthesis Technology In Human-Computer Interacting System
2	Research And Implementation Of Mongolian Emotional Speech Synthesis System Based On Deep Learning
3	Research On Statistical Parametric Speech Synthesis Of Tibetan Lhasa Dialect
4	Research On Speech Synthesis Technology Of Amdo Tibetan Based On Seq2Seq?WaveNet
5	Independent Component Of The Chinese-based Phoneme Spectrum Analysis And Comparison Of Speech Synthesis Research
6	Research On Speech Keyword Spotting Technology For Mongolian
7	Rendering Speech Across Speaker And Language Difference
8	Research On HMM-RBM Based Mongolian Speech Synthesis
9	Research On Critical Algorithms Of Multiband Excitation Vocoder
10	Research On HMM-Based Mongolian Speech Synthesis