
Research On End-to-End Mongolian Speech Synthesis Method

Posted on: 2020-02-28  Degree: Master  Type: Thesis
Country: China  Candidate: Z N Liu
GTID: 2428330596492640  Subject: Computer Science and Technology
Abstract/Summary:
In recent years, as experts and scholars have studied Mongolian intelligent information processing in depth, Mongolian speech synthesis, an important part of that field, has developed considerably. However, compared with widely used languages such as Chinese and English, Mongolian speech synthesis technology is not yet mature, and further research is needed before the quality of the synthesized speech meets practical needs.

Firstly, this thesis studies the Mongolian Grapheme-to-Phoneme (G2P) module in the front-end processing part of the Mongolian speech synthesis method. For the first time, an Encoder-Decoder deep neural network with an attention mechanism (Encoder-Decoder+Attention) is applied to the Mongolian G2P problem. Because of the variability of Mongolian word formation and pronunciation, a G2P method based on statistical methods alone cannot convert every word in the vocabulary completely and correctly. This thesis therefore adds rule processing on top of the Encoder-Decoder+Attention model and proposes a hybrid method for Mongolian G2P. In the comparative experiments, the hybrid Mongolian G2P method reduces the word error rate (WER) by 12.1% and the phoneme error rate (PER) by 2.8% compared with the traditional Mongolian G2P method based on the joint-sequence model.

Secondly, speech synthesized by the existing end-to-end Mongolian speech synthesis method suffers from misread words, skipped words, and a large difference in sound quality from the original audio. This thesis improves both the front-end processing part and the part that converts the Mel spectrogram into a speech waveform, and proposes an improved end-to-end Mongolian speech synthesis method. In the front-end processing part, the hybrid Mongolian G2P module proposed above is added, so that the original character sequence is converted into the corresponding phoneme sequence, which serves as the input of the Mel-spectrogram prediction model. In the waveform generation part, the original model that predicts the spectral magnitude from the Mel spectrogram, together with the Griffin-Lim algorithm, is replaced with a WaveNet vocoder. According to the experimental results, the improved end-to-end Mongolian speech synthesis method achieves a Mean Opinion Score (MOS) of 4.26 and meets practical requirements.
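
The WER and PER figures quoted in the abstract can be made concrete with a small sketch. The abstract does not say how the thesis computes them, so the code below assumes the convention commonly used for G2P evaluation: WER is the fraction of words whose predicted phoneme sequence differs from the reference in any way, and PER is the total phoneme-level edit distance divided by the total number of reference phonemes. The function names and the toy phoneme symbols are hypothetical and are not taken from the thesis.

```python
from typing import Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def g2p_error_rates(refs: Sequence[Sequence[str]],
                    hyps: Sequence[Sequence[str]]) -> Tuple[float, float]:
    """Return (WER, PER) under the assumed G2P convention described above."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of words")
    total_phonemes = sum(len(r) for r in refs)
    if total_phonemes == 0:
        raise ValueError("reference set must contain at least one phoneme")
    # A word counts as wrong if its phoneme sequence is not an exact match.
    wrong_words = sum(1 for r, h in zip(refs, hyps) if list(r) != list(h))
    phoneme_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return wrong_words / len(refs), phoneme_errors / total_phonemes


# Toy data with placeholder symbols (not real Mongolian phonemes).
refs = [["a", "b", "c"], ["d", "e"]]
hyps = [["a", "b", "c"], ["d", "f"]]
wer, per = g2p_error_rates(refs, hyps)
print(f"WER={wer:.2f} PER={per:.2f}")
```

In this toy case one of two words is wrong and one of five reference phonemes is substituted, which shows why a single phoneme error affects WER far more than PER.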
Keywords/Search Tags:Speech synthesis, Mongolian, Grapheme to Phoneme, End-to-End, WaveNet vocoder