Design And Implementation Of End-to-End Myanmar Speech Synthesis System

Posted on:2022-06-07

Degree:Master

Type:Thesis

Country:China

Candidate:Q L Qin

Full Text:PDF

GTID:2505306335457734

Subject:Telecom Technology

Abstract/Summary:


In recent years, the continuous development of deep learning has brought new ideas and methods to speech synthesis research. End-to-end speech synthesis based on deep learning predicts the speech spectrum directly from text and has achieved quality close to natural human speech for English, Chinese and other widely used languages. Myanmar (Burmese) is an important language of the Tibeto-Burman branch of the Sino-Tibetan language family, with a long history and a large number of speakers. However, research on Myanmar speech synthesis lags behind because of the lack of electronic resources. This thesis exploits the strengths of the end-to-end model, analyzes the linguistic characteristics of Myanmar, applies an end-to-end deep learning model to Myanmar speech synthesis, and designs and implements an end-to-end Myanmar speech synthesis system. The main work is as follows:

(1) A preprocessing method and an embedding method for Myanmar text are designed according to the character structure and phoneme composition of Myanmar. The encoder of an end-to-end speech synthesis model extracts features from the text and summarizes them into fixed-length feature vectors, so the richness of these features is particularly important for the quality of the synthesized speech. Myanmar has a complex character structure, a distinctive phoneme composition, and a large, unevenly distributed character set. The resulting data sparsity lowers the learning efficiency of the encoder during feature extraction and degrades synthesis quality. This thesis therefore designs a Myanmar text preprocessing scheme suited to the end-to-end encoder: Initial Consonant-Vowel unit embeddings replace the raw text as the encoder's text embedding, explicitly representing the syllable information of Myanmar sentences. This improves the feature extraction efficiency of the encoder and makes it better suited to Myanmar text under low-resource conditions.

(2) Taking the characteristics of the end-to-end speech synthesis system into account, the BERT pre-trained language model is integrated into the encoder to improve the synthesis quality and stability of the system. The encoder of the end-to-end model has a relatively simple network structure, and its feature extraction is affected by exposure bias. Myanmar is not a widely used language, and electronic resources pairing text with pronunciation are relatively scarce. The BERT pre-trained language model contains rich semantic information: its wordpiece embeddings carry broader contextual information, up to the semantics of the whole sentence. These semantic features are added to the synthesis model and adapted to the characteristics of Myanmar by coordinating the Initial Consonant-Vowel unit embedding with BERT's wordpiece embedding. This improves the feature extraction capability of the encoder, reduces the influence of the decoder's exposure bias, and improves the robustness of the system.

(3) An end-to-end model is used to design a Myanmar grapheme-to-phoneme (G2P) system. The phoneme embedding of the encoder depends on G2P conversion. Traditional rule-based G2P methods have achieved good results. However, the accuracy of data-driven G2P conversion keeps rising as electronic resources expand and computing power improves. This study therefore collects and curates Myanmar grapheme-phoneme data, performs end-to-end Myanmar G2P conversion with the Seq2Seq model framework, and verifies the feasibility of the system through experiments.

Through the above work, this thesis applies the end-to-end model to Myanmar speech synthesis and implements an end-to-end Myanmar speech synthesis system. Experimental results show that the system is effective and can synthesize high-quality Myanmar speech under low-resource conditions.
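Item (1) replaces raw characters with Initial Consonant-Vowel units. The Python sketch below shows one way such units could be derived from Myanmar orthography. It is an illustration under stated assumptions, not the thesis's actual preprocessing scheme, and it makes three assumptions:

- The input is Unicode-encoded (not Zawgyi) text.
- Syllables are found with a simplified regex break rule: a syllable starts at a consonant that is not preceded by the virama U+1039 and not followed by the asat U+103A or the virama.
- The leading consonant plus any medials (U+103B to U+103E) forms the initial, and the rest of the syllable forms the final.

The helpers `split_syllables`, `initial_final` and `to_units` are hypothetical. Stacked consonants stay inside one orthographic syllable under this rule, and the thesis may instead apply the split to G2P output.

```python
import re

# Assumption: Unicode Myanmar text (not Zawgyi). Code points come from the
# Unicode Myanmar block; the syllable rule below is a simplification.
CONSONANTS = "\u1000-\u1021"           # MYANMAR LETTER KA .. MYANMAR LETTER A
ASAT = "\u103A"                        # vowel killer: marks a syllable-final consonant
VIRAMA = "\u1039"                      # stacking sign: next consonant is stacked
MEDIALS = "\u103B\u103C\u103D\u103E"   # medial ya, ra, wa, ha

# A syllable starts at a consonant that is not stacked under a virama and is
# not itself killed (asat) or stacking (virama).
_SYLLABLE_START = re.compile(rf"(?<!{VIRAMA})[{CONSONANTS}](?![{ASAT}{VIRAMA}])")
# Hypothetical initial: one consonant followed by any medials.
_INITIAL = re.compile(rf"[{CONSONANTS}][{MEDIALS}]*")


def split_syllables(text: str) -> list[str]:
    """Split Myanmar text into orthographic syllables (simplified rule)."""
    text = "".join(text.split())  # drop whitespace
    cuts = sorted({0, *(m.start() for m in _SYLLABLE_START.finditer(text))})
    bounds = zip(cuts, cuts[1:] + [len(text)])
    return [text[a:b] for a, b in bounds if b > a]


def initial_final(syllable: str) -> tuple[str, str]:
    """Split a syllable into (initial, final); the initial may be empty."""
    m = _INITIAL.match(syllable)
    if m is None:  # e.g. an independent vowel or a digit has no consonant onset
        return "", syllable
    return m.group(), syllable[m.end():]


def to_units(text: str) -> list[str]:
    """Flatten text into the initial/final unit sequence fed to the encoder."""
    units: list[str] = []
    for syllable in split_syllables(text):
        for part in initial_final(syllable):
            if part:
                units.append(part)
    return units


# "မြန်မာ" (Myanmar) -> syllables "မြန်", "မာ" -> units "မြ", "န်", "မ", "ာ"
print(to_units("\u1019\u103C\u1014\u103A\u1019\u102C"))
```

Whether medials belong to the initial or the final is a design choice. Attaching them to the initial mirrors treating /j/ and /w/ as part of the onset.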
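Item (2) coordinates the Initial Consonant-Vowel unit embedding with BERT's wordpiece embedding, but the abstract does not say how the two sequences are combined. This sketch assumes one common approach: give every unit the contextual vector of the wordpiece that covers it, then concatenate the two vectors before the encoder. It makes these further assumptions:

- Plain Python lists stand in for tensors.
- Wordpieces use BERT-style `##` continuation markers.
- The units and the wordpieces spell out the same characters. An `[UNK]` token or tokenizer normalization would break this.

`align_units_to_wordpieces` and `fuse_unit_and_wordpiece` are hypothetical helpers. In practice the wordpiece vectors would come from a pre-trained BERT model.

```python
from typing import Sequence

Vector = list[float]


def align_units_to_wordpieces(units: Sequence[str], wordpieces: Sequence[str]) -> list[int]:
    """For each unit, return the index of the wordpiece containing its first character."""
    # Strip BERT's "##" continuation prefix to recover each piece's surface text.
    pieces = [wp[2:] if wp.startswith("##") else wp for wp in wordpieces]
    if any(not unit for unit in units):
        raise ValueError("units must be non-empty strings")
    if "".join(units) != "".join(pieces):
        raise ValueError("units and wordpieces must spell out the same text")
    ends, pos = [], 0  # character offset at which each wordpiece ends
    for piece in pieces:
        pos += len(piece)
        ends.append(pos)
    mapping, start, j = [], 0, 0
    for unit in units:
        while ends[j] <= start:  # advance to the wordpiece covering offset `start`
            j += 1
        mapping.append(j)
        start += len(unit)
    return mapping


def fuse_unit_and_wordpiece(
    unit_emb: Sequence[Vector], wp_emb: Sequence[Vector], unit_to_wp: Sequence[int]
) -> list[Vector]:
    """Concatenate each unit embedding with the vector of the wordpiece that covers it."""
    if len(unit_emb) != len(unit_to_wp):
        raise ValueError("need exactly one wordpiece index per unit")
    return [list(u) + list(wp_emb[j]) for u, j in zip(unit_emb, unit_to_wp)]


# Toy example with placeholder strings: three units covered by two wordpieces.
units = ["ab", "c", "de"]
mapping = align_units_to_wordpieces(units, ["abc", "##de"])  # [0, 0, 1]
unit_emb = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]             # unit embedding size 2
wp_emb = [[1.0], [2.0]]                                      # toy BERT vector size 1
fused = fuse_unit_and_wordpiece(unit_emb, wp_emb, mapping)   # 3 vectors of size 3
```

Concatenation keeps the unit sequence length unchanged, so the encoder and attention still operate at unit resolution. A learned projection back to the encoder width, or element-wise addition after projection, would be equally plausible choices.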
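Item (3) trains a Seq2Seq model on Myanmar grapheme-phoneme pairs. The abstract names only the Seq2Seq framework, and whatever the network, the pairs must first be turned into index sequences. The sketch shows a conventional preparation step for teacher forcing:

- The encoder input is the grapheme sequence followed by an end token.
- The decoder input is the phoneme sequence prefixed with a start token.
- The decoder target is the phoneme sequence followed by an end token.

The special-token names, the helper functions and the toy symbols are assumptions, not the thesis's data format. Graphemes may be code points or larger orthographic units.

```python
from typing import Iterable, Sequence

PAD, BOS, EOS, UNK = "<pad>", "<s>", "</s>", "<unk>"


def build_vocab(sequences: Iterable[Sequence[str]]) -> dict[str, int]:
    """Map every symbol to an id, reserving ids 0-3 for special tokens."""
    vocab = {PAD: 0, BOS: 1, EOS: 2, UNK: 3}
    for seq in sequences:
        for sym in seq:
            vocab.setdefault(sym, len(vocab))  # new symbols get the next free id
    return vocab


def encode(seq: Sequence[str], vocab: dict[str, int]) -> list[int]:
    """Convert symbols to ids, mapping unseen symbols to <unk>."""
    return [vocab.get(sym, vocab[UNK]) for sym in seq]


def make_example(graphemes, phonemes, g_vocab, p_vocab):
    """Return (encoder input, decoder input, decoder target) for teacher forcing."""
    src = encode(graphemes, g_vocab) + [g_vocab[EOS]]
    tgt = encode(phonemes, p_vocab)
    return src, [p_vocab[BOS]] + tgt, tgt + [p_vocab[EOS]]


def pad_batch(seqs: Sequence[list[int]], pad_id: int = 0) -> list[list[int]]:
    """Right-pad variable-length id sequences to the longest one in the batch."""
    width = max((len(s) for s in seqs), default=0)
    return [s + [pad_id] * (width - len(s)) for s in seqs]


# Toy pairs with placeholder symbols (not real Myanmar data).
pairs = [("ab", ["x", "y"]), ("b", ["y"])]
g_vocab = build_vocab(g for g, _ in pairs)
p_vocab = build_vocab(p for _, p in pairs)
batch = [make_example(g, p, g_vocab, p_vocab) for g, p in pairs]
src_batch = pad_batch([src for src, _, _ in batch])
```

Keeping `<pad>` at id 0 makes it easy for the training loss and the attention masks to ignore padded positions.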

Keywords/Search Tags:

Myanmar, Speech Synthesis, End-To-End Model, Deep Learning


Related items

1	End-To-End Tibetan Speech Synthesis Technology Based On Deep Learning
2	Cross-language Speech Synthesis Based On Deep Learning
3	Research On Prosodic Structure Prediction Method In Myanmar Speech Synthesis
4	Research On Speech Synthesis Technology For Tibetan Lhasa Based On Fully End-to-End Method
5	Myanmar Prosodic Features Analysis And Prediction For Speech Synthesis
6	Research On Burmese Emotional Speech Synthesis
7	Research On Dunhuang Mural Inpainting Based On Texture Synthesis And Deep Learning
8	The Design And Realization Of Mongolian Speech Synthesis System
9	Research On The Speech Synthesis Technology Of Tibetan Dialect
10	Research On Mandarin-Xingtai Dialect Cross-lingual Speech Synthesis