Research On Neural Network Based Tibetan Speech Synthesis Technique

Posted on: 2020-10-27 | Degree: Master | Type: Thesis
Country: China | Candidate: G C Du | Full Text: PDF
GTID: 2415330578964433 | Subject: Computer application technology
Abstract/Summary:
Speech synthesis is a core technology in human-computer interaction research and a cutting-edge technology in information processing. It aims to transform a text sequence into clear, natural, and fluent speech in real time. Research on it has important theoretical significance and practical value for the development of human-machine voice communication, intelligent robots, and automatic voice broadcasting. With the rapid development of computer and multimedia technology, speech synthesis has attracted growing attention in these fields. In recent years especially, the successful application of neural networks to machine translation, text categorization, question answering, information extraction, and speech recognition has made neural network-based speech synthesis a research hotspot worldwide.

Tibetan speech synthesis is one of the important research tasks of Tibetan information processing. However, compared with Chinese and English, research on Tibetan speech synthesis is still at a developing stage. At present, Tibetan speech synthesis systems are mainly built with waveform splicing technology and HMM-based statistical parametric speech synthesis. Waveform splicing requires large storage capacity and a long system construction period, and the prosody of speech produced by statistical parametric synthesis is not satisfactory. This thesis therefore presents a neural network-based Tibetan speech synthesis technique that analyzes the structural characteristics and spelling rules of Tibetan and uses a Seq2Seq model with an attention mechanism.

The thesis studies Tibetan speech synthesis from the following three aspects:
(1) Starting from the front end of the speech synthesis system, the structure and spelling rules of Tibetan characters are analyzed on the basis of traditional Tibetan linguistic methods, and a Tibetan component decomposition algorithm is presented. An attention-based Seq2Seq model is also used to predict the prosody of Tibetan text.
(2) Starting from the back end of the speech synthesis system, an acoustic model for Tibetan speech synthesis is designed on the basis of the Seq2Seq model, with emphasis on the encoder and decoder. The Tibetan speech waveform is then generated with the Griffin-Lim algorithm.
(3) The performance of the corpus-based Tibetan speech synthesis system is compared with that of the neural network-based system to verify the effectiveness of the proposed method. The experiments indicate that the neural network-based Tibetan speech synthesis system achieves better performance when a large-scale corpus is available.
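The abstract does not give the details of the component decomposition algorithm mentioned in aspect (1). As a rough illustration of what a front-end decomposition step might look like, the sketch below splits Tibetan text into syllables at the intersyllabic tsheg (U+0F0B) and labels each code point by its Unicode range. The function names (`split_syllables`, `classify`, `decompose`) are hypothetical, and this is not the thesis's algorithm: it does not assign traditional roles such as prefix, root letter, or suffix, which would require the spelling rules the thesis analyzes.

```python
# Minimal sketch, assuming only the Unicode Tibetan block layout.
# All function names here are hypothetical, not taken from the thesis.

TSHEG = "\u0f0b"  # intersyllabic mark that separates Tibetan syllables
SHAD = "\u0f0d"   # clause/sentence-final mark


def split_syllables(text):
    """Split text into syllables, treating shad and spaces as boundaries."""
    for mark in (SHAD, " "):
        text = text.replace(mark, TSHEG)
    return [syl for syl in text.split(TSHEG) if syl]


def classify(ch):
    """Label one code point by its Unicode range in the Tibetan block."""
    cp = ord(ch)
    if 0x0F40 <= cp <= 0x0F6C:
        return "base"        # full-form consonant letter
    if 0x0F90 <= cp <= 0x0FBC:
        return "subjoined"   # consonant stacked below another letter
    if 0x0F71 <= cp <= 0x0F7D or cp == 0x0F80:
        return "vowel"       # dependent vowel sign
    return "other"


def decompose(text):
    """Return, per syllable, a list of (label, character) pairs."""
    return [[(classify(ch), ch) for ch in syl]
            for syl in split_syllables(text)]


if __name__ == "__main__":
    # Two syllables written with escapes: "bod" and "skad", each followed by a tsheg.
    sample = "\u0f56\u0f7c\u0f51\u0f0b\u0f66\u0f90\u0f51\u0f0b"
    for syllable in decompose(sample):
        print([label for label, _ in syllable])
```

A sequence of such labeled components could then serve as the input symbols for an attention-based Seq2Seq model, though the actual input representation used in the thesis is not stated in the abstract.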
Keywords/Search Tags: Tibetan Speech Synthesis, Word Embedding, Prosody Prediction, Neural Networks, Attention Mechanism