Research and Implementation of Speech Synthesis Based on FastSpeech

Posted on: 2022-09-26
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Wang
Full Text: PDF
GTID: 2568307037461014
Subject: Control engineering
Abstract/Summary:
Speech synthesis is a key technology in the field of artificial intelligence and an important direction for the next generation of technological revolution. Current speech synthesis technology has the following main problems: synthesized audio has low naturalness and sounds mechanical; few dialects can be synthesized, so regional adaptability is weak; few speakers can be synthesized, so the speech style is monotonous; and mixed Chinese-English text is synthesized poorly. To address these problems, this thesis proposes three speech synthesis methods and implements them in software. The work consists of the following four parts:

(1) To address the low naturalness and low intelligibility of traditional speech synthesis technology and the instability of deep learning speech synthesis models, this thesis proposes a Chinese speech synthesis method based on FastSpeech. In this method, the input Chinese text is converted into the corresponding audio waveform in three stages: front-end processing, an acoustic model, and a vocoder. The experimental results show that the synthesized audio receives a relatively high subjective opinion score for naturalness and that the overall performance of the model is good.

(2) Building on the Chinese speech synthesis model, and to address the small number of dialects that can be synthesized and the resulting poor regional adaptability, this thesis proposes a Sichuan dialect speech synthesis method based on FastSpeech. A Sichuan dialect database and data set were constructed, and a Wiener filter algorithm was used to remove background noise from the data set audio. The Chinese speech synthesis model from this thesis was then fine-tuned on the Sichuan dialect data set, so that the Sichuan dialect speech synthesis model was trained by transfer learning. The experimental results show that the synthesized audio receives a relatively high subjective opinion score for naturalness.

(3) Building on the Chinese speech synthesis model, and to address poor synthesis of mixed Chinese-English speech and the limited range of speaker styles, this thesis studies a Chinese-English multi-speaker speech synthesis method. The method uses two encoders, a language encoder and a speaker encoder, to encode the Chinese and English text and the speaker respectively, and uses an acoustic model and a vocoder to synthesize speech. The experimental results show relatively high subjective opinion scores both for the naturalness of the mixed Chinese-English audio synthesized by this method and for the similarity between the speaker style of the synthesized audio and that of the original audio.

(4) This thesis designs speech synthesis application software that integrates the three speech synthesis methods. The user can enter any text, choose any of the methods to synthesize audio, and use the multi-speaker model to specify the style of a particular speaker. In addition, voice Turing test software is designed to test whether the naturalness of synthesized audio reaches the level of real human pronunciation.
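The abstract reports subjective opinion scores without stating the rating scale, the number of listeners, or the values obtained. As a minimal sketch of how such a score is commonly summarized, the example below assumes ratings on a 1-5 scale and a normal-approximation 95% confidence interval. The function name `mean_opinion_score` and the ratings are hypothetical and are not taken from the thesis.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (mean, half-width of the confidence interval) for listener ratings.

    Assumes a 1-5 rating scale and a normal approximation (z = 1.96 for ~95%).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie in the range 1 to 5")
    mos = statistics.mean(ratings)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Made-up ratings for illustration only; these are not results from the thesis.
example_ratings = [4, 5, 4, 3, 4, 5, 4, 4]
mos, ci = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

Reporting the interval alongside the mean makes it easier to judge whether a difference between two systems is larger than the listener-to-listener variation.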
Keywords/Search Tags: Chinese speech synthesis, Sichuan speech synthesis, Chinese-English multi-speaker, Attention mechanism, Deep learning