CVAE Based Tone Speech Synthesis And Its Application In Portable Translator

Posted on:2022-06-30

Degree:Master

Type:Thesis

Country:China

Candidate:Y Wang

Full Text:PDF

GTID:2518306494977409

Subject:Information and Communication Engineering

Abstract/Summary:

In recent years, with the continuous development of computer science and artificial intelligence, speech synthesis has been widely used in fields such as text reading and intelligent navigation, and machine-generated speech now largely meets intelligibility requirements. However, current speech synthesis still suffers from problems such as low naturalness and a lack of tonal emotion. To address the poor support for tone in current speech synthesis, this thesis proposes a tone speech synthesis model based on the Conditional Variational Auto-Encoder (CVAE) to enhance the expressiveness of synthesized speech, and applies it to the development of a portable simultaneous translator.

Tone is an important emotional element of a discourse message. It expresses the speaker's attitude and opinion toward the stated content and gives the message its emotional color. Tone can usually be divided into two types: emotional tone, and functional tone, which covers categories such as statements, questions, and exclamations. Emotional tone is highly subjective and difficult to annotate in a corpus, and no large emotional-tone corpus is available to support it. The Variational Auto-Encoder (VAE), by contrast, is a class of unsupervised generative models that can learn an effective latent-variable representation of high-dimensional data while also performing data generation. This thesis therefore introduces a VAE to learn a representation of the implicit emotional tone information, and feeds the functional tone into the autoencoder network as a condition, building a tone model based on the conditional variational autoencoder. This model generates a specified tone class for synthetic speech while varying the emotional tone.

Building on the tone model and combining it with the Statistical Parametric Speech Synthesis (SPSS) method, a tone speech synthesis model comprising an acoustic model, a vocoder, and other modules is proposed to generate speech with tone. The model was trained on the Blizzard Challenge 2018 corpus, and speech was synthesized with the WORLD vocoder. Both subjective and objective evaluations show that the proposed model can generate specific tone categories while producing diverse emotional tones.

This thesis also proposes an overall design for simultaneous translation based on tone speech synthesis, comprising speech recognition, language translation, tone recognition, tone conversion, and tone speech synthesis modules. The design converts between different languages and strengthens the tone information in the output without changing the content being expressed, which improves the effectiveness of conversational communication.

Further, to address the lack of portability and inflexible use of current speech translators, this thesis designs and implements a personal portable simultaneous translator in three parts: the hardware design of the portable sound pickup and playback device, the development of the simultaneous translator app, and the translator cloud platform. The portable Bluetooth simultaneous voice translator is built on the Android platform. It transmits voice and text locally over Bluetooth; acting as a client, it calls third-party cloud servers for speech recognition and text translation, and remotely calls the tone speech synthesis server to generate speech with tone in the target language. This enables conversation between speakers of different languages and removes language communication barriers, giving the work important application value.
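
The core idea of the tone model, a latent variable for the unlabeled emotional tone plus a one-hot functional-tone condition fed to both the encoder and the decoder, can be illustrated with a toy forward pass. This is a minimal sketch under stated assumptions, not the thesis's implementation: the abstract gives no layer sizes, feature set, or training procedure, so `ToyCVAE`, the 8-dimensional "acoustic frame", the three-class `TONES` list, and the single linear encoder and decoder layers are all hypothetical. Weights are random and untrained; a real model would be trained in a deep-learning framework to minimize the loss shown here (squared reconstruction error, standing in for a Gaussian likelihood, plus the KL divergence to a standard normal prior).

```python
import math
import random

# Hypothetical functional-tone classes, taken from the categories named in the abstract.
TONES = ["statement", "question", "exclamation"]


def one_hot(tone):
    """Encode a functional tone as the CVAE condition vector c."""
    return [1.0 if t == tone else 0.0 for t in TONES]


def init_layer(n_in, n_out, rng):
    """Random (untrained) weight matrix stored as a list of rows, plus a zero bias."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    """Affine map y = W x + b."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def kl_divergence(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dims."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


class ToyCVAE:
    """Hypothetical one-layer CVAE: condition c is concatenated to encoder and decoder inputs."""

    def __init__(self, feat_dim, latent_dim, seed=0):
        self.rng = random.Random(seed)
        self.latent_dim = latent_dim
        cond_dim = len(TONES)
        self.enc_mu = init_layer(feat_dim + cond_dim, latent_dim, self.rng)
        self.enc_logvar = init_layer(feat_dim + cond_dim, latent_dim, self.rng)
        self.dec = init_layer(latent_dim + cond_dim, feat_dim, self.rng)

    def encode(self, x, c):
        # q(z | x, c): mean and log-variance of the latent (emotional-tone) variable
        h = x + c  # list concatenation [x; c]
        return linear(self.enc_mu, h), linear(self.enc_logvar, h)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps with eps ~ N(0, I)
        return [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]

    def decode(self, z, c):
        # p(x | z, c): reconstruct the feature frame from latent + condition
        return linear(self.dec, z + c)

    def loss(self, x, tone):
        """Training objective for one frame: squared reconstruction error + KL term."""
        c = one_hot(tone)
        mu, logvar = self.encode(x, c)
        x_hat = self.decode(self.reparameterize(mu, logvar), c)
        recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        kl = kl_divergence(mu, logvar)
        return recon + kl, recon, kl

    def generate(self, tone):
        """Fix the functional tone via c and sample z from the N(0, I) prior."""
        z = [self.rng.gauss(0.0, 1.0) for _ in range(self.latent_dim)]
        return self.decode(z, one_hot(tone))


if __name__ == "__main__":
    model = ToyCVAE(feat_dim=8, latent_dim=2)
    frame = [0.1 * i for i in range(8)]  # hypothetical 8-dim acoustic feature frame
    total, recon, kl = model.loss(frame, "question")
    print(f"loss={total:.4f} recon={recon:.4f} kl={kl:.4f}")
    for _ in range(2):
        print([round(v, 3) for v in model.generate("question")])
```

Calling `generate` repeatedly with the same tone draws different latent samples, so the functional tone stays fixed by the condition while the latent part varies; this is the mechanism a CVAE of this kind uses to keep a tone class and still produce diverse emotional tone.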

Keywords/Search Tags:

speech synthesis, tone, conditional variational auto-encoder, bluetooth wireless transmission, portable translator

Related items

1	The Research On Conditional Variational Auto-encoder And Its Optimization Method For Network Performance Prediction
2	Technology Study Of Terrain Synthesis Based On Learning Strategy
3	Research On Any-to-any Emotional Voice Conversion Based On Variational Auto-encoder
4	Deep Auto-encoder Framework For SAR Images Change Detection
5	Research And Application Of Answer Generation Model Based On Conditional Variational Autoencoder
6	Research On Speech Synthesis Method With Emotion Embedding
7	Research And Application Of Representation Learning Based On Variational Auto-encoder
8	Research On Collaborative Filtering Recommendation Algorithm Based On Improved Variational Auto-encoder
9	Research On Neural Topic Modeling Method Based On Variational Auto-Encoder
10	Speech Feature Encoding And Emotion Recognition Based On Auto Encoder