Research On Many To Many Voice Conversion Based On I-vector And Improved Variational Autoencoder For Non-parallel Corpora

Posted on:2020-09-29

Degree:Master

Type:Thesis

Country:China

Candidate:J L Xu

Full Text:PDF

GTID:2428330590495965

Subject:Electronic and communication engineering

Abstract/Summary:


Voice conversion is a technology that converts the personality characteristics of a source speaker's speech into those of a target speaker while keeping the speech content unchanged. Depending on the corpora available for training, voice conversion can be divided into conversion with parallel corpora and conversion with non-parallel corpora. Existing voice conversion models for non-parallel corpora have two problems: the quality of the converted speech is unsatisfactory, and the similarity of the converted speech to the target speaker is limited. This thesis focuses on improving model performance by introducing a speaker identity vector (i-vector) and by improving the variational auto-encoder (VAE).

First, a one-hot speaker vector is insufficient to represent speaker identity information. To improve the speaker similarity of the converted speech, the i-vector is introduced into the model to enrich the speaker identity information. Compared with the voice conversion model based on VAE + one-hot, the average mel-cepstral distortion (MCD) decreases by 3.34%, the average mean opinion score (MOS) increases by 1.6%, and the average ABX score increases by 3.75% for same-gender conversion and by 4.37% for cross-gender conversion. These results indicate that the proposed method improves both the speaker similarity and the speech quality of the converted speech.

Second, the original VAE model learns insufficient information through the latent bottleneck. To facilitate the learning of disentangled representations and to increase the information capacity of the latent code during training, the parameters β and C are introduced into the VAE to obtain the BETA-VAE model. Compared with the voice conversion model based on VAE, the average MCD decreases by 4.10%, the average MOS increases by 5.33%, and the average ABX score increases by 5.62% for same-gender conversion and by 4.37% for cross-gender conversion. These results indicate that the proposed method effectively improves speaker similarity and speech quality.

In addition, the i-vector is added to BETA-VAE to obtain the BETA-VAE + i-vector model. Evaluations show that, compared with the models based on VAE and BETA-VAE, the average MCD of the converted speech decreases by 5.50%, the average MOS increases by 6.23%, and the average ABX score increases by 6.87% for same-gender conversion and by 5.62% for cross-gender conversion. The results indicate that this method substantially improves speech quality and speaker similarity.
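The abstract does not state the exact training objective, so the following is only a minimal sketch of how parameters β and C are commonly combined in a capacity-controlled β-VAE loss: a reconstruction term plus β times the absolute gap between the KL divergence and a target capacity C. The function names, the squared-error reconstruction term, and the default values are assumptions for illustration, not the thesis's implementation.

```python
import math

def gaussian_kl(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I), summed over latent dims."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def beta_vae_loss(x, x_hat, mu, log_var, beta=4.0, capacity=0.0):
    """Assumed objective: squared-error reconstruction + beta * |KL - capacity|.

    beta weights the latent constraint; capacity (C) is the target amount of
    information, in nats, that the latent code is encouraged to carry.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    kl = gaussian_kl(mu, log_var)
    return recon + beta * abs(kl - capacity)

# Hypothetical toy values: a 2-dimensional frame and a 2-dimensional latent code.
loss = beta_vae_loss([1.0, 2.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0], beta=4.0, capacity=0.5)
```

In this form, setting C above zero stops the penalty from pushing the KL term all the way to zero, which is one way to read "increase the information capacity of the latent code during training".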

Keywords/Search Tags:

voice conversion, variational auto-encoder, BETA variational auto-encoder, i-vector, non-parallel corpora, many to many voice conversion


Related items

1	Research On Many-to-Many Voice Conversion Based On I-vector,Variational Auto-encoder And Generative Adversarial Networks For Non-parallel Corpora
2	Non-parallel Voice Conversion Using ACGAN And Variational Autoencoders Conditioned By Sentence Embedding
3	High-quality Voice Conversion From Non-parallel Corpora Based On Variational Auto-encoder And Bottleneck Feature
4	Research On Any-to-any Emotional Voice Conversion Based On Variational Auto-encoder
5	Research On Speech Conversion Algorithms Based On Deep Convolutional Auto Encoder
6	Deep Auto-encoder Framework For SAR Images Change Detection
7	Research On Collaborative Filtering Recommendation Algorithm Based On Improved Variational Auto-encoder
8	Research On Voice Conversion System Based On Vector Quantized Variational Autoencoder
9	Research And Application Of Representation Learning Based On Variational Auto-encoder
10	Research On Neural Topic Modeling Method Based On Variational Auto-Encoder