| Voice conversion is a technology that converts the personality characteristics of a source speaker's voice into those of a target speaker while keeping the speech content unchanged. Depending on the corpora available for training, voice conversion can be divided into conversion under parallel corpora and conversion under non-parallel corpora. Existing voice conversion models for non-parallel corpora have two problems: the quality of the converted speech is unsatisfactory, and its similarity to the target speaker's voice characteristics is low. This paper improves model performance by introducing a speaker identity vector and by improving the variational auto-encoder (VAE). First, a one-hot speaker vector is insufficient to represent speaker identity information. To improve the speaker similarity of the converted speech, the speaker identity vector (i-vector) is introduced into the model to enrich the speaker identity information. Compared with the voice conversion model based on VAE+one-hot, the average mel-cepstral distortion (MCD) decreases by 3.34%, the average mean opinion score (MOS) increases by 1.6%, and the average ABX score increases by 3.75% for same-gender conversion and by 4.37% for cross-gender conversion. These results indicate that the proposed method improves both the speaker similarity and the speech quality of the converted speech. Second, the original VAE model learns insufficient information through the latent bottleneck. To facilitate the learning of disentangled representations and increase the information capacity of the latent code during training, the parameters β and C are introduced into the VAE to obtain the BETA-VAE model. Compared with the voice conversion model based on VAE, the average MCD decreases by 4.10%, the average MOS increases by 5.33%, and the average ABX score increases by 5.62% for same-gender conversion and by 4.37% for cross-gender conversion. These results indicate that the proposed method effectively improves speaker similarity and speech quality. In addition, the i-vector is added to BETA-VAE to obtain the BETA-VAE+i-vector model. Evaluations show that, compared with the models based on VAE and BETA-VAE, the average MCD of the converted speech decreases by 5.50%, the average MOS increases by 6.23%, and the average ABX score increases by 6.87% for same-gender conversion and by 5.62% for cross-gender conversion. These results indicate that this method substantially improves speech quality and speaker similarity. |
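
The abstract names the parameters β and C but does not state the training objective. The sketch below assumes the commonly used capacity-controlled form, in which the KL divergence of the latent code is pulled toward a target capacity C and the deviation is weighted by β. The function names, default values, and the diagonal-Gaussian posterior are illustrative assumptions, not details taken from the paper.

```python
import math

def gaussian_kl(mu, log_var):
    """KL divergence between N(mu, diag(exp(log_var))) and N(0, I)
    for a single latent vector given as two equal-length lists."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def beta_vae_loss(recon_error, mu, log_var, beta=4.0, capacity=0.0):
    """Assumed objective: reconstruction error + beta * |KL - C|.

    recon_error: precomputed reconstruction term (hypothetical input)
    beta:        weight on the KL deviation (assumed default)
    capacity:    target KL value C, typically raised during training
    """
    kl = gaussian_kl(mu, log_var)
    return recon_error + beta * abs(kl - capacity)
```

With `beta=1.0` and `capacity=0.0` this reduces to the standard VAE objective, since the KL term is never negative. Raising `capacity` over the course of training lets the latent code carry more information before the penalty applies.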