
Cross-Lingual Voice Conversion Based On Activation Guide And Involution

Posted on: 2023-01-01  Degree: Master  Type: Thesis
Country: China  Candidate: S L Dai  Full Text: PDF
GTID: 2568306836472424  Subject: Electronic and communication engineering
Abstract/Summary:
As one of the most convenient means of communication, speech carries rich semantic information, speaker personality information, emotional information, and more. The goal of voice conversion (VC) is to convert the personality information of the source speaker into that of the target speaker while keeping the semantic information in the source speaker’s speech unchanged. As an important branch of VC, cross-lingual VC has significant application value in voice interaction systems, international cultural communication, and similar areas. In recent years, thanks to the modeling capability of deep neural networks (DNNs), cross-lingual VC has developed rapidly, and various DNN-based cross-lingual VC models have achieved good conversion performance in the closed-set case. In practical applications, however, a cross-lingual VC model should work for any speaker, so conversion must also be realized in the open-set case. In addition, at the same level of performance, the operating efficiency of the algorithm determines how much computing and storage resource it requires. This thesis therefore addresses these two aspects and proposes the following work.

First, to realize cross-lingual VC in the open-set case, this thesis proposes a cross-lingual VC model based on activation guide (AG), which adopts a U-connected encoder-decoder structure. In the encoder, instance normalization (IN) and AG are the two key steps. They enable the encoder to extract semantic information and speaker personality information without being restricted by the language or the number of speakers. Specifically, the IN operation dynamically extracts speaker personality information and yields a preliminary representation of the semantic information, while AG acts as a soft bottleneck that extracts the semantic information of different languages. In the decoder, adaptive instance normalization (AdaIN) fuses the semantic information with the speaker personality information to produce the converted speech and thus realize cross-lingual VC. The experimental results show an average MOS of 3.44 and an average ABX score of 83.86%, indicating that the proposed model can realize cross-lingual VC in the open-set case with good performance.

Furthermore, to improve the operating efficiency of the algorithm, this thesis proposes a cross-lingual VC model based on AG and involution, in which involution replaces some of the convolutions in the model. Involution is lightweight and efficient, so it reduces the model's parameter count and computational cost and improves the operating efficiency of the algorithm. The experimental results show that the parameter count and computational cost of this model are reduced by 37.66% and 38.68% respectively, training is accelerated by 24.63%, the average MOS is 3.38, and the average ABX score is 83.60%. This verifies that the optimization scheme substantially reduces the parameter count and computational cost of the model while preserving its conversion quality, and thereby improves the operating efficiency of the algorithm.

In summary, the cross-lingual VC model based on AG and involution proposed in this thesis can realize cross-lingual VC in the open-set case with good performance. The model also runs efficiently, which makes this work a useful theoretical contribution toward the practical application of cross-lingual VC technology.
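The encoder and decoder steps described in the abstract (IN to separate speaker statistics from content, AG as a soft bottleneck, AdaIN to re-apply a target speaker's statistics) can be illustrated with a minimal pure-Python sketch. This is not the thesis implementation: the function names, the `channels x frames` list layout, and the use of a sigmoid as the guiding activation are all assumptions made for illustration, and the learned layers and U-connections of the real model are omitted.

```python
import math

def channel_stats(features, eps=1e-5):
    """Per-channel (mean, std) over the time axis of a channels x frames list."""
    stats = []
    for channel in features:
        mean = sum(channel) / len(channel)
        var = sum((x - mean) ** 2 for x in channel) / len(channel)
        stats.append((mean, math.sqrt(var + eps)))
    return stats

def instance_norm(features, eps=1e-5):
    """Normalize each channel over time.

    Returns the normalized features (a preliminary content representation)
    and the removed statistics (a simple stand-in for speaker information).
    """
    stats = channel_stats(features, eps)
    normalized = [[(x - m) / s for x in ch] for ch, (m, s) in zip(features, stats)]
    return normalized, stats

def activation_guide(features):
    """Soft bottleneck. ASSUMPTION: a sigmoid; the abstract does not name the function."""
    return [[1.0 / (1.0 + math.exp(-x)) for x in ch] for ch in features]

def adain(content, speaker_stats):
    """Normalize the content, then scale and shift it with the target speaker's statistics."""
    normalized, _ = instance_norm(content)
    return [[s * x + m for x in ch] for ch, (m, s) in zip(normalized, speaker_stats)]

# Hypothetical 2-channel, 4-frame features for a source and a target utterance.
source = [[1.0, 2.0, 3.0, 4.0], [0.0, 10.0, 0.0, 10.0]]
target = [[5.0, 5.5, 6.0, 6.5], [-1.0, 1.0, -1.0, 1.0]]

content = activation_guide(instance_norm(source)[0])  # encoder side
_, target_stats = instance_norm(target)               # speaker statistics
converted = adain(content, target_stats)              # decoder side
```

After the last line, each channel of `converted` has approximately the mean and standard deviation of the corresponding `target` channel, while its frame-to-frame shape still comes from `source`. Because the statistics are computed per utterance rather than looked up per known speaker, nothing in this step depends on a fixed speaker set, which is the property the abstract relies on for the open-set case.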
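To make the efficiency argument for involution concrete, the following is a simplified 1-D sketch, not the layer used in the thesis. In involution the kernel is generated from the input at each position and shared across the channels of a group, instead of being a fixed weight per input-output channel pair. The sketch assumes a single linear kernel generator (`weight`, a hypothetical parameter; the published involution operator uses a two-layer bottleneck), zero padding, and no bias terms.

```python
def involution_1d(features, weight, kernel_size, groups=1):
    """Simplified 1-D involution.

    features: channels x frames list of floats.
    weight:   (groups * kernel_size) x channels list, the kernel generator.
    """
    channels, frames = len(features), len(features[0])
    per_group = channels // groups
    half = kernel_size // 2
    output = [[0.0] * frames for _ in range(channels)]
    for t in range(frames):
        column = [features[c][t] for c in range(channels)]
        # The kernel for frame t is computed from the feature vector at frame t.
        kernel = [sum(w * x for w, x in zip(row, column)) for row in weight]
        for c in range(channels):
            g = c // per_group  # channels in the same group share one kernel
            acc = 0.0
            for k in range(kernel_size):
                src = t + k - half
                if 0 <= src < frames:  # zero padding at the edges
                    acc += kernel[g * kernel_size + k] * features[c][src]
            output[c][t] = acc
    return output

def conv1d_params(channels, kernel_size):
    """Weights of a standard 1-D convolution with equal input and output channels (no bias)."""
    return channels * channels * kernel_size

def involution_params(channels, kernel_size, groups=1):
    """Weights of the single-layer kernel generator used in this sketch (no bias)."""
    return channels * groups * kernel_size

# Hypothetical layer sizes, chosen only to show the scaling difference.
print(conv1d_params(256, 3), involution_params(256, 3, groups=4))
```

The per-layer counts here are illustrative and are not the source of the 37.66% and 38.68% figures reported in the abstract, which describe the whole model with only some convolutions replaced.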
Keywords/Search Tags: voice conversion, cross-lingual voice conversion, activation guide, involution, instance normalization, adaptive instance normalization