The Research Of Personalized Voiceprint Feature Speech Synthesis Technology Based On GAN

Posted on:2023-06-26

Degree:Master

Type:Thesis

Country:China

Candidate:D G Chen

GTID:2558306620454774

Subject:Software engineering theory and methods

Abstract/Summary:

The main task of speech synthesis technology is to convert text information into speech information. In recent years, with the rapid development of deep learning and neural networks, speech synthesis technology has gradually matured and is now widely used in intelligent audio, voice navigation, information broadcasting, video dubbing, music generation, and other fields. In many practical application scenarios today, synthesized speech must not only convey the correct text content but also meet users’ needs for personalized voiceprint features. However, existing speech synthesis models focus mainly on synthesis speed and audio quality and cannot fit personalized voiceprints well. To address this weakness in generating personalized speech, this thesis studies deep-learning-based speech synthesis with personalized voiceprint features and proposes a speech synthesis model based on a generative adversarial network (GAN). The proposed model is evaluated in terms of voiceprint similarity score, speech quality score, and speech synthesis speed. The main research contents of this thesis are as follows:

(1) To address the insufficient ability of existing speech synthesis models to fit personalized voiceprints, a speech synthesis model with personalized voiceprint features is proposed, based on generative adversarial networks and voiceprint feature extraction technology. Comparative evaluation against other models shows that the proposed model effectively improves the ability to fit personalized voiceprints.

(2) To address the slow synthesis speed of the designed model and the slightly poor audio quality of its generated speech, a scheme for optimizing the model's performance is designed. First, the scheme improves the perception ability of the generative network and optimizes the weighting of the loss terms in the loss function to address the slow synthesis speed. Second, it uses a multi-domain signal processing method to redesign the discriminative network to address the slightly poor speech quality.

(3) The optimized model is tested on multiple real datasets. Evaluation and comparison with other methods on the three indicators of voiceprint similarity score, speech quality score, and speech synthesis speed verify that the model effectively addresses the insufficient ability of existing models to fit personalized voiceprints. The model can quickly synthesize speech with personalized voiceprint features while ensuring that the synthesized speech has a high speech quality score.
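The abstract does not say how the three indicators are computed. As an illustration only, the sketch below shows two commonly used measures: cosine similarity between speaker embedding vectors as a stand-in for a voiceprint similarity score, and real-time factor as a measure of synthesis speed. The function names, the embedding values, and the timings are all hypothetical and are not taken from the thesis.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def real_time_factor(synthesis_seconds, audio_seconds):
    """Synthesis time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


# Hypothetical values: in practice the embeddings would come from a
# voiceprint (speaker) encoder applied to target and synthesized audio.
target_voiceprint = [0.2, 0.7, 0.1]
synthesized_voiceprint = [0.25, 0.65, 0.05]

similarity = cosine_similarity(target_voiceprint, synthesized_voiceprint)
rtf = real_time_factor(synthesis_seconds=0.5, audio_seconds=4.0)
```

A similarity closer to 1.0 would indicate that the synthesized voiceprint points in nearly the same direction as the target, and a lower real-time factor would indicate faster synthesis. The thesis may define its scores differently.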

Keywords/Search Tags:

Speech synthesis, Generative adversarial network, Personalized voiceprint feature

Related items

1	The Research Of Personalized Speech Synthesis Based On Generative Adversarial Network
2	Research On Emotional Speech Synthesis Based On Generative Adversarial Networks
3	Research On Personalized Speech Synthesis Based On Deep Speech Representations
4	Research On Facial Expression Synthesis Based On Generative Adversarial Networks
5	Research On Methods Of Improving Speech Communication Quality Based On Generative Adversarial Network
6	Image Synthesis Based On Generative Adversarial Networks
7	Research On Face Sketch Synthesis Algorithm Based On Generative Adversarial Networks
8	Research On Neural Network Based Statistical Parametric Speech Synthesis
9	Generative Adversarial Network For Text-to-Image Synthesis
10	Research On Face Image Synthesis Based On Generative Adversarial Network