The main task of speech synthesis technology is to convert text into speech. In recent years, with the rapid development of deep learning and neural networks, speech synthesis technology has gradually matured and is now widely used in intelligent audio devices, voice navigation, information broadcasting, video dubbing, music generation, and other fields. In many practical application scenarios today, synthesized speech must not only convey the correct text content but also meet users' demand for personalized voiceprint features. However, existing speech synthesis models focus mainly on synthesis speed and audio quality and cannot fit personalized voiceprints well. To address the poor ability of existing models to generate personalized speech, this thesis studies deep-learning-based speech synthesis with personalized voiceprint features and proposes a speech synthesis model based on a generative adversarial network (GAN). The proposed model is evaluated in terms of voiceprint similarity score, speech quality score, and speech synthesis speed. The main research contents of this thesis are as follows:

(1) To address the insufficient ability of existing speech synthesis models to fit personalized voiceprints, a speech synthesis model with personalized voiceprint features is proposed, built on generative adversarial networks and voiceprint feature extraction technology. Comparative evaluation against other models shows that the proposed model effectively improves the fitting of personalized voiceprints.

(2) To address the slow synthesis speed of the designed model and the slightly lower audio quality of its generated speech, a scheme for optimizing model performance is designed. First, the scheme improves the perception ability of the generator network and optimizes the weighting of the terms in the loss function, which addresses the slow synthesis speed. Second, it redesigns the discriminator network using a multi-domain signal processing method, which addresses the slightly lower speech quality.

(3) The optimized model is tested on multiple real datasets. Evaluation and comparison with other methods on three metrics (voiceprint similarity score, speech quality score, and speech synthesis speed) verify that the model effectively overcomes the insufficient ability of existing models to fit personalized voiceprints. The model can quickly synthesize speech with personalized voiceprint features while ensuring that the synthesized speech achieves a high speech quality score.
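The abstract does not state how the voiceprint similarity score is computed. A common convention, assumed here rather than taken from the thesis, is the cosine similarity between speaker embeddings (voiceprint vectors) extracted from the reference speech and the synthesized speech by a pretrained speaker encoder. The minimal sketch below covers only the scoring step; the helper names `voiceprint_similarity` and `mean_similarity` are hypothetical, and the embeddings are placeholder vectors rather than outputs of any real encoder.

```python
import math
from typing import Iterable, Sequence, Tuple


def voiceprint_similarity(ref: Sequence[float], syn: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings (hypothetical helper).

    Returns a value in [-1, 1]; higher means the synthesized voice is
    closer to the reference speaker's voiceprint.
    """
    if len(ref) != len(syn):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(ref, syn))
    norm_ref = math.sqrt(sum(a * a for a in ref))
    norm_syn = math.sqrt(sum(b * b for b in syn))
    if norm_ref == 0.0 or norm_syn == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_ref * norm_syn)


def mean_similarity(pairs: Iterable[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average the score over (reference, synthesized) embedding pairs."""
    scores = [voiceprint_similarity(r, s) for r, s in pairs]
    if not scores:
        raise ValueError("need at least one embedding pair")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Placeholder 4-dimensional embeddings; real speaker encoders
    # typically output much larger vectors.
    reference = [0.2, 0.1, 0.7, 0.3]
    synthesized = [0.25, 0.05, 0.65, 0.35]
    print(round(voiceprint_similarity(reference, synthesized), 4))
```

Scores from a metric of this kind are only comparable across systems when the same speaker encoder extracts every embedding.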