Speech Enhancement Algorithm Based On Generative Adversarial Network

Posted on: 2021-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: N Y Tan
GTID: 2518306122966959
Subject: Electronic Science and Technology
Abstract/Summary:
Speech enhancement technology can improve the accuracy of speech information in traditional communications. In new intelligent electronic devices, it can serve as a front-end processing step for speech recognition to improve performance in noisy environments. Existing speech enhancement algorithms fall into two categories: traditional algorithms and neural network-based algorithms. The former improve speech quality under only a few types of noise and often produce residual musical noise. The latter, as a newer approach, can achieve higher enhancement performance, and this thesis takes them as its research object. With the development of neural networks, generative adversarial networks (GANs) have achieved excellent results in image processing. Although GAN-based speech enhancement generalizes across noise types better than traditional algorithms, its performance is still poor at low signal-to-noise ratio (SNR). To improve performance at low SNR, this thesis adopts a framework that combines the Wasserstein generative adversarial network with gradient penalty and the conditional generative adversarial network, and proposes a new speech enhancement algorithm, the SEWGAN algorithm.

The main research work of this thesis is divided into two parts. The first part is the design of the SEWGAN algorithm. The conditional generative adversarial network is used as the overall framework of the algorithm. Its purpose is to use noisy speech samples as additional information that instructs the generator to produce the corresponding speech samples. This solves the problem that speech generated by the original GAN is clear enough but its content is unrelated to the input noisy speech, and so improves the practicality of the algorithm. At the same time, by using the Wasserstein generative adversarial network with gradient penalty in the loss function, the generator can fit the distribution of clean speech better, which both improves enhancement performance and strengthens the algorithm's ability to adapt to unseen noise environments.

The second part is the implementation of the SEWGAN algorithm and the comparison of its performance with traditional speech enhancement algorithms and the GAN-based algorithm. The implementation is carried out in a virtual environment built on Linux with nvidia-docker, using TensorFlow, an open source library developed by Google, to build the network model. All algorithms that require training are trained on the same training set. During training, layer normalization is applied to the network, and the Adam optimizer is used to accelerate convergence. Afterwards, the samples in the same test set are enhanced with the SEWGAN algorithm, multi-band spectral subtraction, Wiener filtering, the logarithmic MMSE estimator, and the GAN-based algorithm, respectively. The enhanced speech and the corresponding clean speech are then evaluated for objective speech quality using MATLAB.

Experimental results show that, compared with the logarithmic MMSE estimator (the best of the three traditional algorithms), the SEWGAN algorithm improves segmental SNR by 1.54%, 17.07%, 47.98%, and 148.72% at input SNRs of 17.5 dB, 12.5 dB, 7.5 dB, and 2.5 dB, respectively. In suppressing the five noise types bus, cafe, living, office, and psquare, the improvement in segmental SNR is 25.43%, 54.98%, 39.56%, 16.80%, and 32.84%, respectively. Over the entire test set, compared with the GAN-based algorithm, the SEWGAN algorithm improves PESQ by 9.26%, the speech signal distortion measure by 5.46%, the background noise intrusiveness measure by 6.80%, the overall quality measure by 7.14%, and segmental SNR by 19.15%. From the perspective of objective speech quality, the enhancement performance of the SEWGAN algorithm is significantly improved, especially at the low SNR of 2.5 dB, where its segmental SNR is 17.6% higher than that of the GAN-based algorithm. In summary, the SEWGAN algorithm successfully applies the Wasserstein generative adversarial network with gradient penalty and the conditional generative adversarial network to speech enhancement, and achieves excellent speech enhancement performance.
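The loss design described in the abstract (a Wasserstein critic loss with a gradient penalty, conditioned on the noisy input) can be illustrated with a small NumPy sketch. This is not the thesis's TensorFlow implementation: the tiny one-layer critic, the function names `critic`, `critic_grad_x`, and `critic_loss`, the feature-vector inputs, and the penalty weight `lam=10.0` are all assumptions made for illustration. The critic is kept simple enough that its gradient with respect to the speech input can be written by hand instead of relying on automatic differentiation.

```python
import numpy as np


def critic(x, c, params):
    """Toy conditional critic D(x | c) = v . tanh(W_x x + W_c c).

    x: (batch, dim) candidate speech features (clean or generated).
    c: (batch, dim_c) conditioning features (the noisy speech).
    This architecture is hypothetical and only serves the illustration.
    """
    W_x, W_c, v = params
    h = np.tanh(x @ W_x.T + c @ W_c.T)  # (batch, hidden)
    return h @ v                        # (batch,)


def critic_grad_x(x, c, params):
    """Analytic gradient of D(x | c) with respect to x, shape (batch, dim)."""
    W_x, W_c, v = params
    h = np.tanh(x @ W_x.T + c @ W_c.T)
    return ((1.0 - h ** 2) * v) @ W_x


def critic_loss(clean, fake, noisy, params, lam=10.0, rng=None):
    """WGAN-GP critic loss, conditioned on the noisy input.

    loss = E[D(fake | noisy)] - E[D(clean | noisy)]
           + lam * E[(||grad_x D(x_hat | noisy)||_2 - 1)^2]
    where x_hat is a random interpolation between clean and fake samples.
    lam=10.0 is an assumed value, not one reported in the abstract.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.uniform(size=(clean.shape[0], 1))  # one mixing weight per sample
    x_hat = eps * clean + (1.0 - eps) * fake
    grad = critic_grad_x(x_hat, noisy, params)
    penalty = np.mean((np.linalg.norm(grad, axis=1) - 1.0) ** 2)
    wasserstein = np.mean(critic(fake, noisy, params)) - np.mean(
        critic(clean, noisy, params)
    )
    return wasserstein + lam * penalty


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, dim, hidden = 4, 8, 5
    params = (
        rng.normal(size=(hidden, dim)),  # W_x
        rng.normal(size=(hidden, dim)),  # W_c
        rng.normal(size=hidden),         # v
    )
    clean = rng.normal(size=(batch, dim))
    noisy = clean + 0.3 * rng.normal(size=(batch, dim))
    fake = rng.normal(size=(batch, dim))  # stand-in for generator output
    print(critic_loss(clean, fake, noisy, params, rng=rng))
```

The gradient penalty pushes the norm of the critic's gradient toward 1 at points between clean and generated samples, while feeding the noisy speech to the critic as a condition ties the generated output to the input utterance. In a full training loop the generator would be updated separately to raise the critic's score on its outputs.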
Keywords/Search Tags:speech enhancement, generative adversarial networks, Wasserstein distance, gradient penalty