
Single Channel Speech Enhancement Based On Generative Adversarial Networks

Posted on: 2024-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: S Zhang
GTID: 2568307100480594
Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
Speech enhancement improves speech quality and intelligibility, and is of great significance for the development of communication technology and the artificial intelligence industry. In recent years, generative adversarial networks (GANs) have been increasingly used in speech enhancement tasks. This thesis studies GAN-based speech enhancement algorithms. The main work is as follows:

1. To address the insufficient use of speech information by speech enhancement algorithms based on the standard GAN, this thesis studies the temporal modeling ability of Deep Feed-forward Sequential Memory Networks (DFSMN) and the adversarial training mechanism of GANs, and proposes a single-channel speech enhancement algorithm based on a residual gated DFSMN generative adversarial network (RGDFSMN-GAN). The network consists of a generator and a discriminator. The generator combines the feature extraction ability of a convolutional neural network with the temporal modeling ability of the residual gated DFSMN to model long-term dependencies in the time series and to selectively highlight salient features of different speech contexts. During training, the generator and the discriminator are trained adversarially so that the algorithm learns the mapping from the magnitude spectrum features of noisy speech to those of clean speech. During enhancement, the generator produces the enhanced magnitude spectrum, which is combined with the phase of the noisy speech to obtain the enhanced speech. Experiments show that the algorithm effectively suppresses background noise and improves overall speech quality.

2. To address the lack of large numbers of parallel clean-noisy speech pairs in real-world scenarios, this thesis studies the cycle-consistent generative adversarial network (CycleGAN) and proposes a non-parallel speech enhancement algorithm based on a residual gated DFSMN cycle-consistent GAN (RG-DFSMN-CycleGAN). The network consists of two generators and two discriminators, and both generators use a convolutional encoder-decoder structure with residual gated DFSMN. The network is trained jointly with an adversarial loss, a cycle consistency loss, and an identity mapping loss. Trained on non-parallel speech datasets (in which the noisy and clean utterances are not paired), the algorithm learns the bidirectional mapping between the magnitude spectrum features of noisy speech and those of clean speech, and suppresses background noise while preserving speech components. Experiments show that the algorithm improves the performance of speech enhancement trained on non-parallel speech data.
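
Two steps described in the abstract can be illustrated with a minimal sketch: reusing the noisy phase during enhancement, and combining the three loss terms for one CycleGAN generator. The sketch is not the thesis implementation. The function names, the least-squares form of the adversarial term, and the weights `lambda_cyc` and `lambda_id` are all assumptions, since the abstract specifies none of them. Plain Python lists stand in for spectrogram frames.

```python
import cmath


def combine_magnitude_with_noisy_phase(enhanced_mag, noisy_spec):
    """Pair each enhanced magnitude with the phase of the matching noisy STFT bin.

    enhanced_mag: list of non-negative floats (generator output, hypothetical)
    noisy_spec:   list of complex STFT bins of the noisy speech
    """
    if len(enhanced_mag) != len(noisy_spec):
        raise ValueError("magnitude and spectrum must have the same number of bins")
    # cmath.rect(r, phi) builds the complex number r * exp(j * phi).
    return [cmath.rect(m, cmath.phase(z)) for m, z in zip(enhanced_mag, noisy_spec)]


def l1_distance(a, b):
    """Mean absolute difference between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def generator_objective(d_score_fake, noisy, noisy_cycled, clean, clean_identity,
                        lambda_cyc=10.0, lambda_id=5.0):
    """Joint loss for the noisy-to-clean generator G (one direction only).

    d_score_fake:   discriminator score for G(noisy); 1.0 means "judged clean"
    noisy_cycled:   F(G(noisy)), which should reproduce `noisy`
    clean_identity: G(clean), which should leave `clean` unchanged
    The least-squares adversarial term and both weights are assumptions.
    """
    adversarial = (d_score_fake - 1.0) ** 2
    cycle = l1_distance(noisy, noisy_cycled)
    identity = l1_distance(clean, clean_identity)
    return adversarial + lambda_cyc * cycle + lambda_id * identity


if __name__ == "__main__":
    spec = combine_magnitude_with_noisy_phase([0.5, 1.0], [1 + 1j, -2j])
    print([abs(z) for z in spec])
    print(generator_objective(0.5, [1.0, 2.0], [1.0, 3.0], [0.0, 1.0], [0.0, 1.0]))
```

In a full system the combined spectrum would then pass through an inverse STFT to obtain the waveform, and a mirrored objective would be computed for the clean-to-noisy generator.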
Keywords/Search Tags: Speech enhancement, Generative adversarial network, Deep feed-forward sequential memory network