Speech Enhancement Method Based On Deep Learning

Posted on: 2022-12-08  Degree: Master  Type: Thesis
Country: China  Candidate: H Jing  Full Text: PDF
GTID: 2568307034489864  Subject: Control engineering
Abstract/Summary:
Speech enhancement is a key technology for improving the quality and intelligibility of target speech by suppressing background noise. It is mainly used in fields such as human-computer interaction and mobile communication. Most existing algorithms rely on the statistical properties of speech: they assume that noise and speech are statistically independent and that both follow a Gaussian distribution. These assumptions do not hold in general. Under stationary noise and high signal-to-noise ratio (SNR) such algorithms still achieve remarkable enhancement, but under low SNR and non-stationary noise their performance degrades. For this reason, this thesis uses deep learning to address the poor robustness and limited generalization ability of existing speech enhancement algorithms when they process unknown types of noisy speech. It studies the network structure in depth and proposes two speech enhancement methods. The specific research results are as follows:

1. The traditional single-channel network model cannot fully extract the deep features of speech because of its limited representation ability, so its enhancement effect is insignificant. In view of this, a dual-channel convolutional attention network speech enhancement method is proposed. In this method, a convolutional neural network feature extraction channel and a long short-term memory network feature extraction channel are built to fully mine the deep features of speech. An attention module is added to each channel to weight that channel's output features according to their degree of attention, so that key information is emphasized. Experiments on public speech data sets show that the network model with both the dual-channel structure and the attention module enhances speech significantly better than the comparison models, which confirms the feasibility of the proposed model.

2. Common non-end-to-end speech enhancement methods generally use time-frequency decomposition to transform speech into the frequency domain for processing. This transformation consumes part of the computing resources and introduces delay into the enhancement process, which hinders real-time processing. In addition, when the phase of the noisy speech is used to reconstruct the enhanced speech in the post-processing stage, the quality of the enhanced speech is bounded by a performance upper limit. In view of this, an end-to-end speech enhancement method combining non-local blocks and a convolutional gated recurrent network is proposed. An encoder-decoder network is designed that takes the time-domain representation of the speech signal as the encoder input, so that both the amplitude and the phase information of the signal can be learned. Non-local blocks are added to the stack to extract key features of the speech sequence while suppressing useless features, and a gated recurrent unit network is introduced to capture temporal correlations within the speech sequence. The experimental results show that this method outperforms the comparison methods in perceptual speech quality and short-time objective intelligibility scores.

Finally, the two methods proposed in this thesis are compared. The experimental results show that, for unseen speakers and unseen noise, the end-to-end method performs better in both network scale and enhancement effect, which further demonstrates the effectiveness of the method.
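
The abstract does not give the internals of the non-local block used in the second method. As a rough illustration only, the following NumPy sketch shows one common form of such a block (an embedded-Gaussian, self-attention style operation with a residual connection) applied to a sequence of feature frames. The function name `non_local_block`, the weight names, the shapes, and the zero-initialized output projection are all assumptions made for this example and are not taken from the thesis.

```python
import numpy as np


def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def non_local_block(x, w_theta, w_phi, w_g, w_out):
    """Hypothetical non-local block over a feature sequence.

    x:                    (T, C) array, T frames with C channels each
    w_theta, w_phi, w_g:  (C, D) projection matrices (assumed names)
    w_out:                (D, C) output projection (assumed name)
    Returns the output (T, C) and the (T, T) attention weights.
    """
    theta = x @ w_theta                       # queries, shape (T, D)
    phi = x @ w_phi                           # keys,    shape (T, D)
    g = x @ w_g                               # values,  shape (T, D)
    # Each frame attends to every other frame; each row sums to 1.
    attn = softmax(theta @ phi.T, axis=-1)    # (T, T)
    y = attn @ g                              # weighted sum of values, (T, D)
    # Residual connection keeps the original features available.
    return x + y @ w_out, attn


# Toy example with random features (not real speech data).
rng = np.random.default_rng(0)
T, C, D = 6, 8, 4
x = rng.standard_normal((T, C))
w_theta, w_phi, w_g = (0.1 * rng.standard_normal((C, D)) for _ in range(3))
# Assumption: a zero-initialized output projection, so the block starts
# out as an identity mapping.
w_out = np.zeros((D, C))
z, attn = non_local_block(x, w_theta, w_phi, w_g, w_out)
```

Because every frame is compared with every other frame, a block of this kind can emphasize frames that matter across the whole sequence instead of only a local neighbourhood, which matches the abstract's description of extracting key features while suppressing useless ones.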
Keywords/Search Tags: Speech enhancement, Dual-channel network, Attention module, End-to-end, Encoder-decoder network, Non-local block