Speech is the most convenient and common means of communication in daily life, but speech signals are often corrupted by various kinds of noise, so speech enhancement technology is needed to remove the noise and improve speech quality. Several problems remain in current speech enhancement research. First, the traditional spectral subtraction method has the advantages of low computational cost and good real-time performance, but it suffers from "musical noise" and speech distortion. Second, methods based on deep neural networks can achieve better performance, but their strict requirements on datasets, computing power, and storage limit their wide application. Third, most speech enhancement methods consider only amplitude features and discard the phase information that affects speech quality. How to fully learn speech features so that a system still achieves good enhancement performance in low-SNR environments remains an open problem. To address these problems, this thesis proposes two single-channel speech enhancement methods based on deep neural networks and time-frequency masking estimation. The main work is as follows.

First, to address the problems of spectral subtraction, this thesis proposes a two-stage speech enhancement method and designs an effective deep neural network to remove "musical noise" and improve speech quality. The method consists of a first stage of noise reduction using Berouti spectral subtraction and a second stage of enhancement using a deep neural network. Results on the Voice Bank + DEMAND dataset show that, compared with spectral subtraction, the proposed method improves perceptual evaluation of speech quality (PESQ) by up to 0.88 and SNR by 6.66 dB, and it also obtains better metric results than other methods, demonstrating better enhancement performance.

Second, this thesis proposes a dual-path network combining amplitude and complex features to address
the problem that phase information is ignored by deep learning-based speech enhancement methods. This work makes three main contributions. First, it proposes a dual-path structure that models both amplitude and complex features, combining the advantages of the two feature types to achieve better spectrum estimation. Second, it introduces an Attentional Feature Fusion module to fuse the two features and promote overall spectrum recovery. Third, it improves the Transformer as a feature extraction module to efficiently extract local and global features. Results on the same dataset show that the dual-path network outperforms baseline models while using fewer parameters, achieving a maximum PESQ improvement of 0.38 and an SNR improvement of 5.36 dB. In addition, ablation experiments demonstrate the effectiveness of the dual-path structure, the improved Transformer module, and the fusion module, and extended experiments on masking methods show better speech quality improvement with the "multiply-then-decode" strategy.
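The Berouti spectral subtraction mentioned above for the first noise-reduction stage can be sketched as follows. This is a minimal illustration of the general Berouti over-subtraction rule applied to power spectra; the over-subtraction factor and spectral-floor factor shown are illustrative assumptions, not the settings used in the thesis.

```python
import numpy as np

def berouti_subtract(noisy_power, noise_power, alpha=4.0, beta=0.01):
    """Berouti over-subtraction on a power spectrum (one frame or many).

    alpha: over-subtraction factor (assumed fixed here; the original
           method varies it with the frame's estimated SNR).
    beta:  spectral floor factor; keeping a small residual noise floor
           instead of zeroing bins is what suppresses "musical noise".
    """
    # Subtract a scaled noise estimate from the noisy power spectrum.
    clean_power = noisy_power - alpha * noise_power
    # Clamp each bin to a fraction of the noise power rather than zero.
    floor = beta * noise_power
    return np.maximum(clean_power, floor)
```

For example, a bin where the noisy power equals the noise estimate is driven below zero by over-subtraction and is replaced by the floor, while a bin well above the noise level is only attenuated.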
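The role of the complex-feature path can be illustrated with a generic complex ratio mask: multiplying the noisy complex spectrogram by a complex-valued mask modifies both amplitude and phase, whereas a magnitude-only mask leaves the noisy phase untouched. This sketch shows only that general masking operation, not the thesis's network or its "multiply-then-decode" pipeline.

```python
import numpy as np

def apply_complex_mask(noisy_stft, mask_real, mask_imag):
    """Apply an estimated complex ratio mask to a complex STFT.

    The complex product rescales the amplitude and rotates the phase of
    every time-frequency bin. A magnitude-only mask is the special case
    mask_imag == 0, which cannot correct the phase.
    """
    mask = mask_real + 1j * mask_imag
    return mask * noisy_stft
```

For instance, a purely imaginary mask value of 1j rotates a bin's phase by 90 degrees without changing its magnitude, something no real-valued magnitude mask can do.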