
Research On Monaural Speech Enhancement Algorithm Based On Critical Frequency Band And Attention Mechanism

Posted on: 2024-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Zhao
Full Text: PDF
GTID: 2568306932455984
Subject: Information and Communication Engineering
Abstract/Summary:
Speech is a primary means of human communication and plays an important role in daily life. In noisy environments, however, speech is usually corrupted by various kinds of noise, which reduces the clarity and intelligibility of the speech signal. Speech enhancement technology recovers clean speech from noise-corrupted speech, improving both the quality of the speech signal and the listener's auditory perception. Traditional speech enhancement methods based on digital signal processing were the main focus of earlier research. These methods have a complete theoretical basis and are interpretable, and they can effectively recover clean speech in stationary noise environments, but they are not suitable for non-stationary noise environments. With the rapid development of deep learning in recent years, speech enhancement methods based on neural networks have become a hotspot in the field of speech signal processing. These methods can handle complex non-stationary noise and achieve better results than traditional methods in speech denoising tasks. However, most current neural-network-based speech enhancement methods directly apply popular neural network structures to build speech enhancement models and do not consider the inherent characteristics of human hearing. This thesis applies the critical-band division of human auditory perception to neural networks and designs single-channel speech enhancement methods that conform to the auditory characteristics of the human ear. On this basis, the thesis presents the following two research points:

1. A single-channel speech enhancement network that focuses on sub-band information extraction, namely the parallel subband complex convolutional recurrent network with Bark attention mechanism (PSCCRN-BAM). The main idea is to simulate the critical frequency bands of the human ear by dividing the full-band speech signal into frequency sub-bands. Each sub-band uses a separate complex convolutional recurrent module for encoding and decoding, and the proposed lightweight Bark attention mechanism (BAM) generates adaptive attention weights that weight and correct the information passed on to subsequent sub-bands. Building on PSCCRN-BAM, the non-causal parallel subband transform neural network with Bark attention mechanism (PSTNN-BAM) is also proposed. It replaces the LSTM module in PSCCRN-BAM with the ATFAT module, which performs time and frequency context analysis simultaneously and has stronger context analysis capabilities. Compared with PSCCRN-BAM, PSTNN-BAM has a more complex structure and better speech enhancement performance. Experimental results show that PSCCRN-BAM and PSTNN-BAM achieve better speech enhancement performance than other advanced methods.

2. A novel speech enhancement model, called the Interactive Dual-branch monaural Speech Enhancement Model based on Critical Frequency Bands (IDBM-CFB), which applies the sub-band idea to a dual-branch network model. On the one hand, the signal on the complex spectrum branch is split into sub-bands so that the branch focuses on learning the internal information of each sub-band. On the other hand, a full-band convolutional recurrent network is built on the amplitude compensation branch to focus on learning global information. In addition, dedicated modules transfer the sub-band information that has been reduced and fused on the complex spectrum branch to the amplitude compensation branch for further learning, in order to obtain better model performance. Experimental results show that IDBM-CFB achieves better results than other state-of-the-art methods on most speech evaluation metrics, with fewer parameters.
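The critical-band division mentioned in the first research point can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the band edges, the number of sub-bands, the FFT size, or the sampling rate. The sketch assumes the Zwicker-Terhardt approximation of the Bark scale, a 512-point FFT at 16 kHz, and one sub-band per integer Bark value. The function names `hz_to_bark` and `bark_subbands` are hypothetical.

```python
import math


def hz_to_bark(freq_hz):
    """Map a frequency in Hz to the Bark scale (Zwicker-Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)


def bark_subbands(n_fft=512, sample_rate=16000):
    """Group one-sided STFT bin indices into sub-bands, one per integer Bark value.

    n_fft and sample_rate are illustrative assumptions, not values from the thesis.
    Returns a list of lists; each inner list holds the contiguous bin indices
    that fall into one critical band.
    """
    n_bins = n_fft // 2 + 1
    bands = {}
    for k in range(n_bins):
        freq_hz = k * sample_rate / n_fft
        band_index = int(hz_to_bark(freq_hz))  # floor to the critical-band number
        bands.setdefault(band_index, []).append(k)
    return [bands[b] for b in sorted(bands)]


if __name__ == "__main__":
    for i, bins in enumerate(bark_subbands()):
        # Low bands hold only a few bins; high bands hold many.
        print(f"band {i}: bins {bins[0]}..{bins[-1]} ({len(bins)} bins)")
```

In a sub-band network of the kind the abstract describes, each group of bins would be fed to its own encoder-decoder module. Because the Bark scale is roughly linear at low frequencies and roughly logarithmic at high ones, the low-frequency sub-bands are narrow and the high-frequency ones are wide.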
Keywords/Search Tags: single-channel speech enhancement, neural network, critical frequency band, dual branch