In many speech communication scenarios, such as mobile communication, hearing aids, and automatic speech control systems, the recorded and transmitted speech signals are often disturbed by a large amount of acoustic background noise, which can significantly degrade speech intelligibility. Effective noise suppression technology is therefore very important for speech signal processing. The technology of extracting the desired clean speech signal from a noisy background is usually called speech enhancement, and speech enhancement based on a microphone array is usually called multi-channel speech enhancement. Speech enhancement uses signal processing techniques to enhance speech polluted by background noise. With the development of deep learning, speech enhancement based on deep learning algorithms has gradually become a key research direction. The main work of this paper includes the following aspects:

(1) First, the development history and research status of speech enhancement are introduced, and the purpose of this study is stated. Second, the basic knowledge of traditional single-channel speech enhancement algorithms is briefly summarized, which further leads to the fundamentals of microphone arrays. Then, traditional multi-channel speech enhancement algorithms are introduced in detail, including fixed beamforming and adaptive beamforming algorithms; common adaptive beamforming methods include the MVDR and GSC algorithms. Finally, this paper also introduces the commonly used subjective and objective evaluations of speech quality, which lays the foundation for the subsequent experimental work.

(2) A multi-channel speech enhancement algorithm combining beamforming and U-Net is proposed, and the complex ideal ratio mask (cIRM) is introduced as the training target of the model. The loss function is the joint loss of the Mean Squared Error (MSE) and the weighted Source-to-Distortion Ratio (wSDR). The experimental results show that using a deep learning model as the post-filter for beamforming is effective. On this basis, the paper then abandons the pre-beamformer and proposes an end-to-end deep learning model: a neural beamforming algorithm based on an attention mechanism. The network simulates the role of beamforming on the deep features through the attention mechanism to improve the performance of the model. The model is trained and validated on the VOiCES dataset.

(3) The limitations of the U-Net-based fully convolutional network are revealed: on the one hand, the receptive field of a CNN is limited, while the speech signal is a time series with strong temporal correlation; on the other hand, U-Net needs to downsample the time dimension in the encoder stage, which makes it a non-causal system. In response to these problems, this paper introduces the basics of recurrent neural networks, focusing on the most widely used LSTM network. Given the limitations of the LSTM, its variant, Conv-LSTM, is introduced. Subsequently, the Convolutional Recurrent Network (CRN) structure commonly used in deep learning speech enhancement is illustrated. To overcome the drawback of the LSTM, Conv-LSTM is used to replace the LSTM block in the CRN, leading to the multi-channel speech enhancement algorithm based on an improved CRN proposed in this paper. The model mainly includes two parts: a pre-beamforming network and a post-filtering network. In the pre-beamforming network, beamforming is simulated with an attention layer and a Conv-LSTM layer to weight the input features of each channel. The post-filtering network is composed of the improved CRN, which enhances the weighted features and outputs the final estimated cIRM. The performance of the algorithm is further improved on the VOiCES dataset, and its superiority over other deep learning models is also shown on the CHiME-3 dataset.
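As background for the adaptive beamformers mentioned in (1), the classical MVDR solution minimizes the output noise power subject to a distortionless constraint in the look direction, giving weights w = R⁻¹d / (dᴴR⁻¹d), where R is the noise spatial covariance and d the steering vector. A minimal sketch for a two-microphone case (pure Python; the function name is ours, not from the paper):

```python
def mvdr_weights(R, d):
    """MVDR weights w = R^-1 d / (d^H R^-1 d) for a 2-mic array.

    R: 2x2 noise spatial covariance matrix (nested lists of complex).
    d: length-2 steering vector for the look direction (complex).
    """
    # Invert the 2x2 matrix explicitly: R^-1 = adj(R) / det(R).
    det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
    Rinv = [[ R[1][1] / det, -R[0][1] / det],
            [-R[1][0] / det,  R[0][0] / det]]
    # Numerator: R^-1 d.
    num = [Rinv[0][0] * d[0] + Rinv[0][1] * d[1],
           Rinv[1][0] * d[0] + Rinv[1][1] * d[1]]
    # Denominator: d^H R^-1 d (real and positive for Hermitian R).
    den = d[0].conjugate() * num[0] + d[1].conjugate() * num[1]
    return [x / den for x in num]

# With spatially white noise (R = I) and a broadside steering vector,
# MVDR reduces to a delay-and-sum beamformer with equal weights.
w = mvdr_weights([[1 + 0j, 0j], [0j, 1 + 0j]], [1 + 0j, 1 + 0j])
```

Note that the distortionless constraint wᴴd = 1 holds by construction, which is what distinguishes MVDR from an unconstrained noise minimizer.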
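The cIRM training target mentioned in (2) is defined per time-frequency bin as the complex ratio of the clean to the noisy STFT coefficient, so that applying the mask by complex multiplication recovers the clean spectrum exactly. A small illustrative sketch (in practice the mask is usually compressed, e.g. with a tanh, before being used as a regression target; that step is omitted here):

```python
def cirm(Y, S):
    """Complex ideal ratio mask M for one T-F bin, so that S = M * Y.

    Y: noisy STFT coefficient (complex), S: clean STFT coefficient.
    Real and imaginary parts follow from the complex division S / Y.
    """
    denom = Y.real ** 2 + Y.imag ** 2
    M_real = (Y.real * S.real + Y.imag * S.imag) / denom
    M_imag = (Y.real * S.imag - Y.imag * S.real) / denom
    return complex(M_real, M_imag)

# Applying the mask to the noisy bin reconstructs the clean bin.
Y, S = 1 + 1j, 0 + 2j
M = cirm(Y, S)
```

Because the mask is complex-valued, it corrects both magnitude and phase, unlike a real-valued ideal ratio mask.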
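The joint objective in (2) can be sketched as follows: the MSE term penalizes sample-wise error, while the wSDR term is a negative cosine similarity between the estimated and target waveforms (and, symmetrically, between the estimated and true noise), weighted by their relative energies. This is a sketch under our own variable naming, assuming the common formulation of the wSDR loss:

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _norm(a):
    return math.sqrt(_dot(a, a))

def joint_loss(x, x_hat, y):
    """MSE + wSDR loss on time-domain signals (plain lists of floats).

    x: clean target, x_hat: network estimate, y: noisy mixture.
    The noise and its estimate are n = y - x and n_hat = y - x_hat.
    """
    n = [yi - xi for yi, xi in zip(y, x)]
    n_hat = [yi - xi for yi, xi in zip(y, x_hat)]
    # Energy ratio alpha balances the speech and noise similarity terms.
    alpha = _dot(x, x) / (_dot(x, x) + _dot(n, n))
    wsdr = -(alpha * _dot(x, x_hat) / (_norm(x) * _norm(x_hat))
             + (1 - alpha) * _dot(n, n_hat) / (_norm(n) * _norm(n_hat)))
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    return mse + wsdr

# A perfect estimate gives zero MSE and wSDR at its minimum of -1.
loss = joint_loss([0.6, 0.2], [0.6, 0.2], [1.0, 0.0])
```

The wSDR term is bounded in [-1, 1] and scale-invariant, which complements the scale-sensitive MSE term.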
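The "simulated beamforming" of the pre-beamforming network in (3) can be pictured, at its simplest, as attention scores normalized across microphone channels and used to form a weighted combination of per-channel features, mimicking a beamformer's channel gains. The real network learns these scores jointly with the Conv-LSTM layers; the sketch below only illustrates the combination step, with illustrative names of our own:

```python
import math

def attention_combine(channel_feats, scores):
    """Weight per-channel feature vectors by softmax attention scores.

    channel_feats: list of M feature vectors (lists of floats), one per mic.
    scores: list of M attention logits (learned in the real model;
            supplied directly here for illustration).
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum over channels, analogous to beamforming channel gains.
    dim = len(channel_feats[0])
    combined = [sum(w * f[i] for w, f in zip(weights, channel_feats))
                for i in range(dim)]
    return combined, weights

# Equal logits give equal weights, so the output is the channel average.
combined, w = attention_combine([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```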