Research On Multi-channel Speech Enhancement Technology Based On Beamforming And Time-frequency Masking

Posted on: 2021-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Chen
Full Text: PDF
GTID: 2518306476450814
Subject: Electronics and Communications Engineering
Abstract/Summary:
Speech enhancement is an important part of front-end acoustic signal processing. It is a key means of improving speech quality and a prerequisite for subsequent speech tasks. However, real-life scenarios contain complex and changeable interference that seriously degrades the quality of transmitted speech, so improving the quality of noisy speech is a challenging task. Compared with traditional single-channel speech enhancement, multi-channel speech enhancement can additionally exploit the spatial information of the speech, which to a certain extent helps improve the quality of noisy speech in complex environments. This thesis studies multi-channel speech enhancement based on beamforming and time-frequency masking. The main research contents are as follows:

(1) Traditional microphone array signal processing techniques are studied. On this basis, the advantages and disadvantages of the classic beamforming algorithms for multi-channel speech enhancement and of the commonly used post-filtering algorithms are analyzed. Finally, the subjective and objective methods used in existing speech quality evaluation are analyzed, and PESQ and STOI are selected as the objective metrics for the subsequent experimental analysis.

(2) Time-frequency masking, the recurrent neural unit, and its main variants are studied, and a multi-channel speech enhancement algorithm combining time-frequency masking and a recurrent neural network is proposed. Time-frequency masking provides a good training target for supervised learning. Compared with traditional neurons, recurrent neural units can make good use of historical information. More importantly, using a recurrent neural network to build a post-filtering algorithm can further improve the quality of the speech after delay-and-sum beamforming. The effectiveness and superiority of the proposed algorithm are verified on both the synthesized data set and the recorded data set.

(3) The basic structure of convolutional neural networks and the theoretical basis of multitask learning are studied, and a multi-channel speech enhancement algorithm combining a convolutional neural network and multitask learning is proposed. Convolutional neural networks have a strong ability to automatically learn the required features, and multitask learning helps further improve the generalization ability of the model. More importantly, using a convolutional neural network makes it possible to fuse fixed beamforming and the post-filtering algorithm into a single whole. The experimental results show that the proposed algorithm combining a convolutional neural network and multitask learning is effective not only on the synthesized data set but also on multi-channel speech recorded in real scenes.
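
The two signal-processing building blocks named in the abstract, delay-and-sum beamforming and time-frequency masking, can be illustrated with a minimal NumPy sketch. This is not the thesis implementation. The function names, the array shapes, and the assumption that per-microphone delays are already known are all hypothetical, and the ideal ratio mask is only one common choice of masking target; the abstract does not say which mask the thesis uses.

```python
import numpy as np


def delay_and_sum(stft, delays, freqs):
    """Frequency-domain delay-and-sum beamformer (illustrative sketch).

    stft   : complex array, shape (mics, freqs, frames), one STFT per microphone
    delays : float array, shape (mics,), assumed-known arrival delay in seconds
    freqs  : float array, shape (freqs,), centre frequency of each bin in Hz
    """
    # A delay of tau seconds multiplies bin f by exp(-2j*pi*f*tau);
    # multiplying by the conjugate phase re-aligns every channel.
    phase = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    # Average the aligned channels: the target adds coherently,
    # spatially uncorrelated noise does not.
    return np.mean(stft * phase[:, :, None], axis=0)


def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-8):
    """Ideal ratio mask, one possible supervised training target (assumption)."""
    s2 = speech_mag ** 2
    n2 = noise_mag ** 2
    # Values lie in [0, 1]; eps avoids division by zero in silent bins.
    return np.sqrt(s2 / (s2 + n2 + eps))
```

In a pipeline of the kind described in part (2), a network would be trained to predict such a mask from the beamformer output, and the predicted mask would then be multiplied with the beamformed spectrogram as a post-filter.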
Keywords/Search Tags:Beamforming, Time-frequency masking, Recurrent neural network, Convolutional neural network, Multitask learning