Research On Speech Dereverberation Based On Deep Learning Under Complex Environment

Posted on: 2023-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Qiang
GTID: 2558307154474364
Subject: Computer Science and Technology

Abstract/Summary:
In recent years, speech interaction products have come to market one after another. They make daily life more convenient and contribute to economic activity, and as a result speech processing technology has received increasing attention.

However, practical speech interaction systems often operate in complex acoustic scenes. In these scenes the captured signal contains both the original sound source and its reflections from various surfaces, and the attenuated, delayed and reflected copies of the original sound combine to form a reverberant signal. When reverberation is severe, speech clarity and intelligibility are impaired to some extent, and the performance of speech systems degrades.

Traditional speech dereverberation methods have been studied extensively for decades and have made significant progress. Most of them, however, are based on statistical methods whose strong assumptions limit further improvement in dereverberation performance. Artificial intelligence is now developing rapidly. Deep learning has been widely adopted for speech dereverberation because it is data-driven and because deep neural networks have powerful learning ability. At the current stage of research, however, these models still have some shortcomings:

1) Strong constraints on the training target: Most deep-learning-based methods treat speech dereverberation as a regression problem. A deep neural network learns a filtering function that removes reverberation, either by estimating a mask or by mapping noisy features onto clean speech features. During training, however, optimizing the network with such an objective function may carry the strong assumption that every frequency band has the same variance.

2) Mismatch between the front-end and back-end systems: Speech dereverberation is an important part of speech enhancement. It also serves as front-end preprocessing for downstream tasks such as speech recognition. Because real environments are complex, the training and testing conditions differ, which can significantly reduce the performance of downstream tasks.

In response to these problems, the main contributions of this thesis are as follows:

(1) To address the strong constraints on the training target, this thesis proposes a speech dereverberation algorithm based on a scale-aware mean square error (MSE) loss. The traditional MSE loss assumes that every frequency band has the same variance, and the algorithm aims to relax this statistical assumption. It corrects the MSE loss of each frequency band with a band-specific scale. Through progressive learning, it also gradually narrows the gap between the true distributions of the low- and high-frequency bands, so that they approximately satisfy the assumption of the traditional MSE loss. This improves the overall optimization and convergence of the network and, in turn, system performance.

Multiple experiments on the REVERB Challenge dataset verify the effectiveness and robustness of the algorithm. Compared with the traditional spectrogram mapping method:
- the speech-to-reverberation modulation energy ratio (SRMR) improved from 3.83 to 4.82;
- the perceptual evaluation of speech quality (PESQ) score rose from 2.54 to 2.62.

(2) To address the mismatch between the front-end and back-end systems, this thesis proposes a speech dereverberation algorithm based on a Generator-adaptation Network. The goal is to end the isolation between the front-end enhancement system and the back-end recognition system. In a generative adversarial network (GAN), the discriminator can judge the quality of enhanced speech in a data-driven way. Inspired by this, the algorithm replaces the original discriminator module with an acoustic model. The acoustic model scores the enhanced speech on the recognition task and feeds that information back to the front end.

Together with the enhancement task of the generator module, this lets the front end and back end communicate. The system then automatically learns to generate robust features suited to both the enhancement task and the recognition task, which alleviates the front-end/back-end mismatch.

Experiments on the REVERB Challenge dataset show that the proposed system effectively improves PESQ and SRMR. In addition, it was compared with the traditional GAN-based speech dereverberation method on simulated data. Against that baseline, the speech recognition word error rate (WER) dropped from 9.89% to 8.05%.
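The abstract does not give the exact form of the scale-aware MSE loss. The sketch below rests on one plausible reading.

- Plain MSE implicitly treats every frequency band as having equal variance.
- Dividing each band's error by a per-band scale makes the bands comparable. Here the scale is the standard deviation of the clean targets in that frequency bin.
- A progressive schedule moves training from plain MSE toward the scaled loss.

The function names, the use of the per-bin standard deviation as the scale, and the linear warm-up schedule are all hypothetical. None of them are taken from the thesis.

```python
import numpy as np


def band_scales(clean_specs: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Hypothetical per-band scale: std of clean targets in each frequency bin.

    clean_specs has shape (num_frames, num_bins); the result has shape (num_bins,).
    eps keeps the scale strictly positive so it can be divided by safely.
    """
    return clean_specs.std(axis=0) + eps


def scale_aware_mse(pred: np.ndarray, target: np.ndarray,
                    scales: np.ndarray, alpha: float = 1.0) -> float:
    """Hypothetical scale-aware MSE over (num_frames, num_bins) spectrograms.

    alpha = 0.0 -> ordinary MSE (every band uses scale 1).
    alpha = 1.0 -> each band's error is divided by its own scale, which equals
                   ordinary MSE on per-bin standardized features.
    """
    # Blend between uniform scales and per-band scales (broadcast over frames).
    effective = (1.0 - alpha) + alpha * scales
    err = (pred - target) / effective
    return float(np.mean(err ** 2))


def progressive_alpha(epoch: int, warmup_epochs: int) -> float:
    """Assumed linear ramp from plain MSE (0.0) to the scaled loss (1.0)."""
    if warmup_epochs <= 0:
        return 1.0
    return min(1.0, epoch / warmup_epochs)


if __name__ == "__main__":
    # Toy data: the second bin has a much larger error than the first.
    target = np.zeros((2, 2))
    pred = np.array([[1.0, 10.0], [1.0, 10.0]])
    scales = np.array([1.0, 10.0])
    for epoch in (0, 5, 10):
        a = progressive_alpha(epoch, warmup_epochs=10)
        print(epoch, a, scale_aware_mse(pred, target, scales, alpha=a))
```

With `alpha=0.0` the loss reduces to ordinary MSE, where the large-variance bin dominates. With `alpha=1.0` both bins contribute on the same footing. The linear ramp is only one way to realize the "progressive learning" the abstract describes.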
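In the Generator-adaptation Network, the generator is the enhancement front end. An acoustic model takes the place of the GAN discriminator and passes a recognition-based score back to the generator. The sketch below shows one possible generator objective under that reading. It assumes a weighted sum of two terms: an enhancement MSE term, and a frame-level cross-entropy computed by the acoustic model on the enhanced features.

The following pieces are all hypothetical illustrations, not the thesis's design:

- the toy linear-softmax acoustic model;
- the weight `lam`;
- the function names.

```python
from typing import Tuple

import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def acoustic_model_ce(features: np.ndarray, am_weights: np.ndarray,
                      labels: np.ndarray) -> float:
    """Frame-level cross-entropy of a hypothetical linear acoustic model.

    features: (num_frames, feat_dim) enhanced features from the generator.
    am_weights: (feat_dim, num_classes) toy acoustic-model parameters.
    labels: (num_frames,) integer class targets (e.g. frame-level states).
    """
    probs = softmax(features @ am_weights)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))


def generator_objective(enhanced: np.ndarray, clean: np.ndarray,
                        am_weights: np.ndarray, labels: np.ndarray,
                        lam: float = 0.5) -> Tuple[float, float, float]:
    """Assumed generator loss: enhancement MSE + lam * recognition loss.

    Returns (total, enhancement_loss, recognition_loss).
    """
    enh_loss = float(np.mean((enhanced - clean) ** 2))
    rec_loss = acoustic_model_ce(enhanced, am_weights, labels)
    return enh_loss + lam * rec_loss, enh_loss, rec_loss
```

NumPy only evaluates the objective. In an actual training setup, an automatic-differentiation framework would carry the recognition term's gradient back through the acoustic model into the generator. The abstract does not say whether the acoustic model is frozen or updated jointly.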
Keywords/Search Tags: speech dereverberation, mean square error loss function, scale aware, generative adversarial network, adaptor