Real-time Communication Conference System Based On Deep Learning Speech Noise Suppression And Speech Recognition

Posted on: 2023-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: K Ding
GTID: 2558307103494104
Subject: Software engineering
Abstract/Summary:
The outbreak of the novel coronavirus at the beginning of 2020 threatened people's health and safety and also seriously affected daily life, the economy, education and other areas. This sudden change gave online communication a new role in work and life, and it has increasingly become an everyday way of communicating. Because conference scenarios vary, the voice signal is easily disturbed by environmental noise, which seriously degrades its quality and intelligibility. At the same time, real-time transcription of conference content plays an important role in improving conference efficiency, but applying it in practice is difficult: streaming speech recognition models have poor accuracy on long speech, while non-streaming models have too high a recognition delay. How to balance accuracy against delay is a key issue in current speech recognition research. To address these problems, this thesis designs and implements a real-time communication conference system based on speech noise suppression and speech recognition. It applies a deep-learning-based speech noise suppression algorithm to provide high-quality, low-latency speech services for the conference system, and uses a multi-level speech recognition model to transcribe the meeting content in real time. The specific work of this thesis is as follows:
(1) Because complex speech denoising models cannot be applied to real-time conferences, this thesis proposes and applies a speech noise reduction algorithm based on MFCC-GRU. The algorithm takes MFCC features as the model input and obtains the corresponding gain factors from the GRU model. The gain factors are then used to initialize a band-pass filter that performs the noise suppression. Experiments show that the proposed noise suppression algorithm effectively enhances the speech signal input by the user, suppresses noise in the conference environment, reduces the impact of noise on the intelligibility of speech content, and reduces the delay caused by noise reduction.
(2) Speech recognition models require a large amount of computation, which slows recognition, and generalize poorly, which lowers the recognition rate. To address both problems, this thesis proposes an end-to-end speech recognition algorithm based on an improved Conformer and CTC/Attention joint decoding. Probabilistic sparse self-attention and a low-rank feed-forward module extract the important speech frames and so reduce the computational complexity of the model. Experiments show that the improved Conformer and joint decoding algorithm effectively reduce the delay of streaming speech recognition without lowering recognition accuracy.
(3) Based on the speech noise suppression and speech recognition algorithms, a real-time communication conference system is designed and implemented. It uses WebRTC for multi-party audio and video calls and applies the speech noise suppression and speech recognition algorithms to them. The other necessary front-end interaction modules of the conference system are also implemented, including chat rooms, PPT, sketchpad editing, login authentication, and device detection.
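The CTC/Attention joint decoding named in (2) is commonly realized as a weighted sum of the two decoders' log-probabilities for each candidate transcript. The sketch below illustrates that general idea only and is not the thesis implementation: the function name `joint_rescore`, the weight `ctc_weight`, and the n-best list with its scores are all hypothetical values chosen for illustration.

```python
def joint_rescore(hypotheses, ctc_weight=0.3):
    """Rank candidate transcripts by a weighted sum of log-probabilities.

    hypotheses: iterable of (text, ctc_log_prob, attention_log_prob).
    ctc_weight: interpolation weight in [0, 1] (assumed value, not from the thesis).
    Returns a list of (joint_score, text), best candidate first.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    scored = []
    for text, ctc_logp, att_logp in hypotheses:
        # Joint score: w * log p_ctc + (1 - w) * log p_attention
        score = ctc_weight * ctc_logp + (1.0 - ctc_weight) * att_logp
        scored.append((score, text))
    # Higher (less negative) log-probability is better.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Hypothetical n-best list: (text, CTC log-prob, attention log-prob)
nbest = [
    ("start the meeting", -4.0, -3.0),
    ("start the meting", -3.5, -6.0),
    ("star the meeting", -6.0, -3.5),
]

ranked = joint_rescore(nbest, ctc_weight=0.3)
best_text = ranked[0][1]
print(best_text)
```

In this toy list the CTC score alone would prefer the misspelled second candidate, while the joint score prefers the first. That shows why the two scores are combined: each decoder can correct the other's errors.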
Keywords/Search Tags:Real-time communication, noise suppression, speech recognition, WebRTC