Because cloud video conferencing can span time and space, it has been applied to social practice more and more widely in recent years. However, during transmission between the cloud video conference terminal and the server, the signal is easily contaminated by noise and interference. Since the sound signal is an important carrier of information in cloud video conferences, its quality and clarity suffer greatly, and this loss seriously affects artificial intelligence applications of cloud video conferencing such as smart subtitles and smart minutes. A real-time denoising algorithm that serves back-end voice applications well is therefore crucial to the further development of cloud video conferencing. Such a denoising algorithm is also called speech enhancement. Speech enhancement generally refers to removing, by certain technical means, the noise contained in a noisy speech signal so that the signal becomes closer to the clean speech signal. Traditional speech enhancement algorithms generally make assumptions about the composition of the noisy speech and then remove noise on the basis of these mathematical assumptions. Such assumption-based algorithms cannot effectively handle the non-stationary noise mixed into the sound signal, may even introduce new noise that degrades back-end speech recognition, and can no longer meet the requirements of contemporary cloud video conferencing. Since the well-known multilayer perceptron was proposed, deep learning has developed rapidly and has been applied to many fields. Deep learning models can capture deep features of the sound signal and learn the characteristic properties of noise-free speech. Compared with traditional speech enhancement algorithms, deep learning algorithms do not need to make assumptions about the noisy speech and are better suited to today's environments with complex noise. Based on these advantages, this thesis presents research on the DSAGAN speech enhancement algorithm to meet the needs of cloud video conferencing.

The main research work of this thesis is as follows:

(1) This thesis designs a speech enhancement model for cloud video conferencing based on a generative adversarial network. The generator uses an encoder module to extract features from the noisy speech and a decoder module to estimate the clean speech signal from the encoded features. Adding a dense connection module, an attention module, and related structures to the generator effectively improves the PESQ and STOI values of the estimated speech (a hedged sketch of such a generator appears below, after item (3)).

(2) This thesis studies how the speech enhancement network can better serve back-end voice applications. Referring to the principle of the DNN-HMM algorithm, a discriminator with a phoneme loss is designed for the generative adversarial network, so that the enhanced speech retains more linguistic information and the WER drops significantly.

(3) Building on the self-attention mechanism, which has achieved excellent results in many fields, and further considering real-time computation, a sparse attention structure that saves computing resources and time is designed. During the attention computation, the weight vectors are grouped by a clustering algorithm, which greatly reduces the computational complexity of attention, as sketched below.
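To make contribution (3) concrete, the following is a minimal sketch of cluster-based sparse attention written in PyTorch. The grouping step (a plain k-means over the query vectors), the number of clusters, and the tensor shapes are illustrative assumptions; the sketch only shows the general idea of cutting the quadratic attention cost down to one score row per cluster, not the exact structure used in DSAGAN.

```python
# Illustrative sketch (not the thesis implementation): queries are grouped by
# k-means, attention is evaluated once per cluster centroid, and every query
# reuses the output of its cluster, so n*n query-key products become k*n.
import torch


def clustered_attention(q, k, v, num_clusters=8, kmeans_iters=5):
    """q, k, v: tensors of shape (seq_len, dim); returns (seq_len, dim)."""
    n, d = q.shape

    # simple k-means over the query vectors (unoptimized, for illustration only)
    centroids = q[torch.randperm(n)[:num_clusters]].clone()
    for _ in range(kmeans_iters):
        assign = torch.cdist(q, centroids).argmin(dim=1)   # (n,) cluster index per query
        for c in range(num_clusters):
            members = q[assign == c]
            if len(members) > 0:
                centroids[c] = members.mean(dim=0)
    assign = torch.cdist(q, centroids).argmin(dim=1)        # final assignment

    # attention scores are computed only for the cluster centroids
    scores = centroids @ k.t() / d ** 0.5                   # (num_clusters, n)
    ctx = torch.softmax(scores, dim=-1) @ v                 # (num_clusters, dim)

    # each query reuses the attention output of its cluster
    return ctx[assign]


if __name__ == "__main__":
    q, k, v = (torch.randn(128, 64) for _ in range(3))
    print(clustered_attention(q, k, v).shape)               # torch.Size([128, 64])
```

With n query positions and k clusters, the score matrix shrinks from n by n to k by n, which is where the claimed saving in computing resources and time would come from.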
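For contribution (1), the sketch below shows one plausible shape of an encoder-decoder generator with an attention gate on the bottleneck and a skip connection back to the encoder. The layer sizes, kernel widths, and the choice of a sigmoid gate are assumptions made for illustration and do not reproduce the DSAGAN generator.

```python
# A hedged sketch of an encoder-decoder GAN generator for waveform enhancement.
# Channel counts, kernel sizes, the attention gate, and the skip connection are
# illustrative assumptions, not the exact structure described in the thesis.
import torch
import torch.nn as nn


class Generator(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        # encoder: compress the noisy waveform into a feature code
        self.enc1 = nn.Conv1d(1, channels, kernel_size=32, stride=2, padding=15)
        self.enc2 = nn.Conv1d(channels, channels * 2, kernel_size=32, stride=2, padding=15)
        # attention gate applied to the bottleneck features
        self.attn = nn.Sequential(nn.Conv1d(channels * 2, channels * 2, 1), nn.Sigmoid())
        # decoder: estimate the clean waveform from the feature code
        self.dec1 = nn.ConvTranspose1d(channels * 2, channels, kernel_size=32, stride=2, padding=15)
        self.dec2 = nn.ConvTranspose1d(channels * 2, 1, kernel_size=32, stride=2, padding=15)
        self.act = nn.PReLU()

    def forward(self, noisy):                      # noisy: (batch, 1, time)
        e1 = self.act(self.enc1(noisy))            # (batch, C, time/2)
        e2 = self.act(self.enc2(e1))               # (batch, 2C, time/4)
        z = e2 * self.attn(e2)                     # attention-weighted bottleneck
        d1 = self.act(self.dec1(z))                # (batch, C, time/2)
        d1 = torch.cat([d1, e1], dim=1)            # skip connection to the encoder
        return torch.tanh(self.dec2(d1))           # (batch, 1, time)


if __name__ == "__main__":
    x = torch.randn(4, 1, 16384)
    print(Generator()(x).shape)                    # torch.Size([4, 1, 16384])
```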
(4) The speech enhancement model can obtain better results by increasing the size of the network, but a huge model harms real-time performance in the cloud video conference. This thesis designs a pruning and quantization procedure that is applied after training of the speech enhancement model is completed. The experiments compare the impact of fixed-point quantization and unstructured pruning on the speech enhancement model and determine which acceleration method suits it better (a sketch is given after item (5) below).

(5) This thesis designs and implements the architecture and functions of the voice enhancement system based on the HUAWEI CLOUD video conferencing system.
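For contribution (4), the following is a hedged sketch of post-training compression using PyTorch's built-in pruning and quantization utilities. The placeholder model and the 30% pruning ratio are assumptions, and dynamic int8 quantization is used here only as a readily available stand-in for the fixed-point quantization scheme compared in the thesis.

```python
# Minimal sketch of post-training compression: unstructured magnitude pruning
# followed by int8 dynamic quantization. The two-layer model, the 30% pruning
# ratio, and the 257-dimensional frames are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(257, 512), nn.ReLU(), nn.Linear(512, 257))

# 1) unstructured pruning: zero out the 30% smallest-magnitude weights per layer
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")              # make the pruning permanent

# 2) dynamic quantization: store Linear weights as int8 and compute in int8 at run time
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

with torch.no_grad():
    frame = torch.randn(1, 257)                     # e.g. one noisy spectral frame
    print(quantized(frame).shape)                   # torch.Size([1, 257])
```

Measuring latency and PESQ/STOI before and after each step is how one would decide, as the thesis does, which of the two acceleration methods fits the speech enhancement model better.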