
Design And Implementation Of Online Video Communication System Based On Audio-Visual Speech Enhancement

Posted on: 2024-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: J N Yin
GTID: 2568306941489754
Subject: Computer technology
Abstract/Summary:
In recent years, with economic globalization, widespread remote collaboration and telecommuting, and the broad coverage of high-speed mobile networks, online video communication has become an indispensable part of people's work and life. However, the traditional speech enhancement algorithms widely used in online video communication systems deliver inadequate noise reduction, and the resulting shortcomings in user experience in noisy environments are increasingly apparent. Against this background, this thesis analyzes the advantages and disadvantages of various speech enhancement algorithms and studies the application of deep learning-based audio-visual speech enhancement in video communication systems, so as to make full use of the visual information carried by video communication and improve on the speech enhancement performance of the traditional WebRTC communication system. The topic has both theoretical and practical significance.

First, this thesis discusses the shortcomings of existing audio-visual speech enhancement methods when applied to video communication scenarios. To address the instability of visual features in video communication, it introduces lip sync methods as visual features and designs an automatic feature switching mechanism. It further improves the switching mechanism by introducing a non-intrusive speech quality evaluation method. In tests across various scenarios, the proposed method shows better speech enhancement performance than the existing algorithm.

Second, existing implementations of audio-visual speech enhancement algorithms are mainly oriented to offline scenarios and cannot directly meet real-time audio processing needs. This thesis designs and implements an online audio-visual speech enhancement media processing pipeline, based on the GStreamer media processing framework and adapted to streaming transmission, which ensures that audio-visual speech enhancement algorithms run efficiently in video communication systems.

Finally, based on the proposed algorithm and processing pipeline, this thesis designs and implements an online video communication system. The system is compatible with the WebRTC communication protocol and integrates the proposed online audio-visual speech enhancement pipeline through the SFU architecture to provide noise reduction based on audio-visual speech enhancement. Tests show that the system significantly improves speech enhancement performance compared with the native WebRTC communication system and meets real-time requirements in processing delay and other indicators. This demonstrates the feasibility of applying audio-visual speech enhancement in video communication and the research value of the system in enhancing speech quality and improving user experience.
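The abstract does not specify how the automatic feature switching mechanism works, so the following is only a minimal sketch of one possible reading: each enhancement branch produces a candidate output, a non-intrusive quality estimator scores each candidate, and the system switches branches only when another branch is clearly better. The class name `FeatureSwitcher`, the branch names `audio_only` and `lip_sync`, the hysteresis `margin`, and the score values are all hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FeatureSwitcher:
    """Chooses an enhancement branch per audio chunk (hypothetical design)."""

    margin: float = 0.2          # assumed hysteresis margin on the quality score
    current: str = "audio_only"  # assumed default branch

    def select(self, scores: Dict[str, float]) -> str:
        # scores maps branch name -> estimated quality (higher is better).
        # The estimator itself is not implemented here; a branch that is
        # unavailable for this chunk is simply absent from the mapping.
        best = max(scores, key=scores.get)
        current_score = scores.get(self.current, float("-inf"))
        # Switch only when the best branch beats the current one by the
        # margin, so brief score fluctuations do not cause rapid toggling.
        if best != self.current and scores[best] > current_score + self.margin:
            self.current = best
        return self.current


# Illustrative, made-up per-chunk quality scores.
chunk_scores = [
    {"audio_only": 2.5, "lip_sync": 3.4},  # visual branch clearly better
    {"audio_only": 2.6, "lip_sync": 2.5},  # difference within the margin
    {"audio_only": 2.7, "lip_sync": 1.0},  # visual branch degraded
]

switcher = FeatureSwitcher()
decisions = [switcher.select(scores) for scores in chunk_scores]
print(decisions)
```

The hysteresis margin is a design choice of this sketch: without it, two branches with nearly equal scores would alternate from chunk to chunk, which could itself produce audible artifacts.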
Keywords/Search Tags: audio-visual speech enhancement, media processing pipeline, streaming technology, video communication