Research On Multi-channel Speech Enhancement Technology Based On Graph Signal Processing

Posted on: 2024-03-10
Degree: Master
Type: Thesis
Country: China
Candidate: P C Zhang
GTID: 2568307136493224
Subject: Electronic information
Abstract/Summary:
Faced with the surge of heterogeneous, multi-source, massive data in the era of big data, Graph Signal Processing (GSP), an emerging data processing technology, uses the relationships between signal samples to construct graph signals that model the structure of the data. GSP can achieve better data processing performance and has attracted wide attention from researchers. Traditional digital signal processing (DSP) only considers the relationship between the current sampling points. GSP, in contrast, describes the potential relationships between vertices by defining a weight matrix and fully takes into account the influence of the sample values at the current vertex and its adjacent vertices, which makes fuller use of the available information. GSP mainly studies the theory and methods of graph signal representation and analysis. It reveals the relationships between signals by means of graphs, extends traditional DSP theory to irregular graph signals, and provides an effective means of processing complex data. It has been widely used in image processing, biomedicine, machine learning, speech signal processing, wireless sensor networks and other fields.

In speech signal processing, a microphone array adds a spatial dimension to the time and frequency dimensions. Exploiting the spatial information of the array gives it the potential to deliver higher speech quality. In addition, the sampling points of the speech signal itself are related to one another. Using the spatial information of the microphone array together with the topological relationships between speech samples makes it easier to address problems such as sound source localization, sound source separation, dereverberation and sound source tracking. In complex environments, multi-channel speech enhancement algorithms based on microphone arrays perform better and can be applied to scenarios with high requirements for voice communication quality, such as mobile communication, hearing aid devices and venue environments. In view of this, this thesis studies multi-channel speech enhancement technology based on graph signal processing in order to improve noise suppression performance and the robustness of the algorithm. The main work and innovations of this thesis are as follows:

(1) This thesis proposes a novel graph post-filtering (GPF) method that enhances multi-channel speech signals by combining graph signal processing with beamforming. First, a multi-order self-spin graph topology is designed to construct speech graph signals. Then, considering the limitations of the classical Wiener post-filtering (WPF) method in complex scattered noise fields, the GPF method is proposed in combination with beamforming. Specifically, a gain function is derived from the auto-correlation and cross-correlation power spectral densities of the input speech signals and is used to predict the graph spectrum of the source speech signal. Experimental results show that the proposed GPF method outperforms the traditional WPF method in terms of both signal-to-noise ratio (SNR) and Perceptual Evaluation of Speech Quality (PESQ). The results also show that deviations in the delay compensation of each channel affect the performance of GPF-based multi-channel speech enhancement.

(2) The spatial relationship between channels affects noise reduction, and graph signal processing can capture this potential relationship. If the graph of the physical spatial layout is used directly, however, the time-varying characteristics of that relationship cannot be reflected in real time. This thesis therefore proposes a multi-channel speech enhancement method based on joint graph learning. First, a joint time-space graph learning method is proposed that jointly optimizes the array space graph and the intra-frame speech graph. It minimizes the sum of four terms: the smoothness of the multi-channel noisy speech signal on the space graph, the smoothness of the noisy speech signal from the reference channel on the speech frame graph, the sparsity of the Laplacian matrix, and the sparsity of the adjacency matrix. The time-space joint graph of the multi-channel speech signal is then constructed from the learned space graph and intra-frame graph. On this basis, the multi-channel speech graph signal is enhanced by applying the joint graph Fourier transform and the fixed beamforming (FBF) method. Experimental results show that the proposed joint graph learning based FBF (JGL-FBF) method significantly improves both SNR and PESQ compared with the traditional FBF method. The results also show that the accuracy of delay compensation affects the speech enhancement performance of JGL-FBF.
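
The abstract refers to the smoothness of a speech signal on a graph and to a joint graph Fourier transform without giving formulas. The sketch below illustrates these two notions on toy data. It is not the thesis's method: the ring-shaped graphs, the 64-sample frame, the 4-channel array, the noise level, and all function names are assumptions chosen for illustration, and the joint transform shown is the standard separable one for a Cartesian product of a space graph and a time graph. In the thesis the graphs are learned from data rather than fixed in advance.

```python
import numpy as np

def ring_laplacian(n, order=1):
    """Combinatorial Laplacian L = D - W of an undirected ring graph in which
    each vertex is linked to its `order` nearest neighbours on each side.
    (Assumed toy topology, not the topology designed in the thesis.)"""
    W = np.zeros((n, n))
    idx = np.arange(n)
    for k in range(1, order + 1):
        W[idx, (idx + k) % n] = 1.0
        W[(idx + k) % n, idx] = 1.0
    return np.diag(W.sum(axis=1)) - W

def gft_basis(L):
    """Eigen-decomposition of a symmetric Laplacian.
    Returns graph frequencies (eigenvalues) and the GFT basis (eigenvectors)."""
    return np.linalg.eigh(L)

def smoothness(x, L):
    """Laplacian quadratic form x^T L x; smaller means smoother on the graph."""
    return float(x @ L @ x)

# Toy speech frame: a slow sinusoid plus white noise (assumed data).
n_samples, n_channels = 64, 4
t = np.arange(n_samples)
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 2 * t / n_samples)
noisy = clean + 0.5 * rng.standard_normal(n_samples)

# Single-channel GFT on the frame graph.
L_t = ring_laplacian(n_samples, order=2)
lam_t, U_t = gft_basis(L_t)
spectrum = U_t.T @ noisy        # forward GFT
reconstructed = U_t @ spectrum  # inverse GFT

# Multi-channel frame: rows are channels, columns are samples.
X = np.stack([clean + 0.5 * rng.standard_normal(n_samples)
              for _ in range(n_channels)])

# Separable joint GFT on the product of a space graph and the frame graph.
L_s = ring_laplacian(n_channels, order=1)
lam_s, U_s = gft_basis(L_s)
X_hat = U_s.T @ X @ U_t         # joint forward transform
X_back = U_s @ X_hat @ U_t.T    # joint inverse transform

print("smoothness clean:", smoothness(clean, L_t))
print("smoothness noisy:", smoothness(noisy, L_t))
```

In this toy setting the clean frame has a much smaller quadratic form than the noisy one, which is the intuition behind using smoothness terms as a criterion when learning a graph from noisy speech.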
Keywords/Search Tags: Graph Signal Processing, Graph Topological Structure, Post Filtering, Joint Graph Learning, Multi-channel Speech Enhancement, Beamforming, Delay Compensation