Study On Speech Separation Technology In Vehicle Environment

Posted on: 2022-10-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Zhang
Full Text: PDF
GTID: 2492306338978479
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the development of science and technology, the widespread use of on-board voice interaction devices has replaced manual control of in-vehicle electronic devices, which greatly improves the driver's concentration and helps ensure driving safety. However, the in-vehicle environment is complex and changes constantly during driving, which reduces the recognition accuracy of voice commands and seriously degrades the voice interaction experience. Suppressing noise interference and separating the target speaker's speech from the complex driving environment is therefore very important and has become a research hotspot. The main work of this thesis is to study and implement single-channel and multi-channel speech separation algorithms for the in-vehicle environment.

First, this thesis describes the basic theory of speech separation and several main speech separation algorithms, and introduces the basic theory of microphone arrays and the performance evaluation criteria for speech separation.

Second, when a frequency-domain algorithm is used to extract speech features, the receptive field of a Convolutional Neural Network is small, so the extracted features contain only local information. This thesis therefore presents a single-channel speech separation algorithm that incorporates a convolutional attention mechanism. The main idea is to apply the short-time Fourier transform (STFT) to the speech signal and feed the resulting amplitude spectrum and phase spectrum into a dual-stream module for processing. The convolutional attention mechanism extracts global features of the speech signal along different dimensions. The amplitude spectrum features are then fed into a GRU network for training, and the target speech is obtained by combining the enhanced amplitude spectrum with the phase features. The experimental results show that, compared with LSTM, the number of network parameters is greatly reduced. In the in-vehicle environment, the improved algorithm achieves better performance at high signal-to-noise ratios, and speech quality and intelligibility improve to some extent. Under unmatched noise conditions, the algorithm remains robust.

Finally, considering the error that frequency-domain algorithms introduce when the STFT is used to extract information, this thesis improves the time-domain Wave-U-Net network to enhance performance. First, an attention mechanism is combined with Wave-U-Net to reduce the semantic gap caused by connecting low-level features learned by shallow modules with high-level features learned by deep modules. Second, because the mean square error loss function does not handle outliers well, this thesis uses the mean absolute error loss function to maintain convergence speed and improve robustness to outliers. The experimental results verify the effectiveness of the algorithm.
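
The abstract's argument for preferring the mean absolute error over the mean square error can be illustrated with a small sketch. This is not code from the thesis: the sample values, the variable names, and the helper functions `mse` and `mae` are made up for illustration, and the only point is that a single outlier inflates a squared-error loss far more than an absolute-error loss.

```python
# Illustrative only: toy numbers, not data or code from the thesis.

def mse(pred, target):
    """Mean square error: average of squared differences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mae(pred, target):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


# Hypothetical waveform samples: a silent target and two predictions.
target = [0.0, 0.0, 0.0, 0.0]
small_errors = [0.1, -0.1, 0.1, -0.1]   # every sample is off by 0.1
one_outlier = [0.1, -0.1, 0.1, 10.0]    # same, but the last sample is far off

# How much each loss grows when the single outlier is introduced.
mse_growth = mse(one_outlier, target) / mse(small_errors, target)
mae_growth = mae(one_outlier, target) / mae(small_errors, target)

print(f"MSE grows by a factor of {mse_growth:.1f}")
print(f"MAE grows by a factor of {mae_growth:.1f}")
```

With these toy values the squared-error loss grows by a factor in the thousands, while the absolute-error loss grows by a factor in the tens, so one bad sample dominates training far less under the absolute-error loss.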
Keywords/Search Tags: speech separation, vehicle environment, attention mechanism, GRU network, Wave-U-Net