| Monaural speech separation plays an important role in speech recognition, multimedia retrieval, telephone speech recognition, and other applications in noisy environments. Compared with multi-channel speech separation, monaural separation has only limited information available: it must complete the separation task using features of the speech signal itself, which makes it more challenging. Computational auditory scene analysis uses computers to simulate how the human auditory system perceives external sounds, and draws on human auditory mechanisms to analyze and separate mixed speech, which gives it important research significance. This dissertation studies monaural speech separation methods based on auditory scene analysis. The main research contents are as follows: (1) Voiced sound separation method. The Fourier transform is used to convert the mixed speech into the time-frequency domain. According to the harmonic structure of voiced sound, the spectrum of each harmonic is extracted by a comb filter, and the voiced segments are grouped simultaneously. The separated voiced segments are reconstructed using the inverse Fourier transform and overlap-add. (2) Robust pitch estimation method. The pitch of each speech frame is estimated by the cepstrum method, and a pitch spectrogram is drawn using the continuity of the pitch to estimate the pitch trajectories of single-speaker and two-speaker speech. To address the problem that the pitch trajectory in mixed speech weakens or even disappears when the signal-to-noise ratio is low, the dissertation improves the traditional cepstrum calculation. The core idea is to keep only the positive half cycle of the trigonometric function in its inner product with the spectrum in the traditional cepstrum calculation. The experimental results show that the improved cepstrum algorithm can enhance the weakened pitch trajectory in the
mixed speech and recover pitch trajectories that have partially disappeared, which overcomes the poor noise robustness of the traditional cepstrum method. (3) A separation method for a single speaker mixed with noise. To improve the accuracy of harmonic extraction and the quality of the separated speech when the signal-to-noise ratio is low, the comb filter is improved in this dissertation. Using the characteristics of spectral harmonics, the low-order harmonic frequencies and the fundamental frequency are iterated successively to obtain the high-order harmonics and to estimate the frequency samples of each harmonic accurately, while only the extremum points of the comb filter are retained to reduce "crosstalk" interference. The speech-noise separation experiments show that even at an extremely low signal-to-noise ratio (-10 dB), the target speech can be separated as long as the harmonic spectrum is extracted accurately. (4) A separation method for two-speaker mixed speech. For the sequential grouping of voiced segments, considering the short duration of voiced segments and the temporal nature of speech features, this dissertation uses a speaker recognition model based on Bi-GRU to group the voiced segments after two-speaker separation. For the "crosstalk" observed in the separated speech, the dissertation analyzes its causes and eliminates it by smoothing abnormally strong amplitudes and introducing the Griffin & Lim iterative algorithm. |
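
To illustrate the pitch estimation step in (2), the following is a minimal sketch of cepstrum-based pitch estimation for a single frame, written as an explicit inner product between the log-magnitude spectrum and a cosine basis. The function names (`log_spectrum`, `cepstrum_pitch`, `synth_voiced_frame`), the 60-400 Hz search range, the FFT size, and the Hann window are assumptions chosen for illustration and do not come from the dissertation. The `rectify` option keeps only the positive half cycle of each cosine; this is one possible reading of the improvement described above, not a confirmed reproduction of it.

```python
import numpy as np


def log_spectrum(frame, n_fft):
    """One-sided log-magnitude spectrum of a Hann-windowed frame."""
    windowed = frame * np.hanning(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed, n_fft))
    return np.log(magnitude + 1e-10)  # small offset avoids log(0)


def cepstrum_pitch(frame, sr, fmin=60.0, fmax=400.0, n_fft=2048, rectify=False):
    """Estimate the pitch (Hz) of one frame from a cepstrum-like score.

    For each candidate period q (in samples), the score is the inner
    product of the log spectrum with cos(2*pi*q*k/n_fft) over the
    one-sided bins k, which is proportional to the real cepstrum up to
    edge terms. With rectify=True, negative cosine values are set to
    zero (assumed interpretation of the "positive half cycle" idea).
    """
    log_mag = log_spectrum(frame, n_fft)
    # Remove the mean so the non-zero mean of a rectified basis
    # does not bias the scores.
    log_mag = log_mag - log_mag.mean()

    bins = np.arange(len(log_mag))
    periods = np.arange(int(sr / fmax), int(sr / fmin) + 1)

    # One row per candidate period, one column per frequency bin.
    basis = np.cos(2.0 * np.pi * np.outer(periods, bins) / n_fft)
    if rectify:
        basis = np.maximum(basis, 0.0)

    scores = basis @ log_mag
    best_period = periods[np.argmax(scores)]
    return sr / best_period


def synth_voiced_frame(f0, sr, n, noise_std=0.01, seed=0):
    """Hypothetical test signal: harmonics of f0 up to Nyquist plus weak noise."""
    t = np.arange(n) / sr
    n_harmonics = int((sr / 2 - 1) // f0)
    frame = sum(np.sin(2.0 * np.pi * h * f0 * t) / h
                for h in range(1, n_harmonics + 1))
    rng = np.random.default_rng(seed)
    return frame + noise_std * rng.standard_normal(n)


if __name__ == "__main__":
    sr = 16000
    frame = synth_voiced_frame(f0=200.0, sr=sr, n=1024)
    print(cepstrum_pitch(frame, sr))
    print(cepstrum_pitch(frame, sr, rectify=True))
```

Applying `cepstrum_pitch` frame by frame and stacking the score vectors over time would give a picture comparable to the pitch spectrogram mentioned above. The resolution is limited to integer periods in samples, so a practical system would probably interpolate around the peak.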