
Research On Multi-speaker Speech Separation Method Based On DOA Estimation And Frequency Domain Features

Posted on: 2024-04-17  Degree: Master  Type: Thesis
Country: China  Candidate: J L Cheng  Full Text: PDF
GTID: 2568307157981459  Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
In recent years, with the development of artificial intelligence and the rapid spread of mobile Internet technology, speech separation has made breakthroughs in a range of application scenarios, and applications such as voice translation and video conferencing have placed higher demands on it. In practice, speech signals are often disturbed by noise and other signals during propagation, which degrades their quality and destroys the feature information they carry. Real-world recordings also often contain more than one speaker, and it is common for several speakers to talk simultaneously. When the number of sound sources is unknown and little prior knowledge about the audio data is available, multi-speaker speech separation becomes considerably harder. To address these challenges, this thesis studies multi-speaker speech separation based on Direction of Arrival (DOA) estimation, taking a DOA-based separation method as the baseline. The main contributions are as follows:

(1) A novel multi-speaker speech separation method is proposed. It combines a preprocessing module built on an improved adaptive Wiener filter with the baseline method, in order to address irregular noise and reverberation in the received speech, which reduce the effectiveness of separation. Unlike the baseline method, the proposed method separates the audio only after improving the quality of the speech signal: the original audio data is first processed with a generalized sidelobe canceller, and an adaptive Wiener filter then removes background noise and reverberation from the audio segment, yielding a high-quality target speech signal. Next, the Multiple Signal Classification (MUSIC) algorithm is used to obtain DOA estimates from the preprocessed audio data. Finally, multi-speaker speech separation based on the DOA estimates is carried out. Experiments are conducted on 16 audio segments from the AMI corpus. The results show that, compared with the baseline method, the proposed method increases the average hit rate (rHIT) by 2.07% and decreases the average multiple hit rate (rMH) and the average false alarm rate (rFA) by 2.19% and 8.1%, respectively.

(2) The DOA-based multi-speaker speech separation method with the improved adaptive Wiener filter improves separation performance in complex environments. However, it cannot accurately obtain the DOA feature values of each speaker when speakers are adjacent to one another, and this leads to poor separation performance. To solve this problem, a multi-speaker speech separation method combining frequency domain features and DOA features is proposed. First, the fundamental frequency (F0) and harmonic information of all speakers are obtained from the noisy audio data. Then, the fundamental frequency and its harmonic information are combined with the DOA estimates to build new observation combinations. Finally, the observation combinations are predicted and updated by a Kalman filter, and a multi-hypothesis tracking method is designed and implemented to resolve the uncertainty and generate hypothesis trajectories, which realizes multi-speaker speech separation. Experiments on 16 audio segments from the AMI corpus show that, compared with the DOA-based multi-speaker speech separation method with the improved adaptive Wiener filter, the average hit rate (rHIT) is increased by 3.92%, and the average multiple hit rate (rMH) and the average false alarm rate (rFA) are decreased by 2.22% and 6.96%, respectively.
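The MUSIC step named in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a narrowband signal, a uniform linear array with half-wavelength spacing, a known number of sources, and synthetic data rather than AMI recordings. All function names and parameters (`steering_vector`, `music_spectrum`, `pick_peaks`, `simulate`) are hypothetical and introduced only for illustration.

```python
import numpy as np

def steering_vector(theta_deg, n_mics, spacing_wavelengths=0.5):
    """Array response of a uniform linear array (assumed geometry)."""
    theta = np.deg2rad(theta_deg)
    k = np.arange(n_mics)
    return np.exp(-2j * np.pi * spacing_wavelengths * k * np.sin(theta))

def music_spectrum(snapshots, n_sources, grid_deg, spacing_wavelengths=0.5):
    """MUSIC pseudo-spectrum from an (n_mics, n_snapshots) complex matrix."""
    n_mics, n_snap = snapshots.shape
    # Sample spatial covariance matrix.
    cov = snapshots @ snapshots.conj().T / n_snap
    # eigh returns eigenvalues in ascending order, so the first
    # n_mics - n_sources eigenvectors span the noise subspace.
    _, eigvecs = np.linalg.eigh(cov)
    noise_subspace = eigvecs[:, : n_mics - n_sources]
    spectrum = np.empty(len(grid_deg))
    for i, theta in enumerate(grid_deg):
        a = steering_vector(theta, n_mics, spacing_wavelengths)
        proj = noise_subspace.conj().T @ a
        # Steering vectors of true sources are nearly orthogonal to the
        # noise subspace, so the reciprocal peaks at the source angles.
        spectrum[i] = 1.0 / np.vdot(proj, proj).real
    return spectrum

def pick_peaks(spectrum, grid_deg, n_sources):
    """Return the angles of the n_sources highest local maxima, sorted."""
    interior = np.arange(1, len(spectrum) - 1)
    is_peak = (spectrum[1:-1] > spectrum[:-2]) & (spectrum[1:-1] > spectrum[2:])
    candidates = interior[is_peak]
    best = candidates[np.argsort(spectrum[candidates])[::-1][:n_sources]]
    return np.sort(grid_deg[best])

def simulate(true_deg, n_mics=8, n_snap=400, snr_db=20.0, seed=0):
    """Synthetic uncorrelated sources plus white noise (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_src = len(true_deg)
    A = np.stack([steering_vector(t, n_mics) for t in true_deg], axis=1)
    S = (rng.standard_normal((n_src, n_snap))
         + 1j * rng.standard_normal((n_src, n_snap))) / np.sqrt(2)
    noise_std = 10 ** (-snr_db / 20)
    N = noise_std * (rng.standard_normal((n_mics, n_snap))
                     + 1j * rng.standard_normal((n_mics, n_snap))) / np.sqrt(2)
    return A @ S + N

grid = np.arange(-90.0, 90.5, 0.5)
X = simulate([-20.0, 35.0])
estimates = pick_peaks(music_spectrum(X, n_sources=2, grid_deg=grid), grid, 2)
print("Estimated DOAs (degrees):", estimates)
```

In this sketch the peaks of the pseudo-spectrum serve as the DOA estimates. Real speech is broadband, so a practical system would typically apply such a step per frequency bin or per frame, and the array geometry would have to match the recording setup.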
Keywords/Search Tags:Speech separation, Direction of arrival, Frequency domain feature, Generalized sidelobe canceller, Adaptive Wiener filter, Kalman filter, Multi-hypothesis tracking