
Research On Sound Source Recognition And Location Technology Based On Deep Learning

Posted on: 2024-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: H X Zhu
Full Text: PDF
GTID: 2568307136988269
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of intelligent voice applications, the demand for sound event localization and detection (SELD) is increasing. SELD comprises two main tasks, sound event detection and sound source localization, so it yields both the category of a sound source and an estimate of its position. Building on the continuing progress of deep learning, this thesis studies deep learning based SELD. The main contributions are as follows.

(1) The basic principles of SELD are studied. Sound signal pre-processing techniques such as sampling and quantization, pre-emphasis, and framing with windowing are described first. For sound source recognition, the extraction of Mel-frequency cepstral coefficient (MFCC) features, filter bank (Fbank) features, and Gammatone filter bank cepstral coefficient (GFCC) features is studied, followed by the classification algorithms used for recognition. For sound source localization, localization parameters such as received signal strength and time of arrival are introduced first, and then the traditional localization methods that use these parameters are described. This theoretical background provides the foundation for the subsequent research.

(2) A deep learning based algorithm for single sound event localization and detection is proposed. After the source signal is pre-processed, the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) algorithm is used for noise reduction. The Fbank features of each channel and the generalized cross-correlation (GCC) features between neighboring channels are then extracted and fused to form the final features. Finally, a multi-task learning framework based on convolutional recurrent neural networks (CRNN) is trained offline, with an attention mechanism added to improve training efficiency. Jointly learning sound source recognition and position estimation can significantly improve both recognition and localization performance. The experimental results show that the recognition accuracy in single-source conditions can reach 88.9%, and the localization error is within 1 meter.

(3) A deep learning based algorithm for multiple sound event localization and detection is proposed. After pre-processing and noise reduction of the source signals, the multiple source signals are first separated by the dual-path recurrent neural network (DPRNN) model. A ResNet-based voiceprint library is then built for noise reduction; it keeps only the audio of the speakers of interest and performs sound recognition. The Arc Voice loss function is proposed to increase the compactness of results within the same class and the separation between results of different classes. Finally, the single sound event localization algorithm is used for sound localization. The experimental results verify the efficiency of the proposed algorithm on a dual-source database.
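
The pre-processing steps and the inter-channel GCC feature named in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis implementation. The function names, the pre-emphasis coefficient of 0.97, the frame length of 400 samples with a hop of 160 samples, the Hamming window, and the PHAT weighting of the cross-spectrum are all assumptions chosen for illustration; the abstract specifies only "pre-emphasis", "windowing", and "GCC features".

```python
import numpy as np


def pre_emphasis(x, alpha=0.97):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - alpha * x[:-1])


def frame_and_window(x, frame_len=400, hop=160):
    """Split x into overlapping frames and apply a Hamming window to each.

    frame_len and hop are assumed values, not taken from the thesis.
    Trailing samples that do not fill a whole frame are dropped.
    """
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)


def gcc_phat_delay(a, b, max_lag):
    """Estimate by how many samples channel b lags channel a.

    Uses generalized cross-correlation with PHAT weighting (an assumption;
    the abstract does not say which GCC weighting is used).
    """
    n = len(a) + len(b)                      # zero-pad to avoid circular wrap
    A = np.fft.rfft(a, n=n)
    B = np.fft.rfft(b, n=n)
    cross = B * np.conj(A)                   # cross-power spectrum
    cross /= np.abs(cross) + 1e-12           # PHAT: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that lags run from -max_lag to +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag


# Example: two channels carrying the same noise burst, offset by 5 samples.
rng = np.random.default_rng(0)
burst = rng.standard_normal(1024)
ch_a = np.concatenate((burst, np.zeros(5)))
ch_b = np.concatenate((np.zeros(5), burst))
lag = gcc_phat_delay(ch_a, ch_b, max_lag=20)
frames = frame_and_window(pre_emphasis(ch_a))
```

In a pipeline like the one the abstract describes, a lag or the full correlation curve from each pair of neighboring channels would be combined with per-channel spectral features before being passed to the network; how the thesis performs that fusion is not stated in the abstract.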
Keywords/Search Tags: sound feature, sound recognition, sound source localization, convolutional recurrent neural network (CRNN), speech separation, deep learning