Research On Sound Source Recognition And Location Technology Based On Deep Learning

Posted on: 2024-01-07

Degree: Master

Type: Thesis

Country: China

Candidate: H X Zhu

Full Text: PDF

GTID: 2568307136988269

Subject: Signal and Information Processing

Abstract/Summary:

With the rapid development of intelligent voice applications, the demand for sound event localization and detection (SELD) is increasing. SELD comprises two main tasks, sound event detection and sound source localization, so it yields both the category of a sound source and an estimate of its position. This thesis studies SELD based on deep learning. The main contributions are as follows.

(1) The basic principles of SELD are studied. Sound signal pre-processing techniques, namely sampling and quantization, pre-emphasis, and framing with windowing, are described first. For sound source recognition, the extraction of Mel-frequency cepstral coefficient (MFCC) features, filter bank (Fbank) features, and Gammatone filter bank cepstral coefficient (GFCC) features is studied, followed by the classification algorithms used for recognition. For sound source localization, localization parameters such as received signal strength and time of arrival are introduced first, and then the traditional localization methods that use these parameters are described. This theoretical background provides the foundation for the research that follows.

(2) A deep learning based algorithm for localizing and detecting a single sound event is proposed. After the source signal is pre-processed, the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) algorithm is used for noise reduction. The Fbank features of each channel and the generalized cross-correlation (GCC) features between neighboring channels are then extracted and fused to form the final features. Finally, a multi-task learning framework based on convolutional recurrent neural networks is trained offline, with an attention mechanism added to improve training efficiency. Multi-task learning between sound source recognition and position estimation can significantly improve both recognition and localization performance. The experimental results show that the recognition accuracy in single-source conditions can reach 88.9%, and the localization error is within 1 meter.

(3) A deep learning based algorithm for localizing and detecting multiple sound events is proposed. After pre-processing and noise reduction of the source signals, the multiple source signals are first separated by the DPRNN model. A ResNet-based voiceprint library is then built for noise reduction; it keeps only the audio of the speakers of interest and performs sound recognition. The Arc Voice loss function is proposed to increase the aggregation of results within a class and the separation between results of different classes. Finally, the single sound event localization algorithm is used for sound localization. The experimental results verify the efficiency of the proposed algorithm on a dual-source database.
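
Two of the steps named in the abstract, pre-emphasis with framing and windowing, and a cross-correlation feature between two channels, can be illustrated with a short NumPy sketch. This is not the thesis's implementation: the function names, the pre-emphasis coefficient 0.97, the Hamming window, the frame sizes, and the PHAT weighting of the cross-correlation are all assumptions chosen for illustration, since the abstract does not give these details.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    x = np.asarray(x, dtype=float)
    return np.concatenate(([x[0]], x[1:] - alpha * x[:-1]))

def frame_and_window(x, frame_len, hop):
    """Split x into overlapping frames and apply a Hamming window to each frame."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

def gcc_phat_delay(sig, ref, fs):
    """Estimate the delay of sig relative to ref, in seconds, using GCC-PHAT."""
    n = len(sig) + len(ref)              # zero-pad so the correlation is linear
    S = np.fft.rfft(sig, n=n)
    R = np.fft.rfft(ref, n=n)
    cross = S * np.conj(R)
    cross /= np.abs(cross) + 1e-12       # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.roll(cc, max_shift)          # move lag 0 to index max_shift
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs

# Hypothetical two-channel example: channel 2 is channel 1 delayed by 5 samples.
fs = 16000
rng = np.random.default_rng(0)
ref = rng.standard_normal(1024)
sig = np.concatenate((np.zeros(5), ref))
frames = frame_and_window(pre_emphasis(ref), frame_len=400, hop=160)
delay = gcc_phat_delay(sig, ref, fs)
print(frames.shape, delay * fs)
```

In a pipeline like the one the abstract outlines, the windowed frames would feed the Fbank feature extraction, while the cross-correlation between neighboring channels carries the inter-channel time difference that localization relies on.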

Keywords/Search Tags:

sound feature, sound recognition, sound source localization, convolutional recurrent neural network (CRNN), speech separation, deep learning

Related items

1	Sound Source Localization And Recognition From Complex Background
2	Research On Indoor Sound Source Localization Algorithm Based On Deep Learning
3	Research And Implementation Of Blind Source Separation Algorithm For Sound Source Localization
4	Research On Environmental Sound Recognition Method Based On Deep Learning
5	Research On Multi-Sound Source Intelligent Recognition Method Based On MFF-ResNet
6	Multiple Sound Source Localization Method And Its Application In Vehicle Honking Monitoring Systems
7	Neighbourhood Similarity Augmentation On Multi-source Sound Event Detection And Localization
8	Research On Multi-sound Event Localization And Detection Method Based On Deep Learning
9	Feature Extraction And Recognition For Sound Source Localization Using A Small-Sized Microphone Array
10	Research On Sound Source Localization Technology Based On Convolutional Recurrent Neural Network