In recent years, with the rapid development of smart devices, sound source localization has gained widespread attention as one of their fundamental applications. Advances in deep learning have further improved the accuracy and robustness of sound source localization based on deep neural networks, which is gradually replacing traditional localization algorithms in real-world applications. To address the loss of implicit Direction of Arrival (DOA) information and the scarcity of training data in sound source localization, this paper proposes a DOA estimation method based on Gated Recurrent Unit (GRU) and self-attention networks. The method adopts a GRU backbone, which performs well on small datasets and thus mitigates the difficulty of collecting clean sound recordings. Multi-channel recordings of sound sources form the training set; after short-time Fourier transform (STFT) feature extraction, the Mel spectrogram and acoustic intensity vectors are obtained and combined with the multi-channel cepstral maps and normalized principal feature vectors to form the input features. This design avoids the loss of implicit DOA information caused by combining cepstral maps with GCC-PHAT features. The input features are then fed into a convolutional recurrent neural network for supervised learning of the model parameters, and the model output regresses three-dimensional Cartesian coordinates to estimate the DOA. The self-attention network enables parameter backpropagation during training, allowing the network to compute the loss and the predicted correlation matrix simultaneously, thereby resolving the optimal assignment between predicted and reference positions. Experimental results demonstrate that the proposed network achieves high localization accuracy and robustness under different reverberation conditions and signal-to-noise ratios.

To meet the pressing need for simultaneous localization of multiple sound sources in real-world applications, this paper further proposes a multi-source localization model. For the practical deployment of deep-learning-based sound source localization systems, the ability of the network to locate sound sources accurately in complex environments is a crucial issue. The model analyzes in depth the trajectory output format and the training technique of Duplicate Permutation-Invariant Training (DPIT), and integrates them with the existing model framework to achieve multi-source localization. The trajectory output assigns different events occurring within the same frame to separate trajectory channels so that they can be localized simultaneously. With DPIT, the model selects, among the possible arrangements of sound source classes and trajectories, the permutation with the minimum loss, ensuring the accuracy and feasibility of localization. Experiments applying the trajectory output and DPIT to the proposed model successfully achieve multi-source localization with a good level of accuracy and robustness.
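To make the input-feature pipeline concrete, the following is a minimal NumPy sketch of extracting a log-Mel spectrogram and Mel-band acoustic intensity vectors from a multi-channel (first-order Ambisonics) recording. The function names, the FOA channel layout `[W, X, Y, Z]`, and all parameter values are illustrative assumptions, not the thesis's exact configuration, and the cepstral-map and principal-feature-vector components are omitted for brevity.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Naive STFT: frame the signal, apply a Hann window, take the rFFT."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)                      # (frames, bins)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular Mel filterbank (simplified helper, hypothetical)."""
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = mel2hz(np.linspace(0.0, hz2mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft // 2 + 1) * pts / (sr / 2)).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        l, c, r = bins[m], bins[m + 1], bins[m + 2]
        for k in range(l, c):
            fb[m, k] = (k - l) / max(c - l, 1)               # rising slope
        for k in range(c, r):
            fb[m, k] = (r - k) / max(r - c, 1)               # falling slope
    return fb

def foa_features(wav, sr=24000, n_fft=512, hop=256, n_mels=64):
    """wav: (4, n_samples) FOA channels [W, X, Y, Z] (assumed layout).
    Returns (4, frames, n_mels): log-Mel of W stacked with the
    normalized Mel-band intensity vector per axis."""
    W, X, Y, Z = (stft(ch, n_fft, hop) for ch in wav)
    fb = mel_filterbank(sr, n_fft, n_mels)
    logmel = np.log(np.abs(W) ** 2 @ fb.T + 1e-8)
    # Acoustic intensity vector per axis: Re{conj(W) * [X, Y, Z]}
    I = np.stack([np.real(np.conj(W) * C) for C in (X, Y, Z)])
    I = I @ fb.T                                             # pool to Mel bands
    I = I / (np.linalg.norm(I, axis=0, keepdims=True) + 1e-8)  # normalize
    return np.concatenate([logmel[None], I], axis=0)
```

The resulting (channels, frames, Mel-bins) tensor is the kind of input a convolutional recurrent network consumes directly.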
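The permutation-selection step of the DPIT idea can be sketched in a few lines: given per-track predicted and reference Cartesian DOA trajectories, the loss is the minimum over all track permutations, and the minimizing permutation also yields the assignment between predicted and reference positions. This brute-force version is a minimal illustration under assumed shapes, not the thesis's exact training objective.

```python
import numpy as np
from itertools import permutations

def pit_loss(pred, ref):
    """Permutation-invariant MSE over track channels (minimal sketch).
    pred, ref: (tracks, frames, 3) Cartesian DOA trajectories.
    Returns (min_loss, best_perm): the smallest mean-squared error over
    all track permutations, and the permutation achieving it."""
    n_tracks = pred.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in permutations(range(n_tracks)):
        loss = np.mean((pred[list(perm)] - ref) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

Since the minimum is taken over a discrete set of candidate losses, each of which is differentiable, gradients flow through the selected permutation's loss during training; the factorial enumeration is practical only for the small track counts typical of multi-source localization.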