Sound event detection and localization technologies have broad application prospects in fields such as environmental monitoring, smart homes, and intelligent transportation. However, the accuracy of event detection and localization is degraded by reverberation and non-stationary noise interference in complex environments. This dissertation studies sound feature extraction, sound event detection, and sound source localization in complex environments. The main work and contributions can be summarized as follows.

(1) Research on sound feature extraction methods. We proposed a sound event detection method based on Variational Mode Decomposition (VMD) and the Gammatone frequency coefficient feature (GFCV), which addresses the low detection accuracy caused by non-stationary noise interference in sound signals. We also proposed sound source localization features based on intensity difference (GCVV) for FOA-format audio and phase difference (GCVP) for MIC-format audio, which address sound source localization in the presence of overlapping sound sources. On a real-world noise dataset, the GFCV features outperformed log-mel features, with F1-score improvements of 0.86%, 1.14%, 1.16%, and 1.17%. On a dataset with a maximum of three overlapping sounds, the GCVV and GCVP features reduced the localization error by 1.3° and 1.1° and increased the localization recall rate by 2.9% and 2.5%, respectively, compared with the intensity vector (IV) and generalized cross-correlation phase transform (GCC-PHAT) features.

(2) Research on sound event detection methods. We proposed a weakly labeled, semi-supervised sound event detection method based on Convolutional Recurrent Neural Networks (CRNN), named CIRAS, which addresses the reduced detection accuracy caused by non-stationary noise in real-world environments and the high labor cost of strong labels (sound event category, onset time, and offset time annotations) for training data. First, we constructed a gated convolutional bidirectional independently recurrent neural network (GCBIndRNN) that uses residual connections between RNN layers to increase the depth of the CRNN, improving sound event detection performance through better model fitting during training. Second, efficient channel attention (ECA) was introduced into the GCBIndRNN to strengthen its focus on sound event features under background noise. Finally, we established a semi-supervised model with sample relationship consistency and mean teacher (SRC-MT) to train the network effectively from weakly labeled sound data (containing only sound event categories) and unlabeled sound data, producing strongly labeled sound event detection results as output (sketched below). Experimental results on a real-world noise dataset showed that CIRAS outperformed the baseline systems of DCASE Task 4 in 2018 and 2019, with F1-score improvements of 2-18%.
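To make the SRC-MT training idea above more concrete, the following is a minimal sketch of the mean-teacher part only (the sample-relationship consistency term is omitted), written in PyTorch. The SEDNet architecture, loss weights, and data shapes are illustrative assumptions, not the dissertation's actual implementation.

```python
# Minimal mean-teacher sketch (assumed PyTorch): a student SED network is
# trained with weak clip-level labels plus a consistency loss against an
# EMA teacher on unlabeled clips. All names and sizes are hypothetical.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class SEDNet(nn.Module):
    """Placeholder CRNN-style detector producing frame-level class probabilities."""
    def __init__(self, n_mels=64, n_classes=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.rnn = nn.GRU(32 * n_mels, 64, batch_first=True, bidirectional=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):                      # x: (batch, 1, frames, n_mels)
        h = self.conv(x)                       # (batch, 32, frames, n_mels)
        b, c, t, f = h.shape
        h = h.permute(0, 2, 1, 3).reshape(b, t, c * f)
        h, _ = self.rnn(h)
        return torch.sigmoid(self.head(h))     # frame-wise event probabilities

student = SEDNet()
teacher = copy.deepcopy(student)               # teacher = EMA copy of the student
for p in teacher.parameters():
    p.requires_grad_(False)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def train_step(x_weak, y_weak, x_unlab, ema_decay=0.999, cons_w=1.0):
    # Supervised part: weak (clip-level) labels vs. pooled frame predictions.
    p_weak = student(x_weak).mean(dim=1)       # (batch, n_classes) clip scores
    sup_loss = F.binary_cross_entropy(p_weak, y_weak)

    # Consistency part: student should agree with the teacher on unlabeled data.
    with torch.no_grad():
        p_teacher = teacher(x_unlab)
    cons_loss = F.mse_loss(student(x_unlab), p_teacher)

    loss = sup_loss + cons_w * cons_loss
    opt.zero_grad()
    loss.backward()
    opt.step()

    # Update the teacher as an exponential moving average of the student.
    with torch.no_grad():
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(ema_decay).add_(ps, alpha=1 - ema_decay)
    return loss.item()

# Example call with random data: 8 weakly labeled clips and 8 unlabeled clips.
x_w, y_w = torch.randn(8, 1, 100, 64), torch.randint(0, 2, (8, 10)).float()
x_u = torch.randn(8, 1, 100, 64)
print(train_step(x_w, y_w, x_u))
```

The key design point is that the teacher is never updated by gradient descent; it is an exponential moving average of the student, and its predictions on unlabeled clips serve as consistency targets during training.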
(3) Research on sound source localization methods. We proposed a data augmentation method based on feature spectra (LSA), which addresses small-sample sound source localization by generating additional data to augment the original dataset for neural network training and improve network fitting. We also proposed a sound source localization method based on ResT-Net, which handles sound source localization under reverberation and directional interference. The method takes either intensity-difference or phase-difference features as input, uses two convolution layers to extract and integrate the feature dimensions, applies Res2Net to extract multi-scale features, and then uses a Transformer module to capture temporal context from the multi-scale features; finally, the model is optimized by backpropagation to perform sound source localization (a structural sketch is given below). On the TNSSE 2021 dataset, which contains environmental noise, reverberation, and directional interference, the proposed method achieved localization errors of 20° and 21.3° for FOA- and MIC-format sound, respectively, and localization recall rates of 60.3% and 46.4%.
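For illustration, a minimal PyTorch sketch of a ResT-Net-style forward pass follows: two convolution layers fuse the input intensity- or phase-difference feature channels, a simplified Res2Net-style block extracts multi-scale features, and a Transformer encoder models temporal context before frame-wise DOA regression. The module layout, channel counts, input shape (7 feature channels, 64 frequency bins), and the Cartesian DOA output head are assumptions for the sketch, not the dissertation's exact configuration.

```python
# Sketch of a ResT-Net-style localization network (assumed PyTorch).
import torch
import torch.nn as nn

class Res2Block(nn.Module):
    """Simplified Res2Net block: hierarchical 3x3 convs over channel splits."""
    def __init__(self, channels, scales=4):
        super().__init__()
        self.scales = scales
        w = channels // scales
        self.convs = nn.ModuleList(
            [nn.Conv2d(w, w, 3, padding=1) for _ in range(scales - 1)]
        )
        self.act = nn.ReLU()

    def forward(self, x):
        splits = torch.chunk(x, self.scales, dim=1)
        out, prev = [splits[0]], None
        for i, conv in enumerate(self.convs):
            prev = splits[i + 1] if prev is None else splits[i + 1] + prev
            prev = self.act(conv(prev))
            out.append(prev)
        return self.act(torch.cat(out, dim=1) + x)   # residual connection

class ResTNetSketch(nn.Module):
    def __init__(self, in_ch=7, n_sources=3, d_model=128):
        super().__init__()
        # Two conv layers integrate the intensity- or phase-based input features.
        self.stem = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.AvgPool2d((1, 4)),                     # pool the frequency axis only
        )
        self.res2 = Res2Block(64, scales=4)           # multi-scale feature extraction
        self.proj = nn.Linear(64 * 16, d_model)       # assumes 64 input frequency bins
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=2)
        # Frame-wise Cartesian DOA vectors (x, y, z) per potential source track.
        self.head = nn.Linear(d_model, 3 * n_sources)

    def forward(self, x):                             # x: (batch, in_ch, frames, freq)
        h = self.res2(self.stem(x))                   # (batch, 64, frames, freq/4)
        b, c, t, f = h.shape
        h = h.permute(0, 2, 1, 3).reshape(b, t, c * f)
        h = self.transformer(self.proj(h))            # temporal context modeling
        return torch.tanh(self.head(h))               # (batch, frames, 3*n_sources)

# Example: 7-channel phase/intensity features, 100 frames, 64 frequency bins.
doa = ResTNetSketch()(torch.randn(2, 7, 100, 64))
print(doa.shape)                                      # torch.Size([2, 100, 9])
```

In practice, the tanh-bounded output vectors would be converted to azimuth and elevation angles, and the number of output tracks would be chosen to match the maximum source overlap in the dataset.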