Research On Sound Event Detection Based On Weakly Supervised Learning

Posted on: 2021-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: Q Yang
Full Text: PDF
GTID: 2568306104970719
Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, weakly supervised learning from weakly labeled audio data has become an active research topic in sound event detection. This thesis addresses four problems in that setting: weak supervision itself, the limited local receptive field of convolutional networks, insufficient labeled data, and overlapping audio events. It improves deep neural network architectures to raise sound event detection performance.

First, to separate sound events from background scenes or noise, a Res2Net Expectation-Maximization Attention Network (Res2EMANet) model, based on a time-frequency segmentation network, is proposed for weakly supervised sound event detection. Because a conventional convolutional neural network is limited by its local receptive field and cannot fully capture long-range information, the thesis combines the Res2Net network with the expectation-maximization attention mechanism to enlarge the effective receptive field. Experimental results show that the proposed model outperforms the baseline system on sound event detection.

Second, an improved mean teacher model is proposed for semi-supervised sound event detection, so that a large amount of unlabeled data can be used to improve performance. In the training strategy, the Stochastic Weight Averaging algorithm is applied to sound event detection for the first time, which speeds up prediction and reduces cost. In the model architecture, a global weighted rank pooling layer is used to overcome the tendency of traditional pooling to underestimate or overestimate sound events. In addition, the SpecAugment data augmentation method is adopted to mitigate overfitting. Experimental results show that the proposed model outperforms the baseline mean teacher system on sound event detection.

Finally, to handle the overlap of sound events in real audio clips, an SECapsule Recurrent Attention Neural Network (SECapsRANN) model is proposed for polyphonic sound event detection. The proposed model combines the advantages of SENet and CapsNet to separate each individual sound event from the mixed, overlapping features. An attention mechanism is introduced so that the network pays more attention to salient events. Experiments show that the proposed model handles sound event overlap effectively and improves sound event detection performance.
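
The global weighted rank pooling mentioned above can be illustrated with a minimal sketch. It assumes frame-level class probabilities stored in a NumPy array of shape (frames, classes); the function name `global_weighted_rank_pooling` and the `decay` parameter are illustrative choices and are not taken from the thesis. Frame scores are sorted in descending order per class and averaged with geometrically decaying weights, so a decay of 0 behaves like max pooling and a decay of 1 like mean pooling.

```python
import numpy as np


def global_weighted_rank_pooling(frame_probs, decay=0.5):
    """Pool frame-level probabilities into clip-level probabilities.

    frame_probs: array-like of shape (frames, classes).
    decay: hypothetical hyperparameter in [0, 1]; 0 gives max pooling,
           1 gives mean pooling, values in between interpolate.
    """
    probs = np.asarray(frame_probs, dtype=float)
    # Sort each class column from highest to lowest frame probability.
    ranked = -np.sort(-probs, axis=0)
    # Weight for rank j is decay**j, so top-ranked frames dominate.
    weights = decay ** np.arange(ranked.shape[0])
    # Weighted average over the time axis, one value per class.
    return (weights[:, None] * ranked).sum(axis=0) / weights.sum()


# Three frames, two sound classes (made-up values for illustration).
example = [[0.2, 0.9],
           [0.8, 0.1],
           [0.5, 0.4]]
clip_probs = global_weighted_rank_pooling(example, decay=0.5)
```

With an intermediate decay, a class that is active in only a few frames is not averaged away as in mean pooling, while a single spurious high frame counts for less than it would under max pooling.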
Keywords/Search Tags: sound event detection, weakly supervised learning, attention mechanism, mean teacher model, capsule network