Audio Classification And Sound Event Detection Based On Convolutional Sparse Coding Model

Posted on:2021-03-01

Degree:Master

Type:Thesis

Country:China

Candidate:J Xia

Full Text:PDF

GTID:2568306104470714

Subject:Information and Communication Engineering

Abstract/Summary:

Audio signal processing is increasingly important in applications such as the monitoring of domestic activities and monitoring systems. Most audio signal processing problems are currently addressed with deep learning methods, and the convolutional neural network (CNN) is one of the most commonly used deep neural networks. However, the CNN suffers from poor interpretability and loses target location information in its pooling layers. To avoid these shortcomings, this thesis proposes using the convolutional sparse coding (CSC) model for audio signal processing problems. The CSC model focuses on constructing sparse and shift-invariant representations of signals; it is more interpretable and has fewer model parameters.

First, the multi-layer iterative soft thresholding network (ML-ISTA-NET) is proposed for the audio classification problem. To capture the temporal context of audio events, a bidirectional gated recurrent unit (Bi-GRU) is added after the ML-ISTA-NET, giving the ML-ISTA-GRU network. To focus on the important frames in audio events, an attention mechanism is further added after the ML-ISTA-GRU network, giving the ML-ISTA-GRU-Att network. The experimental results show that the ML-ISTA-NET, ML-ISTA-GRU, and ML-ISTA-GRU-Att networks all achieve better audio classification performance than the baseline system.

Second, to address weakly supervised learning in the sound event detection task, the MRNN-Att network, based on the multi-layer local block coordinate descent network (ML-LoBCoD-NET), is proposed. To make full use of the feature extraction of both the CNN and the ML-LoBCoD-NET, the MCRNN-Att network is proposed for the sound event detection task. To address semi-supervised learning in the sound event detection task, mean teacher models based on the MRNN-Att and MCRNN-Att networks are proposed. The experimental results show that the four proposed methods achieve better sound event detection performance than the baseline system.

Finally, the CSCNet-LFISTA network is proposed for the log-Mel spectrogram denoising problem. The CSCNet-LFISTA network is obtained by unfolding the Learned Fast Iterative Soft Thresholding Algorithm (LFISTA). To reduce the difference in data fitting between training and test samples, the CSCNet-LFISTAm network, based on the LFISTAm algorithm, is proposed. The experimental results show that the CSCNet-LFISTA and CSCNet-LFISTAm networks denoise better than the BM3D model and converge faster than the CSCNet network, with the CSCNet-LFISTAm network converging fastest.
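
The networks named in the abstract are built by unfolding iterative soft thresholding algorithms. As a rough illustration of that underlying iteration only, the following minimal sketch runs plain ISTA on a sparse coding problem. It is not the thesis's implementation: it assumes a small dense dictionary `D` instead of a convolutional one, uses a fixed step size and threshold instead of learned ones, and the names `soft_threshold`, `ista`, `objective`, and `lam` are hypothetical choices made for this example.

```python
import numpy as np


def soft_threshold(v, theta):
    """Shrink every entry of v toward zero by theta (proximal map of the L1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def objective(D, y, x, lam):
    """Sparse coding cost: 0.5 * ||y - D x||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((y - D @ x) ** 2) + lam * np.sum(np.abs(x))


def ista(D, y, lam=0.1, n_iter=100):
    """Plain ISTA with a fixed step size 1/L (assumption: dense dictionary D)."""
    # L is the Lipschitz constant of the gradient of the data term,
    # i.e. the squared spectral norm of D.
    L = np.linalg.norm(D, 2) ** 2
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        # Gradient step on the data term, then soft thresholding.
        x = soft_threshold(x + D.T @ (y - D @ x) / L, lam / L)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.standard_normal((20, 50))      # hypothetical random dictionary
    x_true = np.zeros(50)
    x_true[[3, 17, 41]] = [1.5, -2.0, 1.0]  # sparse ground-truth code
    y = D @ x_true
    x_hat = ista(D, y, lam=0.1, n_iter=500)
    print("cost at zero:", objective(D, y, np.zeros(50), 0.1))
    print("cost at ISTA estimate:", objective(D, y, x_hat, 0.1))
```

In an unfolded network, each loop iteration typically becomes one layer and the fixed quantities (here `D`, the step size, and the threshold) become trainable parameters. How the thesis parameterizes its layers is not stated in the abstract.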

Keywords/Search Tags:

audio classification, sound event detection, sound denoising, convolutional sparse coding, weakly supervised learning

Related items

1	The Research Of Sound Event Classification And Detection On Semi-supervised Learning Method
2	Research On Sound Event Detection Based On Weakly Supervised Learning
3	Weakly Supervised Learning For Audio Analysis
4	Research On Sound Event Detection Technology In Domestic Environment
5	Weakly Supervised Sound Event Recognition On Noisy Label Dataset
6	Research On Sound Event Classification And Detection Method Based On Semi-supervised Learning
7	Semi-supervised Sound Event Detection Based On Deep Neural Network
8	Research And Implementation Of Deep Learning Based Sound Event Detection
9	Research On The Classification Of Indoor Multi-channel Human Activities Sound Events
10	Neighbourhood Similarity Augmentation On Multi-source Sound Event Detection And Localization