| Audio signal processing is increasingly important in fields such as domestic activity monitoring and monitoring systems. Most audio signal processing problems are currently addressed with deep learning methods, and the convolutional neural network (CNN) is one of the most commonly used deep learning architectures. However, the CNN has poor interpretability and loses target location information in its pooling layers. To avoid these shortcomings, this paper proposes using the convolutional sparse coding (CSC) model for audio signal processing problems. The CSC model focuses on constructing sparse, shift-invariant representations of signals, which makes it more interpretable and requires fewer model parameters. First, the multi-layer iterative soft-thresholding network (ML-ISTA-NET) is proposed for the audio classification problem. To capture the temporal context of audio events, a Bidirectional Gated Recurrent Unit (Bi-GRU) is added after the ML-ISTA-NET, giving the ML-ISTA-GRU network. To focus on the important frames in audio events, an attention mechanism is further added after the ML-ISTA-GRU network, giving the ML-ISTA-GRU-Att network. The experimental results show that the audio classification performance of the ML-ISTA-NET, ML-ISTA-GRU, and ML-ISTA-GRU-Att networks is better than that of the baseline system. Second, to address weakly supervised learning in the sound event detection task, the MRNN-Att network, based on the Multi-layer Local Block Coordinate Descent network (ML-LoBCoD-NET), is proposed; to make full use of the feature extraction capabilities of both the CNN and the ML-LoBCoD-NET, the MCRNN-Att network is proposed for the sound event detection task. To address semi-supervised learning in the sound event detection task, mean teacher models based on the MRNN-Att network and on the MCRNN-Att network are proposed. The experimental results show that the four proposed methods achieve better sound event detection performance than the baseline system. Finally, the CSCNet-LFISTA network is proposed for the log-Mel spectrogram denoising problem. The CSCNet-LFISTA network is obtained by unfolding the Learned Fast Iterative Soft Thresholding Algorithm (LFISTA). To reduce the discrepancy in data fitting between training and test samples, the CSCNet-LFISTAm network, based on the LFISTAm algorithm, is proposed. The experimental results show that the CSCNet-LFISTA and CSCNet-LFISTAm networks denoise better than the BM3D model and converge faster than the CSCNet network, with the CSCNet-LFISTAm network converging fastest. |
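
The networks summarized above are described as unfolded versions of iterative soft-thresholding algorithms. As a rough illustration of those underlying iterations only, the following minimal sketch shows plain ISTA and FISTA for a dense (non-convolutional) sparse coding problem, min_z 0.5·||y − Dz||² + λ||z||₁. It is not the learned, multi-layer, or convolutional form used in the proposed networks; the names `D`, `y`, `lam`, and `n_iter` are hypothetical and chosen for this example.

```python
import numpy as np


def soft_threshold(x, lam):
    """Element-wise soft-thresholding: shrink each entry toward zero by lam."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def ista(D, y, lam, n_iter=100):
    """Plain ISTA for min_z 0.5*||y - D z||^2 + lam*||z||_1 (illustrative only)."""
    # Lipschitz constant of the gradient: squared spectral norm of D.
    L = np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - y)                   # gradient of the data term
        z = soft_threshold(z - grad / L, lam / L)  # gradient step, then shrink
    return z


def fista(D, y, lam, n_iter=100):
    """FISTA: the same shrinkage step applied at a momentum-extrapolated point."""
    L = np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    w = z.copy()   # extrapolated point
    t = 1.0        # momentum parameter
    for _ in range(n_iter):
        grad = D.T @ (D @ w - y)
        z_next = soft_threshold(w - grad / L, lam / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        w = z_next + ((t - 1.0) / t_next) * (z_next - z)
        z, t = z_next, t_next
    return z
```

In an unfolded network, each loop iteration would become one layer, and quantities such as the dictionary and the threshold would be learned from data rather than fixed; the sketch above keeps them fixed for clarity.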