Sound classification is a fundamental research task in multimedia information processing and the core technology of sound data structuring. It is significant to the fields of signal processing and speech recognition, and many fields have an urgent need for high-performance sound classification systems. In recent years, with the development of deep learning, the combination of deep neural networks with audio data processing and analysis has become a new research hotspot. Convolutional neural networks, among the most representative deep learning models, have achieved remarkable results on sound classification tasks. This paper focuses on sound classification methods based on convolutional neural network models.

First, this paper proposes a multi-scale time-domain convolutional network (WaveMsNet) with a feature fusion mechanism to address the difficulty of extracting strongly discriminative features from audio data. We analyze the dilemma convolutional neural networks face when extracting features from waveform signals: the convolution filters cannot cover the full frequency band while simultaneously improving frequency resolution, so the features extracted by the network cannot represent the audio information effectively. To this end, we propose a multi-scale time-domain convolution operation to increase the discriminability of the extracted features. We also propose a feature fusion method that combines the waveform features extracted by the network with two-dimensional time-frequency features in the same network. On the sound classification datasets ESC-10 and ESC-50, the multi-scale time-domain convolution operation improves classification accuracy by 1.95% and 2.82% on average, respectively. With the feature fusion method, the classification performance of our system exceeds previous related work.

Second, to address the poor generalization of acoustic classification models when labeled data is insufficient, we propose a hybrid-sample learning method for audio data. In neural network training, data augmentation is widely used to reduce the performance gap between the training set and the test set: it generates varied data while keeping the semantic information unchanged. Although such deformation enriches feature patterns and improves the generalization performance of the network, it treats each sample independently and does not consider variation between samples, so the relationships between different samples are ignored. In this paper, we consider whether a feature pattern can be constructed from a sample pair (two samples), so that the network learns the relationships and differences between pairs of samples of the same or different classes. We propose a hybrid-sample-based learning algorithm for convolutional neural networks that can be applied to various convolutional neural network architectures. To find a better way of mixing samples, we propose several sample-mixing methods for two kinds of audio features, time-frequency features and waveform features, and compare their performance across different network architectures. On the DCASE 2018 Task 2 dataset, our proposed Overlay method yields maximum performance gains of 3.68% and 3.27% for time-frequency and waveform features, respectively.
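
To make the multi-scale time-domain convolution described above concrete, the following is a minimal sketch in PyTorch. The class name MultiScaleTimeConv, the kernel sizes, and the channel counts are illustrative assumptions, not the exact WaveMsNet configuration; the sketch only shows the core idea that parallel 1-D convolutions with different receptive fields cover different parts of the frequency band, and their outputs are concatenated into one feature map.

```python
import torch
import torch.nn as nn

class MultiScaleTimeConv(nn.Module):
    """Parallel 1-D convolutions over a raw waveform at several kernel sizes.

    Short kernels resolve fine temporal (high-frequency) detail, long kernels
    capture low-frequency content; concatenating the branches spreads the
    learned filter bank across the band. Hyperparameters are illustrative.
    """

    def __init__(self, in_channels=1, branch_channels=32,
                 kernel_sizes=(11, 51, 101)):
        super().__init__()
        # One branch per scale; odd kernels with padding=k//2 keep the
        # strided output lengths of all branches aligned.
        self.branches = nn.ModuleList([
            nn.Conv1d(in_channels, branch_channels, k, stride=4, padding=k // 2)
            for k in kernel_sizes
        ])
        self.bn = nn.BatchNorm1d(branch_channels * len(kernel_sizes))

    def forward(self, wave):  # wave: (batch, 1, samples)
        feats = [torch.relu(branch(wave)) for branch in self.branches]
        # Trim to the shortest branch to guard against rounding differences.
        t = min(f.shape[-1] for f in feats)
        feats = [f[..., :t] for f in feats]
        return self.bn(torch.cat(feats, dim=1))  # (batch, 3 * C, t)

# Example: a batch of eight 1-second clips at 16 kHz.
x = torch.randn(8, 1, 16000)
y = MultiScaleTimeConv()(x)  # -> (8, 96, ~4000)
```

In the full network described above, a feature map like this would be fused with two-dimensional time-frequency features (e.g., log-mel spectrograms) in later layers; that fusion stage is omitted here.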
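
The abstract does not specify how the Overlay method combines a sample pair, so the sketch below uses a generic mixup-style linear interpolation as a stand-in to illustrate hybrid-sample learning; the function name overlay_mix and the Beta-distributed mixing weight are assumptions, and the thesis's actual Overlay operation may combine waveforms or spectrograms differently.

```python
import torch

def overlay_mix(x1, y1, x2, y2, alpha=0.2):
    """Build one hybrid training sample from a pair of samples.

    x1, x2: two inputs of the same shape (waveforms or spectrograms).
    y1, y2: their one-hot label vectors.
    A mixing weight lam ~ Beta(alpha, alpha) blends both inputs and labels,
    so the network sees a pattern carrying information from both classes.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    x = lam * x1 + (1.0 - lam) * x2  # hybrid input
    y = lam * y1 + (1.0 - lam) * y2  # soft hybrid target
    return x, y
```

In a training loop, the usual (input, target) batch would simply be replaced by the mixed pair returned here, together with a loss that accepts soft targets (e.g., cross-entropy over the blended label vector). Because the mixing happens on the data, the same procedure applies unchanged to any convolutional network architecture, matching the architecture-agnostic claim above.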