Research On Acoustic Scene Classification Methods Based On Convolutional Neural Network

Posted on: 2022-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: G Q Feng
Full Text: PDF
GTID: 2558307154476714
Subject: Engineering
Abstract/Summary:
Sound is an important medium through which humans perceive the world. Among the many kinds of sound, environmental sound is often overlooked yet very important: it can be used to perceive and understand the environment, providing an important basis for further analysis and action. Acoustic scene classification (ASC) determines the category of an environment by analyzing the information contained in its environmental sound. It involves several key technologies, including sound signal processing, data mining and decision making, and machine learning. ASC plays an essential role in intelligent information processing and has broad application prospects in intelligent monitoring, smart terminals, and intelligent information retrieval. ASC has developed rapidly in recent years, and its performance has improved substantially since it was combined with convolutional neural networks (CNNs). However, ASC still faces two significant problems: classification accuracy is not high enough, and the algorithms are too complex. This thesis addresses these two problems as follows.

(1) This thesis presents an ASC method based on Mel spectrogram decomposition and model merging. Efficient use of features is the key to improving classification accuracy. To study how a CNN extracts features from a Mel spectrogram, a CNN visualization method is used to analyze the feature activations of the Mel spectrogram. Then, according to the distribution of these activations, a decomposition method is proposed that decomposes the Mel spectrogram in the time and frequency domains, so that a complete Mel spectrogram is split into several sub-spectrograms. Different CNNs learn the sub-spectrograms in different frequency bands so as to fully extract the feature information in each one. To further exploit the global information of the Mel spectrogram, the outputs of the sub-models are merged with a model merging method to produce more accurate classification results.

(2) This thesis proposes an ASC method based on a lightweight CNN. Pointwise convolution is widely used in CNNs. Taking pointwise convolution as the optimization target, this thesis proposes a local channel transformation method to replace the pointwise convolutions used for channel transformation. The method comprises depthwise channel ascent and group channel descent, which correspond to channel dimensionality expansion and channel dimensionality reduction, respectively. Depthwise channel ascent replaces some pointwise convolutions with depthwise convolutions, using spatial features in place of some inter-channel features to increase the channel dimension. Group channel descent replaces some pointwise convolutions with non-learnable channel pooling, reducing redundant operations during channel compression. The local channel transformation method is applied to CNNs to reduce model complexity.

In summary, this thesis proposes two methods to address the problems of low accuracy and high model complexity in ASC. The proposed methods are evaluated on the DCASE2019 and DCASE2020 datasets, and the experimental results show their effectiveness.
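The two ideas in the abstract can be illustrated with a small NumPy sketch. It is not the thesis implementation. The function names, the equal-size split, the averaging fusion rule, the use of mean pooling over channel groups, and the 3x3 kernel size are all assumptions chosen for illustration, since the abstract does not specify these details.

```python
import numpy as np


def split_spectrogram(mel, n_parts, axis=0):
    """Split a (n_mels, n_frames) Mel spectrogram into sub-spectrograms.

    axis=0 splits along frequency (Mel bands), axis=1 splits along time.
    Equal-size splitting is an assumption; the thesis derives its split
    from the distribution of CNN feature activations.
    """
    return np.array_split(mel, n_parts, axis=axis)


def merge_probabilities(prob_list):
    """Merge per-sub-model class probabilities by averaging.

    Averaging is a stand-in fusion rule (assumption), not necessarily
    the model merging method used in the thesis.
    """
    return np.mean(np.stack(prob_list, axis=0), axis=0)


def group_channel_pool(x, group_size):
    """Parameter-free channel reduction for a (channels, height, width) map.

    Adjacent channels are grouped and averaged, so the channel count drops
    by a factor of group_size without any learned weights. Mean pooling is
    an assumption; the abstract only says the pooling is non-learnable.
    """
    c, h, w = x.shape
    if c % group_size != 0:
        raise ValueError("channels must be divisible by group_size")
    return x.reshape(c // group_size, group_size, h, w).mean(axis=1)


def pointwise_params(c_in, c_out):
    """Weight count of a 1x1 (pointwise) convolution, bias ignored."""
    return c_in * c_out


def depthwise_expand_params(c_in, multiplier, k=3):
    """Weight count of a k x k depthwise convolution that produces
    `multiplier` output channels per input channel, bias ignored."""
    return c_in * multiplier * k * k
```

Under these assumptions, expanding 64 channels to 128 costs 64 × 128 = 8192 weights with a pointwise convolution, against 64 × 2 × 9 = 1152 weights with a 3x3 depthwise convolution using a channel multiplier of 2. Reducing channels with `group_channel_pool` needs no weights at all. The thesis replaces only some of the pointwise convolutions, so the actual savings depend on its architecture.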
Keywords/Search Tags: Acoustic Scene Classification, Convolutional Neural Network, Mel Spectrogram, Spectrogram Decomposition, Model Merging, Model Lightweighting, Pointwise Convolution