Research On Sound Scene Classification Based On Deep Learning

Posted on: 2020-12-18
Degree: Master
Type: Thesis
Country: China
Candidate: M Li
Full Text: PDF
GTID: 2428330620956206
Subject: Electronics and Communications Engineering
Abstract/Summary:
Sound scene classification aims to determine the scene in which an audio recording was made by processing the audio signal. The technology plays an important role in smartphones, audio content retrieval, robot perception, and autonomous driving. Recently, international competitions in the field and the rapid development of deep learning algorithms have further promoted the development of sound scene classification. Motivated by deficiencies in the existing literature and by engineering demands, this thesis proposes several improved sound scene classification algorithms that extract information and learn features from sound scene samples, using model fusion based on convolutional neural networks together with transfer learning. These methods improve the recognition accuracy of the sound scene classification system. The main work and contributions of this thesis are as follows:

(1) The research background and significance of the sound scene classification task are described. Several established databases and relevant competitions are introduced. The research history and recent competition results are summarized in terms of sound scene features and sound scene recognition algorithms.

(2) A basic framework for a sound scene classification system is proposed, in which the system is divided into five subsystems: data partitioning, preprocessing, key feature extraction, classification, and testing. The main functions and key algorithms of each subsystem are studied in detail, including framing, windowing, and pre-emphasis in the preprocessing module; short-term energy, short-term zero-crossing rate, and mel-frequency spectrum coefficients in the feature extraction module; and common machine learning and deep learning algorithms. The effects of these algorithms are also compared experimentally, which lays a theoretical foundation for the following chapters.

(3) Model fusion is applied to sound scene classification, and an improved two-channel convolutional neural network is designed as the weak classifier for model fusion. Specifically, independent convolutional layers process the features of each channel before the fully connected layer, and the resulting features are stacked and fed to the fully connected layer for subsequent processing. Six different feature sets are then constructed, and six different models trained, by using the mel-frequency spectrum features and their differences as the input and applying two audio channel separation methods and three audio channel cutting methods. The final models are obtained with model fusion algorithms, including voting and stacking fusion with a support vector machine as the strong classifier. Finally, the effectiveness and advantages of the proposed method are demonstrated by simulation results on the TUT Urban Acoustic Scenes 2018 dataset.

(4) To address the shortage of acoustic scene data, parameter transfer learning and feature-representation transfer learning are proposed. Parameter transfer learning transfers knowledge from the source dataset AudioSet to the target dataset TUT Urban Acoustic Scenes 2018 by transferring the parameters of the convolutional layers of the VGGish model. Feature-representation transfer learning constructs a feature mapping space with a deep sparse autoencoder and performs the transfer between the TUT Urban Acoustic Scenes 2018 dataset and the TUT Urban Acoustic Scenes 2018 Mobile dataset. Experiments verify the effectiveness of both approaches.
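
As an illustration of the preprocessing and short-term features named in point (2), the following is a minimal NumPy sketch of pre-emphasis, framing, windowing, short-term energy, and short-term zero-crossing rate. It is not the thesis implementation: the function names, the pre-emphasis coefficient 0.97, the 16 kHz sample rate, the 25 ms frame with 10 ms hop, and the Hamming window are all assumptions chosen for the example.

```python
import numpy as np


def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1].

    alpha=0.97 is a common default, assumed here for illustration.
    """
    signal = np.asarray(signal, dtype=np.float64)
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])


def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D signal into overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]


def short_term_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)


def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)


# Usage on a synthetic 440 Hz tone (assumed values, not thesis data).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

frames = frame_signal(pre_emphasis(tone), frame_len=400, hop_len=160)
windowed = frames * np.hamming(400)   # Hamming window applied per frame
energy = short_term_energy(windowed)  # one value per frame
zcr = zero_crossing_rate(frames)      # one value per frame
```

For a pure tone, the zero-crossing rate is close to twice the tone frequency divided by the sample rate, which gives a quick sanity check on the framing.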
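
The voting method mentioned in point (3) can be sketched as plain majority voting over the class labels predicted by several models. This is a hedged illustration only: the abstract does not specify the voting rule, so hard majority voting with ties resolved toward the lowest class index is an assumption, as are the function name `majority_vote` and the toy predictions. The stacking variant with a support vector machine is not shown.

```python
import numpy as np


def majority_vote(predictions, n_classes):
    """Fuse hard label predictions by majority vote.

    predictions: integer array of shape (n_models, n_samples).
    Returns an array of shape (n_samples,) with the most-voted label.
    Ties go to the lowest class index (an assumption of this sketch).
    """
    predictions = np.asarray(predictions, dtype=int)
    n_models, n_samples = predictions.shape
    counts = np.zeros((n_classes, n_samples), dtype=int)
    sample_idx = np.arange(n_samples)
    for model_preds in predictions:
        # Each model adds one vote per sample to its predicted class.
        counts[model_preds, sample_idx] += 1
    return counts.argmax(axis=0)


# Toy example: six models (as in the abstract) voting on four clips.
toy_predictions = [
    [0, 1, 2, 1],
    [0, 1, 2, 2],
    [0, 2, 2, 1],
    [1, 1, 0, 2],
    [0, 1, 2, 1],
    [2, 1, 2, 0],
]
fused = majority_vote(toy_predictions, n_classes=3)
```

Hard voting needs only the predicted labels; if the models also output class probabilities, averaging those probabilities is a common alternative.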
Keywords/Search Tags: Sound scene classification, Two-channel convolutional neural network, Model fusion, Transfer learning