Research On Audio Event Recognition Based On Deep Learning

Posted on:2020-01-26

Degree:Master

Type:Thesis

Country:China

Candidate:H W Wu

Full Text:PDF

GTID:2428330575456409

Subject:Information and Communication Engineering

Abstract/Summary:


Audio event recognition is a core task in audio research and underpins currently popular directions such as audio scene analysis, audio event detection, and automatic audio labeling. Building an audio event recognition system means addressing two essential difficulties: the randomness of audio distribution and the diversity of audio events themselves. The theoretical goal of this research is to analyze these fundamental problems and propose ideas for solving them; the practical goal is to propose a system structure suited to the audio event recognition task. Drawing on prior research and on knowledge from machine learning, deep learning, speech recognition, auditory perception, and other fields, this thesis proposes a hierarchical attribute theory algorithm framework derived from the essential nature of audio. The research is carried out from two aspects.

1. Audio event recognition based on convolutional neural networks. The main purpose of the CNN research is to address the diversity and randomness of audio events. Following the idea of universality and category dependence, the thesis examines the input and the network modules in detail, and some of the more in-depth studies yield conclusions of guiding significance. On the input side, a variety of inputs were tried, including several kinds of spectrograms, the raw audio, and the excitation-source and vocal-tract spectra; experiments show that the Mel spectrum is the best input. On the network side, the thesis focuses on the receptive field for audio, starting from the convolutional receptive field and the residual-network receptive field, and also considers introducing hierarchical-information networks and temporal processing. Good results are achieved on several datasets: accuracy reaches 90.8% on ESC-10 and 72.3% on UrbanSound8K, which is higher than the official best results. The receptive field design scheme found most suitable for audio is also obtained.

2. Multi-level information fusion. Fusion is an active research topic both domestically and internationally, mainly because it can introduce more information to cope with diversity. The thesis therefore further studies the handling of audio diversity by combining feature engineering with deep learning, fusing multiple kinds of information at different levels, that is, addressing diversity from the perspective of category dependence. Based on analysis and experience, it extracts a combination of short-term features, adds new rhythm features, and performs feature selection. On top of the deep learning network structure, feature engineering is embedded at different levels, with attempts at the feature layer, the network layer, and the model layer; a good improvement is achieved at the network layer. Accuracy reaches 91.4% on ESC-10 and 73.1% on UrbanSound8K.

Experimental results show that, based on the idea of universality and category dependence, the proposed algorithms address the diversity and randomness of audio events to some extent through the design of the convolutional receptive field, the deep receptive field, and multi-level information fusion, and achieve good results on multiple datasets.
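
The abstract discusses the convolutional receptive field (the "perception field") without showing how it is calculated. As a minimal sketch, and not the design used in the thesis, the standard recurrence for the receptive field of stacked convolution and pooling layers can be written as below. The function name `receptive_field`, the `(kernel, stride, dilation)` tuple layout, and the example layer stacks are all assumptions made for illustration.

```python
def receptive_field(layers):
    """Receptive field (in input samples/frames) of a stack of 1-D layers.

    `layers` is a list of (kernel, stride, dilation) tuples, ordered from
    the input towards the output. This layout is a hypothetical convention
    chosen for this sketch, not something specified in the thesis.
    """
    rf = 1    # receptive field of a single input position
    jump = 1  # distance, in input positions, between adjacent outputs
    for kernel, stride, dilation in layers:
        # Each layer widens the field by (kernel - 1) steps of the
        # current spacing, scaled by the dilation factor.
        rf += (kernel - 1) * dilation * jump
        # Strided layers increase the spacing seen by later layers.
        jump *= stride
    return rf


# Hypothetical stacks, for illustration only.
three_convs = [(3, 1, 1)] * 3                         # three 3-wide convs
conv_pool_conv = [(3, 1, 1), (2, 2, 1), (3, 1, 1)]    # conv, 2x pool, conv

print(receptive_field(three_convs))
print(receptive_field(conv_pool_conv))
```

For a 2-D spectrogram input the same recurrence applies independently along the time axis and the frequency axis. How many milliseconds a given number of frames covers depends on the hop length of the spectrogram front end, which the abstract does not state.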

Keywords/Search Tags:

audio event recognition, Mel spectrum, deep CNNs, feature engineering, multi-information fusion


Related items

1	Multi-speaker Recognition Based On Audio-video Feature Fusion In Smart Environment
2	Research On Algorithm Of Audio-visual Event Recognition And Sound Source Localization Based On Audio-visual Fusion
3	Research On Audio-Visual Event Localization And Recognition Based On Cross-Modal Learning
4	Study On Event Image Classification By Fusing Multiple CNNs Based On LSTM
5	Research On Deep Feature-Level Fusion Of Face-Audio Multimodal Personal Identification
6	Research On Emotion Recognition Based On Multi-feature Fusion Of Video And Audio
7	The Research On Channel Adaptive Method Of Audio Event Recognition
8	Multi-speaker Recognition Based On Audio Video Information Fusion In Meeting Room Environment
9	Research On Digital Modulation Mode Recognition Based On Deep Learning
10	Research On Multi–modal Emotional Recognition Based On Audio And Visual