Research On Scattering Transform Of Acoustic Scenes Classification Based On Self-Attention Mechanism

Posted on: 2024-04-28 | Degree: Master | Type: Thesis
Country: China | Candidate: S Song | Full Text: PDF
GTID: 2530307163462884 | Subject: Software engineering
Abstract/Summary:
Acoustic scene classification is an active research topic in context-aware services, smart wearables, robot navigation, and related applications, but it faces several challenges. First, acoustic scenes contain complex environmental events and reverberation, and handling these factors effectively is crucial. Second, when the relative noise level is not negligible, narrowband signals may be masked by or submerged in the noise, so preventing the loss of key narrowband signals is important. Third, acoustic scene signals are often long, with strong correlation among different time frames, so the semantic dependencies of long sequences need to be captured. In response to these challenges, the research content of this paper is as follows:

(1) For complex environmental events and reverberation, this paper adopts the second-order scattering transform, which calculates modulation spectral coefficients through a cascade of wavelet convolutions and modulus operators. The signal energy of the interfering factors is dispersed across different frequency and time scales, so the interference components are weakened while the scene-related features are strengthened. On this basis, constrained learnable parameters are introduced to adapt to different acoustic scenes and thereby extract scene-related features.

(2) For the ubiquitous noise problem in acoustic scenes, this paper proposes a combination of frequency weighting and channel attention weighting to suppress the influence of noise and reverberation and to extract scene-related features. Specifically, frequency weighting weights the features of different frequency bands according to their importance, which improves sensitivity to scene-related frequencies and reduces the response to noise and reverberation frequencies. Channel attention weighting adaptively adjusts the importance of different channels by learning a weight for each channel, which improves sensitivity to scene-related channels and reduces the response to noise and reverberation channels. Both methods aim to minimize the impact of noise and reverberation on classification accuracy while preserving scene-related information.

(3) To address the difficulty the feature model has in capturing the semantic dependencies of long sequence signals, the model introduces the Transformer's self-attention to ensure consistent feature granularity. The self-attention mechanism captures semantic dependencies in long sequence signals and establishes key dependencies between different time steps, thereby improving the accuracy of acoustic scene classification. To adapt to multiple scene categories, the model uses Focal Loss to address sample imbalance across acoustic scenes.

This paper combines the above work into a composite model under the self-attention mechanism and applies it to the UrbanSound8K, Google Command, and ESC-50 datasets. The average accuracy rates obtained are 99.1%, 96.7%, and 91.5%, respectively, and the effectiveness of the composite model is verified by ablation experiments.
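The cascade of wavelet convolutions and modulus operators described in (1) can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation. It assumes Gaussian band-pass filters defined in the frequency domain as a simplified stand-in for Morlet wavelets, fixed (not learnable) filter parameters, circular convolution via the FFT, and no subsampling. The function names `gaussian_bandpass_bank`, `gaussian_lowpass`, and `scattering_2nd_order` are hypothetical.

```python
import numpy as np

def gaussian_bandpass_bank(n, num_filters):
    """Frequency responses of band-pass filters with dyadic centre frequencies.

    Assumption: Gaussian bumps on positive frequencies stand in for
    Morlet wavelets; the thesis's actual filters are not specified here.
    """
    freqs = np.fft.fftfreq(n)              # normalised frequency, cycles/sample
    bank = []
    for j in range(num_filters):
        centre = 0.25 / (2 ** j)           # 0.25, 0.125, 0.0625, ...
        sigma = centre / 4.0               # bandwidth proportional to centre
        bank.append(np.exp(-0.5 * ((freqs - centre) / sigma) ** 2))
    return np.array(bank)

def gaussian_lowpass(n, sigma):
    """Frequency response of the averaging (low-pass) filter."""
    freqs = np.fft.fftfreq(n)
    return np.exp(-0.5 * (freqs / sigma) ** 2)

def scattering_2nd_order(x, num_filters=4):
    """Return zeroth-, first-, and second-order scattering coefficients of x."""
    n = len(x)
    psi = gaussian_bandpass_bank(n, num_filters)
    phi = gaussian_lowpass(n, sigma=0.25 / 2 ** num_filters)

    def filt(signal_hat, response):
        # Circular convolution computed as a product in the frequency domain.
        return np.fft.ifft(signal_hat * response)

    x_hat = np.fft.fft(x)
    s0 = filt(x_hat, phi).real
    s1, s2 = [], []
    for j1 in range(num_filters):
        u1 = np.abs(filt(x_hat, psi[j1]))       # wavelet convolution + modulus
        u1_hat = np.fft.fft(u1)
        s1.append(filt(u1_hat, phi).real)       # time-averaged first order
        # Scattering implementations typically keep only coarser second
        # filters (j2 > j1), where the envelope u1 has most of its energy.
        for j2 in range(j1 + 1, num_filters):
            u2 = np.abs(filt(u1_hat, psi[j2]))  # second cascade stage
            s2.append(filt(np.fft.fft(u2), phi).real)
    return s0, np.array(s1), np.array(s2)
```

Calling `scattering_2nd_order(x, num_filters=4)` on a one-dimensional signal `x` returns one first-order row per filter and one second-order row per pair `(j1, j2)` with `j2 > j1`. Because no subsampling is applied in this sketch, a circular shift of the input only shifts the coefficients, and the low-pass averaging is what makes them locally stable to such shifts.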
Keywords/Search Tags:Second-order scattering transformation, constrained learnable filter, self-attention mechanism, frequency band weighting, filter weighting