Deep learning depends heavily on massive training data, because large amounts of data are needed to capture the underlying patterns. However, data collection is complex and expensive, making it extremely challenging to build large-scale, high-quality annotated datasets. Collecting audio event data is particularly difficult. On the one hand, audio signals suffer interference from the environment, recording devices, and other factors, which degrades the quality and accuracy of the data; on the other hand, annotating audio event data is costly. Audio event classification is therefore usually performed with only a small amount of training data, and addressing this data scarcity is crucial for the task.

Few-shot learning studies how to build models with good generalization from only a small number of samples. This thesis proposes a few-shot learning approach based on transfer learning and data augmentation to address the shortage of training data in audio event classification. It designs multiple transfer learning schemes for few-shot audio event classification built on different base network models. Starting from the two mainstream deep learning modules, the CNN and the Transformer, the thesis improves and implements three transfer learning models for audio event classification: a CNN model based on standard convolutions, a DSCNN model based on depthwise separable convolutions, and a Transformer model based on self-attention.

The thesis also compares the effect of data-mixing augmentation methods, including Mixup, CutMix, and SpecAugment, on model generalization in audio event classification. Drawing on the characteristics and advantages of these methods, it designs a virtual sample generation method based on mixed time-frequency masking that is suited to spectrograms. To make better use of the information in audio data, and to account for both the temporal characteristics of audio signals and the input format of the transfer learning models, the first-order and second-order differences of the spectrogram are added to the input. Finally, a multi-scale data augmentation method with mixed time-frequency masking, suited to small-sample audio data, is proposed.

Experiments show that the proposed multi-scale data augmentation method with mixed time-frequency masking improves accuracy by 3 to 4 percentage points over the baseline. With this augmentation, the transfer learning scheme based on the DSCNN model reaches an accuracy of 94.6% on the small-sample ESC-50 audio event classification dataset, and the scheme based on the Transformer model reaches 99.3%, which is the best performance reported on ESC-50 to date.
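Of the data-mixing baselines compared above, Mixup is the simplest to state precisely: two training spectrograms and their one-hot labels are blended with a weight drawn from a Beta distribution. The sketch below is a minimal, generic illustration of that idea in NumPy, not the thesis's implementation; the shapes, the `alpha` value, and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Blend two spectrograms and their one-hot labels with a single
    Beta(alpha, alpha)-sampled weight, as in standard Mixup."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

# Toy data: two random "log-mel spectrograms" and one-hot labels
# for a 50-class task (ESC-50 has 50 classes).
xa, xb = rng.standard_normal((64, 128)), rng.standard_normal((64, 128))
ya, yb = np.eye(50)[3], np.eye(50)[17]

xm, ym = mixup(xa, ya, xb, yb)
print(xm.shape, ym.sum())  # mixed label mass stays (approximately) 1
```

The mixed label is no longer one-hot; training against it with cross-entropy is what gives Mixup its regularizing effect.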
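The two ingredients of the proposed input pipeline, stacking a spectrogram with its first- and second-order differences, and applying random time-frequency masks, can be sketched as follows. This is a hedged illustration only: the mask sizes, the edge-padding choice for the differences, and the function names (`stack_with_deltas`, `time_freq_mask`) are assumptions for demonstration, not the thesis's exact method.

```python
import numpy as np

rng = np.random.default_rng(0)

def deltas(spec):
    """First-order difference along the time axis, edge-padded so the
    output keeps the input shape. spec: (n_mels, n_frames) array."""
    d = np.diff(spec, axis=1)
    return np.pad(d, ((0, 0), (1, 0)), mode="edge")

def stack_with_deltas(spec):
    """Stack the spectrogram, its delta, and its delta-delta as three
    channels, matching the 3-channel input of image-pretrained models."""
    d1 = deltas(spec)
    d2 = deltas(d1)
    return np.stack([spec, d1, d2], axis=0)  # (3, n_mels, n_frames)

def time_freq_mask(spec, max_f=8, max_t=16, n_masks=2):
    """Zero out random frequency bands and time stripes across all
    channels (a SpecAugment-style time-frequency masking)."""
    out = spec.copy()
    n_mels, n_frames = out.shape[-2:]
    for _ in range(n_masks):
        f = rng.integers(0, max_f + 1)          # band height (may be 0)
        f0 = rng.integers(0, n_mels - f + 1)    # band start
        out[..., f0:f0 + f, :] = 0.0
        t = rng.integers(0, max_t + 1)          # stripe width (may be 0)
        t0 = rng.integers(0, n_frames - t + 1)  # stripe start
        out[..., :, t0:t0 + t] = 0.0
    return out

spec = rng.standard_normal((64, 128))  # toy log-mel spectrogram
x = time_freq_mask(stack_with_deltas(spec))
print(x.shape)  # (3, 64, 128)
```

Applying the masks after stacking keeps the three channels aligned, so a masked region is hidden from the spectrogram and both difference channels at once.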