
Robust Recognition Of Mismatched Acoustic Scene Based On Data Augmentation And Triplet Loss

Posted on: 2020-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: H Yang
GTID: 2370330590974436
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of intelligent hardware, the improvement of multimedia technology, and the popularization of mobile devices, machine hearing is being applied more and more widely. In recent years, audio signal processing has been widely used in unmanned vehicles, robots, smart homes, and other fields. Acoustic scene classification plays an important role in automatic driving and assisted decision-making. Because audio devices are numerous, their types are complex, and data sizes are inconsistent, the performance of recognition systems degrades, so a robust acoustic scene classification method is urgently needed. To improve robustness, this thesis analyzes the two mismatch problems above, device mismatch and data scale mismatch, and proposes solutions for each.

For the device mismatch problem, the mismatch is first analyzed theoretically from the perspective of the channel, and a method based on triplet loss is then proposed. The loss requires the distance between same-class samples recorded on different devices to be smaller than the distance between samples of different classes, so the model learns the semantic information in the audio and automatically compensates for the channel, which improves system performance. To handle the complex distribution of real acoustic scene data, the triplet loss is further improved, and a triplet loss based on local learning is proposed. This method relaxes the constraint on the triplet by calculating the weighted distance between the anchor and other samples. It allows the data of each acoustic scene to follow several possible manifold distributions, which is more consistent with real acoustic scenes that contain multiple audio events. Experiments show that the triplet loss based on local learning is an effective way to solve the channel mismatch problem and is suitable for acoustic scene classification: compared with the baseline, it achieves a relative improvement of 23.8%.

For the data scale mismatch problem, the influence of data balance on system performance is first analyzed theoretically, and data augmentation methods based on mixup and random replacement are then proposed. The mixup method generates a new sample as a convex combination of two samples, and the random replacement method generates a new sample by recombining sequence segments of two samples; both expand the data set. Experiments show that both augmentation methods effectively solve the data scale mismatch problem: compared with the baseline, they achieve relative improvements of 45.9% and 45.0%, respectively.

To further enhance the robustness of the system, this thesis also proposes a random mean shift method and combines it with mixup to address the mismatch problem. The random mean shift method randomly adds the difference between channels during training, which strengthens the system's ability to extract semantic information and ignore irrelevant information such as the channel, thereby improving system performance. Experiments show that random mean shift is an effective way to improve system robustness, with a relative improvement of 46.5% over the baseline. Combining it with mixup further improves recognition performance, giving a relative improvement of 51.0% over the baseline.
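As a rough illustration of the triplet constraint described in the abstract, the sketch below computes a standard margin-based triplet loss on embedding vectors. The Euclidean distance, the `margin` value, and all function names are assumptions made for illustration; the abstract does not give the exact formulation used in the thesis, and the local-learning variant (weighted distances from the anchor to other samples) is not reproduced here.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    anchor/positive: same scene class, ideally recorded on different devices.
    negative: a sample from a different scene class.
    margin is a hypothetical hyperparameter, not a value from the thesis.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings: the positive is closer to the anchor than the negative.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1 from the anchor
negative = [3.0, 4.0]   # distance 5 from the anchor
satisfied = triplet_loss(anchor, positive, negative)  # constraint met
violated = triplet_loss(anchor, negative, positive)   # roles swapped
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, and it grows when that ordering is violated.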
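The convex combination used by mixup can be sketched as follows. Drawing the mixing weight from a Beta(alpha, alpha) distribution follows the commonly used mixup recipe; the `alpha` value, the one-hot label format, and the function name are assumptions for illustration rather than settings reported in the abstract.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Return a convex combination of two (features, one-hot label) pairs.

    lam ~ Beta(alpha, alpha); the alpha value is an assumption, not a
    setting taken from the thesis.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Two toy samples from different classes, with one-hot labels.
rng = random.Random(0)
x_new, y_new, lam = mixup([1.0, 2.0], [1.0, 0.0], [3.0, 6.0], [0.0, 1.0], rng=rng)
```

Because the same weight is applied to features and labels, the mixed label remains a valid probability vector whose entries sum to one.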
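One possible reading of random replacement is sketched below: a randomly chosen contiguous run of frames in one sample is replaced by the frames at the same positions in a second sample. The abstract only says that segments of two samples are recombined, so the choice of a single contiguous segment, the aligned positions, and the expectation that both samples share a scene class are all assumptions.

```python
import random


def random_replacement(seq_a, seq_b, rng=random):
    """Build a new frame sequence from seq_a with one segment taken from seq_b.

    seq_a, seq_b: lists of frames (assumed here to share the same scene class).
    The segment length and position are drawn uniformly at random.
    """
    n = min(len(seq_a), len(seq_b))
    if n < 2:
        raise ValueError("both sequences need at least two frames")
    seg_len = rng.randint(1, n - 1)        # keep at least one frame of seq_a
    start = rng.randint(0, n - seg_len)    # segment must fit inside both
    end = start + seg_len
    return seq_a[:start] + seq_b[start:end] + seq_a[end:]


# Toy sequences where each "frame" is just a label, to make the splice visible.
frames_a = ["a"] * 8
frames_b = ["b"] * 8
new_sample = random_replacement(frames_a, frames_b, rng=random.Random(1))
```

The result has the same length as the first sequence and contains one contiguous block of frames from the second.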
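The random mean shift idea can be illustrated with a minimal sketch, assuming log-spectral features in which a channel difference appears as an additive per-bin offset. The abstract does not say how the channel difference is estimated or how often it is applied, so the per-bin mean difference between two devices, the probability `p`, and all function names are hypothetical.

```python
import random


def channel_mean(frames):
    """Per-bin mean over a list of equal-length feature frames."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def mean_difference(frames_src, frames_tgt):
    """Estimated channel offset: target-device mean minus source-device mean."""
    mu_src = channel_mean(frames_src)
    mu_tgt = channel_mean(frames_tgt)
    return [t - s for s, t in zip(mu_src, mu_tgt)]


def random_mean_shift(frames, shift, p=0.5, rng=random):
    """With probability p, add the channel offset to every frame.

    p is a hypothetical hyperparameter, not a value from the thesis.
    """
    if rng.random() >= p:
        return [list(f) for f in frames]   # unchanged copy
    return [[x + d for x, d in zip(f, shift)] for f in frames]


# Toy features: two frames with two frequency bins per device.
device_a = [[0.0, 1.0], [2.0, 3.0]]
device_b = [[2.0, 2.0], [4.0, 6.0]]
shift = mean_difference(device_a, device_b)
always_shifted = random_mean_shift(device_a, shift, p=1.0)
never_shifted = random_mean_shift(device_a, shift, p=0.0)
```

Applying the offset only some of the time exposes the model to both channel conditions for the same content, which is the intuition behind ignoring channel information described in the abstract.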
Keywords/Search Tags: acoustic scene recognition, device mismatch, data scale mismatch, triplet loss, data augmentation, random mean shift