| Audio Event Classification (AEC) is one of the most active research topics in the field of audio signals, with a wide range of application scenarios, including personalized recommendation, audio and video monitoring, and simultaneous translation. The datasets used to train deep learning models for AEC usually follow a long-tailed distribution: the head categories have many samples, while the tail categories have few, which leads to unsatisfactory classification performance. In particular, predictions for tail categories are biased towards the head categories, which have sufficient data. However, the tail categories also carry information that cannot be ignored and that can be even more important. To address the negative impact of the long-tailed distribution on the training of AEC models, this thesis carries out the corresponding research. The main work and innovations are as follows: (1) A two-stage training method based on Transfer Learning (TL) is proposed. The difficulty in processing long-tailed data lies first in the imbalance of the data and second in the weak feature representation of the tail categories caused by their scarce data. Transfer learning takes a dataset with sufficient data as the source domain and the dataset to be processed as the target domain. Through feature transfer, model transfer, and similar techniques, source-domain information is transferred to the target domain, thereby improving the performance of the target-domain model. This thesis uses a two-stage training method. First, the model is trained on a balanced dataset, and the obtained model parameters are fixed for the second stage of training. In the second stage, the training set is long-tailed, and the data features of the source domain are fully utilized to improve the overall classification performance. (2) An equalization loss function for speech classification under a long-tailed distribution is proposed. Loss functions usually applied to balanced datasets, such as Softmax and cross-entropy, when used directly on a long-tailed dataset, cause the tail categories to be neglected relative to the head categories with sufficient data and to be treated as noise, resulting in low overall recognition accuracy. This thesis studies the gradient effects of the different categories during backpropagation and proposes a balanced loss function that improves the recognition accuracy of tail categories without affecting the classification accuracy of head categories. In summary, this thesis proposes a model training method based on transfer learning and an equalization loss function for long-tailed speech classification. Experiments are carried out from multiple perspectives on AudioSet, with additional experiments on three other datasets, ESC-10, UrbanSound8K, and GTZAN, and the reliability and effectiveness of the proposed methods are verified under a variety of experimental conditions. |
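
The gradient argument behind contribution (2) can be illustrated with a minimal sketch. It is not the balanced loss proposed in the thesis, whose exact form the abstract does not give. It only shows, for plain softmax cross-entropy, how much gradient each class receives from its own samples (which raises its logit) and from samples of other classes (which lowers it). The class counts, the function names `softmax_ce_grad` and `gradient_balance`, and the assumption of an untrained model with uniform predictions are all hypothetical choices made for illustration.

```python
# Illustrative only: gradient balance of softmax cross-entropy on long-tailed labels.
# The class counts are made up and are NOT taken from the thesis.


def softmax_ce_grad(probs, label):
    """Gradient of cross-entropy w.r.t. the logits: p_j - y_j."""
    return [p - (1.0 if j == label else 0.0) for j, p in enumerate(probs)]


def gradient_balance(class_counts):
    """Return per-class (encouraging, discouraging) gradient magnitudes.

    Assumption: the model is untrained and predicts a uniform distribution,
    so every sample of a given class yields the same gradient.
    """
    num_classes = len(class_counts)
    uniform = [1.0 / num_classes] * num_classes
    encourage = [0.0] * num_classes   # from the class's own samples
    discourage = [0.0] * num_classes  # from samples of all other classes
    for label, count in enumerate(class_counts):
        grad = softmax_ce_grad(uniform, label)
        for j, g in enumerate(grad):
            if j == label:
                # g is negative here; gradient descent raises this logit.
                encourage[j] += count * -g
            else:
                # g is positive here; gradient descent lowers this logit.
                discourage[j] += count * g
    return encourage, discourage


# Hypothetical long-tailed class counts: one head, one medium, one tail class.
counts = [1000, 100, 10]
enc, dis = gradient_balance(counts)
for c, (e, d) in enumerate(zip(enc, dis)):
    print(f"class {c}: n={counts[c]} encourage={e:.2f} "
          f"discourage={d:.2f} ratio={e / d:.3f}")
```

With these counts the head class receives far more encouraging than discouraging gradient, while for the tail class the ratio falls well below 1. This is the imbalance that a rebalanced loss has to counteract.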