| A piece of audio may contain bad information that has text features as well as bad information that has none, such as gunshots and groans. Detecting bad audio therefore requires identifying both the bad events and the times at which they occur. Existing detection algorithms for bad audio information generally only classify the audio's textual content or its frequency spectrum; they cannot locate the start and end times of bad events, and their accuracy is low. As more platforms distribute content as live broadcasts, short videos, and other audio-based formats, the content-review interfaces provided by Baidu and NetEase do not achieve satisfactory results on long audio or on audio without textual content. In practice, several problems arise: existing detection algorithms usually treat the task as classification, which makes it difficult to quickly re-review the audio when a judgment is disputed; for audio without textual content, classification algorithms cannot make a reliable judgment, and errors propagate through the audio-to-text conversion and classification pipeline; and prediction results are difficult to compare intuitively. These problems make the detection of bad audio information very challenging, and accurately locating bad events and their times in massive volumes of audio becomes both more important and more difficult.
Building on a study of audio classification and sound event detection methods, this paper focuses on methods for detecting bad audio information. Using pornographic audio from the Internet as samples, it adopts a multi-level approach that integrates audio features from all levels and combines a Mean-Teacher network with audio classification techniques to detect audio events. On this basis it proposes Bad Audio Event Detection Based on Deep Learning (BAED), which is intended for content reviewers on platforms that distribute content as audio. BAED comprises a content-based bad audio classification algorithm and a multi-level Mean-Teacher joint detection algorithm. It takes as input features the frequency spectrum of the audio and the text obtained by converting the audio. While retaining the attention layer and frame-level classification of Mean-Teacher, the algorithm combines content-based audio classification with spectrum-based audio classification, and it takes the audio context into account while preserving the model's fine-grained detection. This improves accuracy while keeping the predicted labels interpretable and the event times locatable. The experiments compare the overall performance of Mean-Teacher, content-based audio classification, Guided Learning, and BAED on a real pornographic audio dataset. The results show that the proposed BAED technique detects pornographic audio events effectively, with significantly improved overall performance. |