
Technical Research And System Implementation On Violence Audio Scene Classification

Posted on: 2017-02-17    Degree: Master    Type: Thesis
Country: China    Candidate: J J Feng    Full Text: PDF
GTID: 2308330503487187    Subject: Computer Science and Technology
Abstract/Summary:
With the development of the Internet and the film industry, multimedia files such as audio and video have increased sharply, and many of them contain violent content. Because audio can be processed much faster than video, audio-based violence scene recognition has attracted growing attention. Existing violence audio detection techniques, mostly built on traditional machine learning algorithms, are a clear improvement over manual review, but they still have the following problems. First, the generalization ability of such systems is weak: different scenarios typically require different audio features to be selected. Second, recognition performance needs to be improved, mainly because traditional machine learning algorithms are shallow models whose modeling capability is limited for complex signals such as audio. Third, most violence audio recognition methods perform poorly in real, noisy scenes. To address these problems, this thesis carries out the following work:

(1) To address the poor generalization across scenarios, we apply the Deep Neural Network (DNN) to the violence audio scene recognition task. As a deep learning model, the DNN learns and expresses features better than traditional shallow learning algorithms; in most scenarios there is no need to select features manually, because low-level features such as the logarithmic power spectrum and the spectrogram can be fed directly to the DNN as input.

(2) To improve recognition performance, on the one hand we feed the features learned by the DNN, together with other features such as MFCC, zero-crossing rate, and energy entropy, to the classifier; on the other hand we use discretization and feature selection to improve the expressive power of the features. In addition, during the recognition phase the K-Nearest Neighbor (KNN) method is applied to correct the classification results and further improve recognition performance (a minimal sketch of this feature-fusion and KNN-correction pipeline is given after this abstract).

(3) To address the low recognition rate against noisy backgrounds, we use a Deep Denoising Autoencoder (DDAE) to reduce noise, which narrows the mismatch between training and test data and thus improves the robustness of the audio features (see the DDAE sketch below).

(4) To improve the training speed and performance of the network, we propose the self-increment Restricted Boltzmann Machine (Incre-RBM) based on the Restricted Boltzmann Machine (RBM). Experiments show that the Incre-RBM trains faster and classifies better (a sketch of the standard RBM update that it builds on is given below).
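The following is a minimal, illustrative sketch of the pipeline described in (1) and (2), not the thesis's implementation: a small feed-forward DNN learns features from log-power-spectrum frames, the learned features are concatenated with hand-crafted ones (MFCC, zero-crossing rate, energy entropy), and a KNN classifier produces the corrected decision. All layer sizes, feature dimensions, and names (DNNFeatureLearner, fuse_and_classify) are assumptions; PyTorch and scikit-learn are used here only as convenient stand-ins.

import torch
import torch.nn as nn
from sklearn.neighbors import KNeighborsClassifier

class DNNFeatureLearner(nn.Module):
    # Feed-forward DNN; the last hidden layer is taken as the learned feature.
    def __init__(self, in_dim=257, hidden=256, n_classes=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        h = self.body(x)              # learned deep feature
        return self.head(h), h

def fuse_and_classify(log_spec, handcrafted, labels):
    # log_spec:    (N, 257) log-power-spectrum frames (dimension is an assumption)
    # handcrafted: (N, d) MFCC / zero-crossing rate / energy-entropy features
    # labels:      (N,) 0 = non-violent, 1 = violent
    model = DNNFeatureLearner(in_dim=log_spec.shape[1])
    with torch.no_grad():             # assume the DNN has already been trained
        _, deep_feat = model(torch.as_tensor(log_spec, dtype=torch.float32))
    fused = torch.cat(
        [deep_feat, torch.as_tensor(handcrafted, dtype=torch.float32)], dim=1)
    knn = KNeighborsClassifier(n_neighbors=5)   # KNN step corrects the frame decisions
    knn.fit(fused.numpy(), labels)
    return knn.predict(fused.numpy())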
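Below is a minimal sketch of the deep denoising autoencoder idea in (3): a network is trained to map noisy spectral frames back to their clean versions, so that test-time features better match the training data. The architecture, layer sizes, and the name DDAE as used here are assumptions, not the thesis's exact configuration.

import torch
import torch.nn as nn

class DDAE(nn.Module):
    # Symmetric encoder/decoder trained to reconstruct clean frames from noisy ones.
    def __init__(self, dim=257, hidden=512):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden // 2), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden // 2, hidden), nn.ReLU(),
                                     nn.Linear(hidden, dim))

    def forward(self, noisy):
        return self.decoder(self.encoder(noisy))

def train_step(model, optimizer, noisy_batch, clean_batch):
    # One reconstruction step: minimise the MSE between denoised and clean frames.
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(noisy_batch), clean_batch)
    loss.backward()
    optimizer.step()
    return loss.item()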
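For context on (4), the sketch below shows one contrastive-divergence (CD-1) update of a standard binary RBM, the base model that the proposed Incre-RBM extends; the self-increment mechanism itself is not reproduced, since the abstract does not specify it. Variable names and the learning rate are illustrative.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b_v, b_h, v0, lr=0.01, rng=None):
    # W: (n_visible, n_hidden) weights; b_v, b_h: visible/hidden biases
    # v0: (batch, n_visible) binary training frames
    rng = rng or np.random.default_rng()
    h0_prob = sigmoid(v0 @ W + b_h)                       # positive phase
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    v1_prob = sigmoid(h0 @ W.T + b_v)                     # one Gibbs step back
    h1_prob = sigmoid(v1_prob @ W + b_h)                  # negative phase
    batch = v0.shape[0]
    W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / batch
    b_v += lr * (v0 - v1_prob).mean(axis=0)
    b_h += lr * (h0_prob - h1_prob).mean(axis=0)
    return W, b_v, b_h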
Keywords/Search Tags: violence audio recognition, deep learning, RBM, feature extraction