
Research On Unsupervised Anomaly Detection For High-dimensional Data Based On Autoencoder Ensembles

Posted on: 2022-04-20  Degree: Master  Type: Thesis
Country: China  Candidate: L Bai  Full Text: PDF
GTID: 2517306491977259  Subject: Applied Statistics
Abstract/Summary:
With the emergence of deep learning and neural networks, unsupervised anomaly detection for high-dimensional data has achieved good research results in machine learning, and outlier detection algorithms have broad prospects in industrial applications. However, when traditional algorithms are applied to high-dimensional data, they scale poorly and suffer from the curse of dimensionality. The classical two-step approach, which combines reconstruction error with density estimation, still has room for improvement. Because the two steps are trained independently, the low-dimensional representation may not retain enough of the important information in the high-dimensional data. At the same time, separating the training of the dimensionality reduction network from the training of the density estimator can easily cause the model to fall into a local optimum. To further improve the accuracy and effectiveness of the model, it is necessary to enrich the feature information and modify the training strategy.

This thesis presents a deep hybrid autoencoder model. First, a compression network built from mixed k-stacked denoising autoencoders is combined with the density estimation process to improve the DAGMM algorithm, so that sufficient information about the original data distribution is retained. Experiments were conducted on the public KDDcup99 data set and the Talking Data fraud detection data set. The results show that the resulting SDAGMM algorithm improves the evaluation metrics AUC and F1-score by 2%.

Second, the model framework is rebuilt to include a hybrid compression network, a mapping network, a hybrid reconstruction network and an estimation network. An end-to-end training strategy is used in the pretraining process; sample entropy, batch entropy and sample energy are added to the loss function; and the discrete maximum entropy theorem is used to ensure a reasonable allocation of samples. These changes alleviate the local optimum problem. Experimental results on the same public data sets show that the M-SDAGMM algorithm increases the F1-score by 5%, and that the AUC is more stable when the proportion of anomalies changes.
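
The abstract names three loss terms (sample energy, sample entropy and batch entropy) without giving their formulas. As a rough illustration only, the sketch below assumes the usual DAGMM-style sample energy (negative log-likelihood under a Gaussian mixture, simplified here to diagonal covariances) and plain Shannon entropy over soft mixture-membership vectors. The function names, the diagonal-covariance simplification and the entropy definitions are assumptions for illustration, not the thesis's actual formulation.

```python
import math

def sample_energy(z, weights, means, variances):
    """Negative log-likelihood of point z under a GMM.

    Assumption: diagonal covariances (one variance per dimension),
    a simplification of the full-covariance form used in DAGMM.
    """
    log_terms = []
    for phi, mu, var in zip(weights, means, variances):
        log_det = sum(math.log(2 * math.pi * v) for v in var)
        maha = sum((zi - mi) ** 2 / v for zi, mi, v in zip(z, mu, var))
        log_terms.append(math.log(phi) - 0.5 * (log_det + maha))
    # log-sum-exp for numerical stability
    m = max(log_terms)
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms)))

def entropy(p, eps=1e-12):
    """Shannon entropy of a discrete distribution p."""
    return -sum(pi * math.log(pi + eps) for pi in p)

def sample_entropy(memberships):
    """Mean entropy of each sample's membership vector (hypothetical definition).

    Low values mean each sample is assigned confidently to one component.
    """
    return sum(entropy(g) for g in memberships) / len(memberships)

def batch_entropy(memberships):
    """Entropy of the batch-averaged membership vector (hypothetical definition).

    High values mean samples are spread evenly across components.
    """
    n = len(memberships)
    k = len(memberships[0])
    mean = [sum(g[j] for g in memberships) / n for j in range(k)]
    return entropy(mean)
```

Under these assumed definitions, a loss that lowers sample entropy while raising batch entropy would favor confident yet balanced assignment of samples to mixture components, which is one plausible reading of the "reasonable allocation of samples" mentioned above.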
Keywords/Search Tags:Unsupervised Anomaly Detection, Gaussian Mixture Model, Stacked Denoise AutoEncoder, DAGMM