| Anomaly detection is an important research problem in data mining. It is used in network intrusion detection, financial fraud detection, medical diagnosis, and other areas, and has high research value and broad application potential. With the development of information technology, people collect more and more data, most of which is unlabeled, so in practice most anomaly detection tasks can only use unsupervised methods. However, in the face of increasingly diverse data types and patterns, and of the unknowability and complexity of abnormal behavior, unsupervised anomaly detection algorithms are increasingly inadequate. Ensemble learning can effectively improve detection performance by combining learners. However, existing ensemble methods often ignore data locality, and most of them focus on global anomalies from a global perspective. When the distribution of data in a local area differs substantially from that of the overall data, some points appear normal from a global perspective but are in fact local anomalies. At the same time, the detection performance of ensemble learning is strongly affected by the quality of the individual learners. Poor individual learners degrade the final detection performance, yet it is difficult to ensure their quality without data labels. To address these problems, the main research contents of this paper are as follows: (1) To make good use of data locality, this paper proposes an ensemble combination method, LSCP-cluster, which uses the DBSCAN algorithm to determine the local area of the data. LSCP-cluster relies on the ability of clustering to group similar objects in order to determine these local areas. Pseudo-labels are then generated according to the local area in which each data point is located, and LSCP-cluster selects the individual learners suited to the ensemble for that local area. This selective combination process improves the final detection performance and generalization ability of the method. (2) To avoid using poor individual learners, MetaOD, an unsupervised model selection method for anomaly detection, is adopted. MetaOD uses the statistical features and landmark features of a dataset to predict the performance of learners. In this paper, a model pool covering a wide range of hyperparameter combinations is constructed, and poor learners are eliminated to ensure the quality of the final ensemble. (3) An anomaly detection system based on unsupervised ensemble learning is built with LSCP-cluster and MetaOD at its core. Experimental results on multiple datasets show that the proposed method effectively improves the ensemble, has a significant AUC advantage over the original LSCP method and commonly used ensemble combination methods, and achieves the best performance on most datasets (12/14). |
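The per-cluster selection step described in item (1) can be illustrated with a minimal sketch. This is not the paper's implementation, and it rests on several assumptions: cluster labels are taken as already computed (for example by a DBSCAN implementation that marks noise points with `-1`), the pseudo-label of a point is assumed to be the mean score across all detectors, and the detector selected for a cluster is the one whose scores have the highest Pearson correlation with the pseudo-labels inside that cluster. The function names `pearson`, `select_per_cluster`, and `combine` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_per_cluster(scores, labels):
    """Pick one detector per cluster.

    scores[d][i] is the anomaly score of detector d on point i.
    labels[i] is the cluster id of point i (-1 = noise, by assumption).
    Returns (chosen, pseudo): chosen maps cluster id -> detector index,
    pseudo is the list of pseudo-labels.
    """
    n_points = len(labels)
    # Assumed pseudo-label: mean score across detectors.
    pseudo = [sum(s[i] for s in scores) / len(scores) for i in range(n_points)]

    # Group point indices by cluster, skipping noise points.
    members = defaultdict(list)
    for i, c in enumerate(labels):
        if c != -1:
            members[c].append(i)

    chosen = {}
    for c, idx in members.items():
        target = [pseudo[i] for i in idx]
        corr = [pearson([s[i] for i in idx], target) for s in scores]
        # Keep the detector that agrees best with the pseudo-labels locally.
        chosen[c] = max(range(len(scores)), key=corr.__getitem__)
    return chosen, pseudo


def combine(scores, labels):
    """Final score: the selected detector's score, or the pseudo-label for noise."""
    chosen, pseudo = select_per_cluster(scores, labels)
    return [
        pseudo[i] if c == -1 else scores[chosen[c]][i]
        for i, c in enumerate(labels)
    ]
```

Falling back to the pseudo-label for noise points is one possible design choice; the abstract does not say how points outside every cluster are handled.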