
Research On Unsupervised Outlier Detection Algorithm Based On Ensemble Learning

Posted on: 2024-08-12 | Degree: Master | Type: Thesis
Country: China | Candidate: Z C Jiang
GTID: 2568307106999489 | Subject: Computer Science and Technology
Abstract/Summary:
Outlier detection is an important research topic in machine learning. It aims to identify samples in a dataset that differ significantly from the majority of samples. It is widely used in fields such as finance, manufacturing, and healthcare to detect fraud, identify equipment failures, and monitor patient health. However, most existing outlier detection algorithms are designed for specific classes of outliers in particular domains. They often cannot effectively detect the multiple types of outliers found in other fields, which limits their generalization capability. Furthermore, outlier detection is usually unsupervised: because normal samples vastly outnumber abnormal ones, labeling abnormal data is prohibitively expensive. Designing unsupervised outlier detection algorithms that generalize well enough to handle diverse outlier types across domains is therefore an active and challenging research topic.

Ensemble-based outlier detection algorithms, referred to as "outlier ensembles," combine multiple individual outlier detection algorithms. This improves a model's robustness and generalization capability and reduces its dependence on specific datasets or data locality. The approach is scalable and flexible: better detection results can be achieved by choosing base detectors and ensemble strategies that suit the application scenario and the characteristics of the data. Although outlier ensembles have achieved some success in practice, how to select appropriate ensemble strategies and base detectors remains an open problem. This thesis studies how ensemble learning can improve the accuracy and generalization capability of outlier detection and makes two main contributions.

(1) Existing unsupervised outlier ensembles lack any selection step when combining their base detectors. To address this, a multi-stage outlier ensemble algorithm is proposed. It aggregates the results of multiple unsupervised outlier detection algorithms with a ranking aggregation method to generate pseudo-labels. These pseudo-labels then guide the model in adaptively selecting the best classifier as the base learner in the stacking stage. First, multiple unsupervised outlier detection algorithms form a pool of base detectors. The pool extracts new representations from the training set and captures different types of outlier patterns. Next, these representations are integrated by ranking aggregation to produce pseudo-labels for training the next stage. Finally, to further optimize the model, a stacking-based dynamic classifier selection ensemble is proposed. On each dataset, it adaptively selects the best-performing classifier as the base learner in the stacking stage, improving both accuracy and generalization. Experiments on a series of real-world outlier detection datasets show that the proposed method detects outliers more effectively than the baseline algorithms.

(2) Existing unsupervised deep outlier detection methods are sensitive to outliers present in the training set and depend heavily on specific network structures. To address this, an improved autoencoder-based outlier ensemble algorithm is proposed. It combines the strengths of deep learning and ensemble learning in one framework, improving robustness and generalization for outlier detection on high-dimensional data. First, the training data are preprocessed to remove likely outliers so that the model fits the distribution of normal samples more closely. This makes the autoencoder's reconstruction error more discriminative and improves the model's accuracy and robustness. Next, a set of attention-based random autoencoders, each with a different network structure, is generated. An individual network may overfit, but combining many networks reduces the variance of the overall reconstruction error. An attention mechanism in each autoencoder focuses on the important features of the input, improving reconstruction accuracy and, in turn, detection accuracy. Finally, the reconstruction errors of all autoencoders are combined into the final outlier score. The outlier threshold is then set automatically from Cantelli's inequality to produce the final detection results. Experiments on a series of real-world datasets show that the proposed algorithm outperforms the baseline algorithms on high-dimensional outlier detection.

In conclusion, this thesis introduces ensemble learning into outlier detection to improve the accuracy and generalization ability of detection models. It designs two outlier ensemble algorithms that address diverse outlier types in different fields, and both perform well in experiments. These results are expected to improve the reliability and effectiveness of outlier detection algorithms in practical applications. They should also promote the further development and application of outlier detection technology and serve as a reference for future research in the field.
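The abstract does not say which ranking aggregation method the first algorithm uses to turn base-detector outputs into pseudo-labels. The Python sketch below assumes a simple mean-rank (Borda-style) aggregation. Each detector's scores are converted to ranks, which makes detectors on different scales comparable. The ranks are averaged per sample, and the top `contamination` fraction of samples is labeled as outliers (1). The function names, the `contamination` parameter, and the toy scores are illustrative assumptions, not the thesis's implementation.

```python
from typing import List


def average_rank(scores: List[float]) -> List[float]:
    """Rank scores in ascending order (1 = lowest), giving tied scores their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = tied_rank
        i = j + 1
    return ranks


def rank_aggregate_pseudo_labels(score_matrix: List[List[float]], contamination: float) -> List[int]:
    """Mean-rank aggregation of detector scores; the top `contamination` share is labeled 1.

    score_matrix[d][i] is (hypothetical) base detector d's outlier score for
    sample i, where higher means more anomalous.
    """
    n = len(score_matrix[0])
    rank_lists = [average_rank(row) for row in score_matrix]
    mean_rank = [sum(r[i] for r in rank_lists) / len(rank_lists) for i in range(n)]
    n_outliers = max(1, round(contamination * n))
    top = sorted(range(n), key=lambda i: mean_rank[i], reverse=True)[:n_outliers]
    labels = [0] * n
    for i in top:
        labels[i] = 1
    return labels


scores = [
    [0.1, 0.2, 0.9, 0.3, 0.8],  # detector A
    [1.0, 2.0, 9.0, 1.5, 4.0],  # detector B (different scale)
    [0.0, 0.1, 0.7, 0.2, 0.9],  # detector C
]
print(rank_aggregate_pseudo_labels(scores, contamination=0.4))  # [0, 0, 1, 0, 1]
```

Ranks rather than raw scores are averaged so that a detector with a large score range cannot dominate the vote. Other aggregation rules, such as median rank, could be substituted without changing the interface.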
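The stacking stage of the first algorithm selects a classifier adaptively on each dataset. One plausible way to do this, sketched below under stated assumptions, is to score every candidate classifier by its cross-validated agreement with the pseudo-labels and keep the best one. The two candidates (`nearest_centroid` and `one_nn`), the accuracy metric, and the modulo fold split are hypothetical choices that keep the example free of dependencies. The abstract does not describe the thesis's actual candidate pool or selection criterion.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Predictor = Callable[[Vector], int]
Trainer = Callable[[List[Vector], List[int]], Predictor]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(X: List[Vector], y: List[int]) -> Predictor:
    """Hypothetical candidate 1: predict the class whose mean vector is closest."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda v: min(centroids, key=lambda c: sq_dist(v, centroids[c]))


def one_nn(X: List[Vector], y: List[int]) -> Predictor:
    """Hypothetical candidate 2: copy the label of the closest training row."""
    return lambda v: y[min(range(len(X)), key=lambda i: sq_dist(v, X[i]))]


def cv_accuracy(trainer: Trainer, X: List[Vector], y: List[int], k: int) -> float:
    """k-fold accuracy against the pseudo-labels; sample i belongs to fold i % k."""
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        predict = trainer([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(X[i]) == y[i] for i in test)
    return correct / len(X)


def select_meta_learner(candidates: Dict[str, Trainer], X: List[Vector], y: List[int],
                        k: int = 3) -> Tuple[str, Dict[str, float]]:
    """Return the best candidate's name (the first one wins ties) and all CV scores."""
    scores = {name: cv_accuracy(trainer, X, y, k) for name, trainer in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Rows: hypothetical base-detector outputs for six samples; y: their pseudo-labels.
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15], [0.9, 0.8], [0.1, 0.1], [0.8, 0.9]]
y = [0, 0, 0, 1, 0, 1]
best, cv_scores = select_meta_learner({"nearest_centroid": nearest_centroid, "one_nn": one_nn}, X, y)
print(best, cv_scores)  # nearest_centroid {'nearest_centroid': 1.0, 'one_nn': 1.0}
```

Plain accuracy is used here only for brevity. With heavily imbalanced pseudo-labels, a criterion such as balanced accuracy or ROC AUC would be a more sensible basis for selection.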
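According to the abstract, the second algorithm first removes likely outliers from the training data so that the autoencoders fit mostly normal samples. It does not say how this is done. As a hypothetical stand-in, this sketch drops the `drop_fraction` of rows that lie farthest from the coordinate-wise median. Any unsupervised scorer could take the place of this median-distance rule.

```python
import statistics
from typing import List, Sequence


def filter_training_set(X: List[Sequence[float]], drop_fraction: float) -> List[Sequence[float]]:
    """Drop the rows farthest (squared Euclidean) from the coordinate-wise median.

    Hypothetical pre-filter: the median is used because, unlike the mean, it is
    barely moved by the very outliers we are trying to remove.
    """
    median = [statistics.median(col) for col in zip(*X)]
    dist = [sum((v - m) ** 2 for v, m in zip(row, median)) for row in X]
    n_keep = len(X) - int(drop_fraction * len(X))
    keep = sorted(range(len(X)), key=lambda i: dist[i])[:n_keep]
    return [X[i] for i in sorted(keep)]  # preserve the original row order


X_train = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [1.0, 1.0], [8.0, -5.0]]
print(filter_training_set(X_train, drop_fraction=0.2))
# [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [1.0, 1.0]]
```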
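For the final step of the second algorithm, the sketch below assumes that each autoencoder's reconstruction errors are z-score standardized before averaging. The abstract says only that the errors are combined. Cantelli's inequality states that for a score S with mean μ and standard deviation σ, P(S − μ ≥ λ) ≤ σ² / (σ² + λ²) for every λ > 0. Setting the right-hand side equal to a tolerated fraction α and solving gives λ = σ·√((1 − α)/α). Samples scoring above μ + λ are therefore flagged, and no more than a fraction α of the scores can reach that threshold whatever their distribution. The value of α used below is an assumption, since the abstract does not state how it is chosen.

```python
import math
import statistics
from typing import List


def combine_errors(error_lists: List[List[float]]) -> List[float]:
    """Average the z-scored reconstruction errors of several autoencoders.

    error_lists[m][i] is autoencoder m's reconstruction error for sample i.
    Standardizing first keeps a model with large raw errors from dominating.
    """
    standardized = []
    for errs in error_lists:
        mu, sd = statistics.fmean(errs), statistics.pstdev(errs)
        standardized.append([(e - mu) / sd if sd > 0 else 0.0 for e in errs])
    return [sum(col) / len(standardized) for col in zip(*standardized)]


def cantelli_threshold(scores: List[float], alpha: float) -> float:
    """Return mu + sigma * sqrt((1 - alpha) / alpha), the Cantelli-based cut-off.

    mu and sigma are the mean and population standard deviation of `scores`,
    so at most a fraction alpha of these scores can reach the threshold.
    """
    mu, sigma = statistics.fmean(scores), statistics.pstdev(scores)
    return mu + sigma * math.sqrt((1 - alpha) / alpha)


# Hypothetical reconstruction errors from two autoencoders on six samples.
errors = [
    [1.0, 1.2, 0.9, 1.1, 1.0, 6.0],
    [2.0, 2.1, 1.9, 2.2, 2.0, 9.0],
]
final = combine_errors(errors)
t = cantelli_threshold(final, alpha=0.2)  # alpha is an assumed tolerance
print([int(s > t) for s in final])  # [0, 0, 0, 0, 0, 1]
```

Because the threshold comes from a distribution-free bound, it needs no labeled data. The trade-off is that the bound is loose, so fewer points may be flagged than a tuned threshold would flag.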
Keywords/Search Tags: Outlier detection, Ensemble learning, Unsupervised learning, Deep learning, Generalization ability