Outlier Detection Based On Feature Grouping And Data Self-representation And Its Application

Posted on:2024-04-25

Degree:Master

Type:Thesis

Country:China

Candidate:Y X Gao

Full Text:PDF

GTID:2568307094984249

Subject:Computer technology

Abstract/Summary:

Outlier detection in high-dimensional data suffers from the curse of dimensionality: as the number of features irrelevant to outlier detection grows, computational cost rises and detection quality degrades. Data self-representation methods can be used for outlier detection because they amplify the differences and correlations among data points. However, existing techniques do not account for the influence of inter-feature correlations on outlier detection, which makes them unsuitable for high-dimensional data. To address these issues, this paper studies inter-feature correlations and outlier detection based on feature grouping and data self-representation, and proposes an outlier detection algorithm suited to high-dimensional data. The main research contributions of this paper are as follows:

(1) We propose a feature extraction and grouping algorithm, the Balanced Association-based Feature Grouping (BAFG) algorithm. First, feature extraction and grouping weigh both data proximity and the probabilistic relationships among features in order to extract a subset of strongly associated features. Second, the redundancy among features is measured and used as the basis for partitioning the final feature groups, which reduces the impact of high dimensionality on the grouping results and yields more effective feature partitions. The algorithm thoroughly exposes the information in the data and provides a solid foundation for the subsequent detection algorithm.

(2) Building on the BAFG algorithm, we propose the Feature grouping and Data Self-Representation based Outlier Detection (FDSR-OD) algorithm. First, the balanced association measure is incorporated into the sparse linear combination among data points, producing a data self-representation matrix that contains both data and feature information. Second, we propose a calculation method based on fusing the inter-group data self-representations, which forms a global data self-representation matrix. We then introduce an outlier detection algorithm based on the fused data self-representation, which detects outliers through a random walk on the directed weighted graph formed by the global data self-representation matrix. Finally, combining this detector with the BAFG algorithm gives the FDSR-OD algorithm, which improves detection accuracy and generalization.

(3) Based on the research above, a system for detecting outliers in astronomical spectra based on feature grouping and data self-representation was designed and implemented, with PyCharm as the development tool. The architecture and functional modules of the system are described in detail, and a performance analysis shows that it provides an effective approach to large-scale outlier mining in astronomical spectra.

Experiments on synthetic datasets, UCI datasets, and ODDS datasets demonstrate the effectiveness and generalization of the BAFG and FDSR-OD algorithms, which achieve higher detection accuracy than the compared algorithms. Applied to LAMOST astronomical spectral data, they offer a new approach to large-scale outlier mining in astronomical spectra.
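The pipeline described in contribution (2) can be illustrated with a minimal sketch. This is not the FDSR-OD implementation: the abstract does not give the balanced association measure, the sparse solver, or the fusion rule, so the sketch swaps in ridge-regularized least squares for the sparse linear combination, plain averaging for the fusion step, and a damped random walk for the graph walk. The function names, the `lam` and `damping` parameters, and the hand-supplied `groups` (standing in for BAFG output) are all assumptions made for illustration.

```python
import numpy as np


def self_representation(X, lam=0.1):
    """Express each row of X as a linear combination of the other rows.

    Assumption: ridge regression replaces the sparse combination named in
    the abstract. Returns a non-negative weight matrix with zero diagonal.
    """
    n = X.shape[0]
    G = X @ X.T  # Gram matrix of the data points
    W = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        A = G[np.ix_(others, others)] + lam * np.eye(n - 1)
        b = G[others, i]
        # Closed-form ridge solution; the magnitude is used as edge weight.
        W[i, others] = np.abs(np.linalg.solve(A, b))
    return W


def fused_self_representation(X, groups, lam=0.1):
    """Average the per-group matrices into one global matrix.

    Assumption: `groups` is a list of feature-index lists supplied by the
    caller, and simple averaging stands in for the fusion method.
    """
    mats = [self_representation(X[:, g], lam) for g in groups]
    return sum(mats) / len(mats)


def random_walk_scores(W, steps=100, damping=0.9):
    """Run a damped random walk on the directed weighted graph W.

    Row i of W holds the outgoing edge weights of point i. Points that few
    others rely on receive little probability mass, so a LOW score
    suggests an outlier.
    """
    n = W.shape[0]
    row_sums = W.sum(axis=1, keepdims=True)
    safe = np.where(row_sums > 0, row_sums, 1.0)
    # Rows without outgoing weight fall back to a uniform transition.
    P = np.where(row_sums > 0, W / safe, 1.0 / n)
    pi = np.full(n, 1.0 / n)
    for _ in range(steps):
        pi = damping * (pi @ P) + (1.0 - damping) / n
    return pi
```

Calling `random_walk_scores(fused_self_representation(X, groups))` on an `(n_samples, n_features)` array returns one score per sample, and the samples with the smallest scores are the outlier candidates under these assumptions. The ridge substitute keeps the sketch dependency-free; a sparse solver would be closer to the method the abstract names.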

Keywords/Search Tags:

Outlier detection, Feature grouping, Data self-representation, Information entropy, Random walk

Related items

1	Outlier Detection Based On Markov Random Walk And Its Application
2	Research On Outlier Detection Technologies Via Information Entropy Theory In Complex Data Environments
3	Research And Application Of Outlier Detection Algorithm
4	A Study On Outlier Detection Algorithms For High Dimensional Data
5	Research On Outlier Detection Algorithm For High-Dimensional Data Based On Angle And Entropy
6	Outlier Detection Based On Distance And Information Entropy Uncertainty
7	Based On Information Entropy And The Subspace Outlier Mining Algorithm
8	Research On Moving Object Detection And Tracking Based On Sparse Coding Shape Classification And Random Walking Model
9	Research On Outlier Detection Based On Density Difference
10	Outlier Detection Algorithm Based On Entropy Weight Distance And Density Peak Clustering