Research And Application Of Outlier Detection Method Based On Nearest Neighbor

Posted on:2023-10-02

Degree:Master

Type:Thesis

Country:China

Candidate:S N Song

Full Text:PDF

GTID:2558307094488074

Subject:Computer technology

Abstract/Summary:


Outlier detection is one of the tasks of data mining. It aims to discover unknown but valuable knowledge or patterns, and its techniques are widely used in many areas of life. To detect outliers in different datasets better and faster, researchers have proposed many outlier detection algorithms. However, most of these algorithms are affected by irrelevant attributes and cannot detect local outliers and global outliers at the same time. The iForest (isolation forest) algorithm, a classical and efficient ensemble-based outlier detection algorithm, shares these shortcomings. In this work, we improve iForest to address them and validate the improved algorithms on stellar spectral datasets. The main research contents are as follows.

(1) An outlier detection algorithm based on a nearest-neighbour relevant-attribute isolation forest (NForest) is proposed. It addresses the redundant attributes and high randomness in iForest's construction of isolation trees, which make the efficiency of the algorithm unstable. The algorithm has two stages. In the first stage, irrelevant attributes are identified using the dense regions of the attributes and removed, yielding a reduced attribute set. In the second stage, root-node samples are selected under a Gaussian model assumption, and the mean nearest-neighbour distance within each node is used as that node's cut point. This reduces the instability caused by the random selection of root-node samples, cut attributes and attribute cut points in the original algorithm. Finally, a new outlier score, based on the degree of difference between data samples and their nearest neighbours, is proposed to characterise the degree of outlierness of each data point. Theoretical analysis and experimental results show that the NForest algorithm is more stable and more accurate.

(2) To address the problem that iForest cannot detect both global and local outliers, an isolation forest outlier detection algorithm based on weighted clustering of attributes (WDNForest) is proposed. The algorithm has two stages. The first stage constructs the isolation tree: first, the attributes are weighted according to the importance of the attribute features and k-means clustering is performed on the weighted data; then, the local density and relative distance of each data point are calculated using the DPC algorithm; finally, the isolation tree is constructed according to the density and distance measures of the different data points. In the second stage, after the isolation tree has been constructed, a local outlier function and a global outlier function are defined according to the different densities and distances of normal data points and outlier data points, respectively, and the outlier value is calculated to detect outlier data. Experiments on different UCI datasets show that the WDNForest algorithm can detect both global and local outliers and has greater advantages than other algorithms in terms of precision and recall.

(3) Based on the above work, the improved algorithms were applied to spectral data to further validate their effectiveness. Outlier analysis of the pre-processed spectral data was carried out using the NForest and WDNForest algorithms, respectively. The experimental results show that both algorithms have good outlier detection performance and can detect rare targets in the spectral data, providing an effective way to explore unknown, rare targets in specific contexts.
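
The density and distance quantities mentioned for the first stage of WDNForest can be illustrated with a minimal sketch. It assumes that "DPC" refers to density peaks clustering and uses that method's usual cutoff-based definitions: the local density of a point is the number of other points closer than a cutoff, and its relative distance is the distance to the nearest point of higher density. The function name `dpc_quantities`, the `cutoff` value and the toy data are hypothetical, and the thesis's own attribute weighting, k-means step and outlier functions are not reproduced here.

```python
import math


def dpc_quantities(points, cutoff):
    """Return (rho, delta) for a list of equal-length numeric tuples.

    rho[i]   : number of other points closer than `cutoff` to point i.
    delta[i] : distance from point i to the nearest point with higher rho.
    """
    n = len(points)
    # Full pairwise Euclidean distance matrix (fine for small toy data).
    dist = [[math.dist(p, q) for q in points] for p in points]

    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
        for i in range(n)
    ]

    delta = []
    for i in range(n):
        # Distances to every point that is strictly denser than point i.
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbour gets its largest distance instead,
        # following the usual DPC convention.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Toy data (hypothetical): a tight group of four points and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
rho, delta = dpc_quantities(points, cutoff=0.5)
```

In this toy example the isolated point has local density 0 and a large relative distance, the combination that density-and-distance methods typically treat as a sign of an outlier. The tied densest points also receive a large relative distance by convention, so the two quantities need to be read together rather than separately.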

Keywords/Search Tags:

Outlier detection, Approximate clipping properties, Nearest neighbours, Clustered outliers, LAMOST, Astronomical spectra


Related items

1	Research On Clustering Of LAMOST Stellar Spectra Based On Line Index
2	Fast Approximate K Nearest Neighbours Search Based On Set Compression Tree And Best Bin First
3	Research On Some Issues In Support Vector Machines
4	Compression For 2-D Spectra Images Based On LAMOST
5	Automated 1D Spectral Processing For LAMOST
6	Researches On Abnormal Data Detection Algorithms With Adaptive K-Nearest Neighbor
7	Research And Implementation Of Approximate Algorithm For Outlier Detection Based On Probability Model
8	Improvement Of Density-Based Local Outlier Detection Algorithm
9	Applications Of Wavelet To LAMOST Spectrum Processing
10	Study And Improvement Of Local Outliers Mining Based On Density