High-Dimensional Outlier Detection And Application Based On Local Coulomb Resultant Force

Posted on: 2023-10-01
Degree: Master
Type: Thesis
Country: China
Candidate: P Y Zhu
Full Text: PDF
GTID: 2558307094985249
Subject: Computer technology
Abstract/Summary:
Outlier detection is a major research topic in data mining, and measuring similarity between high-dimensional data objects is one of its most challenging problems. Traditional outlier detection methods do not adapt well to high-dimensional data because of distance concentration, one aspect of the "curse of dimensionality". Inspired by Coulomb's law, this thesis studies in depth a high-dimensional outlier detection method based on the local Coulomb resultant force and its application to mining outliers in celestial spectral data. The approach effectively alleviates the interference of the "curse of dimensionality" in outlier detection and offers a new idea and method for high-dimensional outlier detection. The main research work and innovations are as follows:

(1) A high-dimensional outlier detection method based on the local Coulomb resultant force is proposed. First, inspired by Coulomb's law, data objects are regarded as charges in an electrostatic field and attribute dimensions as the axes of a Cartesian coordinate system. On this basis, two new vector-valued similarity measures for high-dimensional data objects are proposed: the outlier Coulomb force and the outlier Coulomb resultant force. The outlier Coulomb force fully reflects the similarity of data objects in each dimension. The outlier Coulomb resultant force reduces the influence of attribute dimensions with a low degree of deviation and enhances the influence of those with a high degree of deviation, so the higher the dimension, the richer the deviation information. Second, a new local outlier factor is defined, and a high-dimensional outlier detection algorithm based on the local Coulomb resultant force is presented. Finally, experiments on UCI and synthetic datasets show that the algorithm effectively alleviates the interference of the "curse of dimensionality" and is suited to high-dimensional outlier detection.

(2) An outlier detection method for high-dimensional categorical data is presented, based on the local weighted Coulomb resultant force. First, a new formula for calculating categorical attribute weights is defined using the concept of complement entropy; it effectively characterizes the importance of each attribute. Second, exploiting both the magnitude and the direction of the outlier Coulomb force vector, a weighted Coulomb force similarity measure for high-dimensional categorical data is defined, and a high-dimensional categorical outlier detection algorithm based on the local weighted Coulomb resultant force is proposed. Finally, experiments show that the algorithm detects outliers well on real high-dimensional categorical datasets.

(3) A prototype system for detecting outliers among high-dimensional celestial spectra, based on the local Coulomb resultant force, is designed and implemented. First, following the task of detecting outliers in celestial spectral data, the processing flow of the prototype system is given, together with functional modules such as celestial spectral data input, data preprocessing, and outlier data mining. Second, the prototype system is designed and implemented using Python 3.7 as the development tool. Finally, results on the LAMOST DR5 v3 dataset of astronomical spectra show that the prototype system can effectively detect the spectra of rare celestial bodies, thereby providing an effective new way to discover spectral data of unknown or rare celestial bodies.
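The abstract does not state the formulas behind contribution (1), so the following is only a minimal sketch of the general idea and not the thesis's definition of the outlier Coulomb force or its local outlier factor. It assumes unit charges, a repulsive force of magnitude 1/r^2 along the line joining two objects, and a hypothetical score equal to the norm of the resultant force from the k nearest neighbours divided by the sum of the individual force norms. The function name `coulomb_resultant_scores` and the parameters `k` and `eps` are illustrative choices.

```python
import numpy as np


def coulomb_resultant_scores(X, k=5, eps=1e-12):
    """Toy outlier score from a local Coulomb-style resultant force.

    Assumptions (not taken from the thesis): unit charges, force magnitude
    1/r^2 along the line joining two objects, and a score defined as
    ||resultant|| / sum of individual force norms, which lies in [0, 1].
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    scores = np.empty(n)
    for i in range(n):
        diff = X[i] - X                  # vectors pointing from each object to x_i
        dist = np.linalg.norm(diff, axis=1)
        dist[i] = np.inf                 # exclude the object itself
        nn = np.argsort(dist)[:k]        # indices of the k nearest neighbours
        r = dist[nn] + eps               # eps guards against duplicate points
        # Repulsive force on x_i from each neighbour: (1 / r^2) * unit vector.
        forces = diff[nn] / r[:, None] ** 3
        resultant = forces.sum(axis=0)   # per-dimension components add up
        total = np.linalg.norm(forces, axis=1).sum() + eps
        scores[i] = np.linalg.norm(resultant) / total
    return scores
```

Under these assumptions, an object surrounded by neighbours on all sides receives forces that largely cancel and scores near 0, while an object whose neighbours all lie on one side receives aligned forces and scores near 1. Calling `coulomb_resultant_scores(X, k=4)` on a regular grid with one distant point added should therefore rank the distant point highest.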
Keywords/Search Tags: High-dimensional outlier detection, Similarity measure, Outlier Coulomb force, Local Coulomb resultant force, Categorical attribute weights, Local outlier factor, Astronomical spectral data