
Research On Outlier Detection Algorithm And Its Application In Celestial Data Processing

Posted on: 2022-04-14  Degree: Master  Type: Thesis
Country: China  Candidate: X Y Ba  Full Text: PDF
GTID: 2480306350494184  Subject: Operational Research and Cybernetics
Abstract/Summary:
Outlier detection is an important part of data mining. Its primary task is to find data objects or events that differ significantly from the majority of normal data or events, providing a basis for further in-depth research and analysis. The LAMOST telescope has been in operation for nearly 10 years and has collected tens of millions of spectra. These data are an important cornerstone for the construction of the "digital galaxy" and have irreplaceable scientific significance for the study of the structure of the Galaxy. They have shown great advantages in helping astronomers search for rare and compact celestial bodies, study stellar physics, and explore the distant universe. This thesis studies outlier detection algorithms and astronomical spectral data, and carries out verification experiments for the relevant algorithms in astronomical data processing as well as search experiments for special celestial bodies. The main work of this thesis is as follows:
(1) Background on outlier detection is introduced, including the definition of an outlier, related outlier detection algorithms, and applications of outlier detection.
(2) An MD-LOF algorithm is proposed and applied to the search for special celestial bodies. First, the spectral data are preprocessed by denoising, redshift correction, and normalization, and PCA is used to construct the supernova feature space. Then MD-LOF, a local outlier factor algorithm based on a mixed distance, is proposed. Built on the LOF algorithm, it defines the mixed distance by assigning different weights to the Euclidean, Manhattan, and Mahalanobis distances, and uses it to optimize the anomaly detection results. The experimental results show that MD-LOF performs better and is more stable.
(3) On the basis of the isolation forest algorithm, an outlier threshold function based on Cantelli's inequality is added, and lasso regression is used to form a sparse regression model; the two steps iterate with each other to refine feature selection and outlier detection. The method is tested on both public datasets and spectral datasets. The results show that its AUC performance is better and that it is of some use in searching for particular objects, but there is still room for improvement in accuracy and stability.
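The mixed-distance idea in item (2) can be illustrated with a minimal sketch. It is not the thesis implementation: the equal weights, the function names `mixed_distance` and `lof_from_distances`, the neighbourhood size `k=5`, and the synthetic two-dimensional data are all assumptions made for illustration, and the abstract does not say whether the three distances are rescaled before they are combined. The sketch computes a weighted sum of the Euclidean, Manhattan, and Mahalanobis pairwise distances and feeds it into a standard LOF computation.

```python
import numpy as np

def mixed_distance(X, weights=(1/3, 1/3, 1/3)):
    """Pairwise weighted sum of Euclidean, Manhattan and Mahalanobis distances.

    The weights are hypothetical; X must have at least two feature columns.
    """
    w_euc, w_man, w_mah = weights
    diff = X[:, None, :] - X[None, :, :]               # shape (n, n, d)
    euc = np.sqrt((diff ** 2).sum(axis=-1))
    man = np.abs(diff).sum(axis=-1)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))  # inverse covariance
    quad = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)
    mah = np.sqrt(np.maximum(quad, 0.0))               # guard against tiny negatives
    return w_euc * euc + w_man * man + w_mah * mah

def lof_from_distances(D, k=5):
    """Standard local outlier factor computed from a pairwise distance matrix."""
    D = D.astype(float)                                # astype returns a copy
    np.fill_diagonal(D, np.inf)                        # a point is not its own neighbour
    nn = np.argsort(D, axis=1)[:, :k]                  # indices of the k nearest neighbours
    nn_dist = np.take_along_axis(D, nn, axis=1)
    k_dist = nn_dist[:, -1]                            # distance to the k-th neighbour
    # reach-dist(p, o) = max(d(p, o), k-distance(o))
    reach = np.maximum(nn_dist, k_dist[nn])
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)           # local reachability density
    return lrd[nn].mean(axis=1) / lrd                  # LOF: neighbours' lrd over own lrd

# Synthetic demo: 50 points around the origin plus one isolated point.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)), [[8.0, 8.0]]])
D = mixed_distance(X)
scores = lof_from_distances(D, k=5)
```

Scores near 1 indicate a point whose local density matches that of its neighbours; the isolated point at index 50 receives a much larger score. In practice the three distances live on different scales, so the weights would need tuning or the distances would need rescaling.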
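The threshold step in item (3) can be sketched in the same hedged way. Cantelli's inequality states that for any distribution with mean mu and finite standard deviation sigma, P(S - mu >= k*sigma) <= 1/(1 + k^2). Choosing k = sqrt(1/alpha - 1) therefore bounds the flagged fraction by alpha. The function names, the default `alpha=0.05`, and the toy scores below are assumptions; the abstract does not give the exact form of the thesis's threshold function, and the coupling with lasso regression is not reproduced here.

```python
import math
import statistics

def cantelli_threshold(scores, alpha=0.05):
    """Cutoff t = mean + k * std with k = sqrt(1/alpha - 1).

    By Cantelli's inequality, at most a fraction alpha of the scores
    can lie at or above t, whatever their distribution.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    mu = statistics.fmean(scores)
    sigma = statistics.pstdev(scores)      # population standard deviation
    k = math.sqrt(1.0 / alpha - 1.0)
    return mu + k * sigma

def flag_outliers(scores, alpha=0.05):
    """Mark scores strictly above the Cantelli cutoff (higher = more anomalous)."""
    t = cantelli_threshold(scores, alpha)
    return [s > t for s in scores]

# Toy anomaly scores: 99 ordinary values and one clearly elevated value.
scores = [0.40 + 0.01 * (i % 5) for i in range(99)] + [0.95]
threshold = cantelli_threshold(scores, alpha=0.05)
flags = flag_outliers(scores, alpha=0.05)
```

The scores here are invented; with an isolation forest they would be per-sample anomaly scores oriented so that larger values mean more anomalous. The bound is distribution-free, which makes the cutoff conservative compared with a Gaussian assumption.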
Keywords/Search Tags: Outlier detection, Special object mining, MD-LOF, Sparse regression model, Isolation forest