Outlier Detection For Complex Data Via Robust Estimators

Posted on: 2023-09-17

Degree: Doctor

Type: Dissertation

Country: China

Candidate: C K Li

Full Text: PDF

GTID: 1520306902959319

Subject: Statistics

Abstract/Summary:

Outlier detection is a fundamental topic in robust statistics. Outliers in complex data seriously degrade both modeling and prediction. Traditional outlier detection methods search for a clean subset of a given size, use it to estimate the location vector and scatter matrix, and then flag outliers by their Mahalanobis distances. However, methods such as the fast minimum covariance determinant approach cannot be applied directly to complex datasets, especially when the dimension of the sample exceeds the sample size. In this dissertation, we first propose two novel, fast outlier detection procedures for high-dimensional data, based respectively on a high-breakdown minimum ridgelized covariance determinant estimator and on a block diagonal partition of the sample covariance matrix. The concentration step of the fast minimum covariance determinant method is redefined, and its convergence is proved in high-dimensional settings. The proposed robust estimators are computed from a clean subset of observations, with potential outliers excluded by applying these concentration steps. We then derive the asymptotic distributions of the modified Mahalanobis distances associated with the two proposed estimators, under certain moment conditions and under a Gaussian condition respectively, and obtain theoretical cut-off values for outlier identification. In applications, power can be improved further by adding a one-step reweighting procedure. We verify the specificity and sensitivity of the two procedures through simulation and real data analysis in high-dimensional settings.

Second, for other types of complex data, such as functional data, a new principal component analysis model based on a robust S-estimator is established for outlier detection. The model does not require assuming a specific form for the distribution function, and the algorithm converges rapidly. The sum of the mean squared residuals, obtained under Tukey's biweight function constraints, is then used as the test statistic, and a robust adjusted box plot is used to identify the outliers. In the example, more than 58 thousand meteorological measurements collected over 60 years in 5 cities of the Yangtze River basin are analyzed. The outlier detection procedure based on traditional principal component analysis is compared with the one based on the robust S-estimator on this dataset. The procedure based on the robust S-estimator gives more information about the abnormal data and therefore identifies outliers better.
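
The concentration-step idea described in the abstract can be illustrated with a short Python sketch. It is not the dissertation's estimator. The shrinkage form (1 - rho) * S + rho * I, the median-based starting subset, the function names `ridge_distances` and `ridge_c_steps`, and the parameters `h`, `rho`, and `max_iter` are all assumptions made for illustration. The sketch only shows how a ridge term keeps the scatter estimate invertible when the dimension exceeds the subset size, and how the subset is refined by repeatedly keeping the h observations with the smallest Mahalanobis distances.

```python
import numpy as np


def ridge_distances(X, subset, rho):
    """Squared Mahalanobis distances of all rows of X, using the mean and a
    ridge-regularized covariance computed from the rows indexed by `subset`."""
    p = X.shape[1]
    mu = X[subset].mean(axis=0)
    S = np.cov(X[subset], rowvar=False)
    # Assumed ridge form: shrink toward the identity so that Sigma stays
    # invertible even when p is larger than the subset size.
    Sigma = (1.0 - rho) * S + rho * np.eye(p)
    diff = X - mu
    return np.einsum("ij,ij->i", diff, np.linalg.solve(Sigma, diff.T).T)


def ridge_c_steps(X, h, rho=0.5, max_iter=100):
    """Iterate concentration steps until the h-subset stops changing."""
    X = np.asarray(X, dtype=float)

    # Starting subset (an assumption of this sketch): the h points closest
    # to the coordinate-wise median in Euclidean distance.
    med = np.median(X, axis=0)
    subset = np.sort(np.argsort(np.sum((X - med) ** 2, axis=1))[:h])

    for _ in range(max_iter):
        d2 = ridge_distances(X, subset, rho)
        # Concentration step: keep the h observations with smallest distance.
        new_subset = np.sort(np.argsort(d2)[:h])
        if np.array_equal(new_subset, subset):
            break
        subset = new_subset

    # Distances of every observation relative to the final subset.
    return subset, ridge_distances(X, subset, rho)
```

Observations with large returned distances are outlier candidates. The sketch supplies no threshold. The theoretical cut-off values and the one-step reweighting mentioned in the abstract are not reproduced here.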

Keywords/Search Tags:

Outlier detection, High dimension, Robust estimator, Minimum covariance determinant estimator, Random matrices, Block diagonal, Principal component analysis, Dimension reduction

Related items

1	Imputation Methods Of Missing Values For Compositional Data
2	Research And Visualization Of Dimension Reduction Method Of Gene Expression Data Based On Principal Component Analysis
3	Robust Dimension Reduction Based On MCD Method In Sufficient Dimension Reduction
4	Dimensionality Reduction Method In High-dimensional Data Analysis
5	A Research On Dimension Folding Reduction Method Based On Longitudinal Data And Its Case Study
6	New Shrinkage Nonlinear Estimators In Linear Regression
7	Research On Biased Estimators Of Parameters In Linear Model
8	Improvement Of Parameter Estimators In Seemingly Unrelated Regression Model
9	Semi-Parametric Polynomial Inverse Regression For Dimension Reduction And Its Application
10	RDS Free Central Limit Theorem For Distant Spiked Eigenvalues Of Covariance Matrices And Its Applications