| Outlier detection is a fundamental topic in robust statistics. Outliers in complex data can seriously harm modeling and prediction. Traditional outlier detection methods try to find a clean subset of a given size, which is used to estimate the location vector and scatter matrix, and outliers are then flagged by the Mahalanobis distance. However, methods such as the fast minimum covariance determinant approach cannot be applied directly to complex datasets, especially when the dimension of the sample is greater than the sample size. In this paper, we first propose two novel fast outlier detection procedures for high-dimensional data, which are based on a high-breakdown minimum ridgelized covariance determinant estimator and a block diagonal partition of the sample covariance matrix, respectively. The concentration step of the fast minimum covariance determinant method is redefined, and its convergence is proved in high-dimensional settings. The proposed robust estimators are obtained from a clean subset of observations, from which potential outliers are excluded by applying the so-called concentration steps. We then explore the asymptotic distributions of the modified Mahalanobis distances related to the two proposed estimators, under certain moment conditions and a Gaussian condition respectively, and obtain theoretical cut-off values for outlier identification. In applications, power can be further improved by adding a one-step reweighting procedure. We verify the specificity and sensitivity of the two procedures by simulation and real data analysis in high-dimensional settings. Second, for other types of complex data, such as functional data, we establish a new principal component analysis model based on a robust S-estimator for outlier detection; it does not need to assume a specific form of the distribution function, and the algorithm converges rapidly. The sum of the mean squared residuals, obtained by imposing Tukey’s biweight function constraints, is used as the test statistic, and a robust adjusted box plot is then used to identify the outliers. In the example, more than 58 thousand meteorological measurements collected over 60 years in 5 cities in the Yangtze River basin are used. On this dataset, we compare the outlier detection procedures based on traditional principal component analysis and on the robust S-estimator. The procedure based on the robust S-estimator gives more information about the abnormal data and thus identifies outliers better. |
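
As a rough illustration of the concentration-step idea mentioned in the abstract, the Python sketch below repeatedly re-selects the `h` observations with the smallest Mahalanobis-type distances. It is only a sketch under stated assumptions: the ridge form `(1 - rho) * S + rho * I`, the weight `rho`, the random starting subset, and the function names `ridge_c_step` and `iterate_c_steps` are hypothetical choices made here, not the estimators or the redefined concentration step proposed in the paper. No cut-off value is computed, because the abstract does not state one.

```python
import numpy as np


def ridge_c_step(X, subset, rho=0.1):
    """One concentration step using a ridge-type covariance (illustrative only).

    X      : (n, p) data matrix; p may exceed n.
    subset : indices of the current subset of size h.
    rho   : assumed ridge weight in (0, 1]; keeps the matrix invertible.
    """
    h = len(subset)
    p = X.shape[1]
    mu = X[subset].mean(axis=0)
    S = np.cov(X[subset], rowvar=False)
    # Assumed regularization: convex combination of S and the identity.
    K = (1.0 - rho) * S + rho * np.eye(p)
    diff = X - mu
    # Squared Mahalanobis-type distances of all n observations.
    d2 = np.einsum("ij,ij->i", diff, np.linalg.solve(K, diff.T).T)
    # Keep the h observations closest to the current fit.
    new_subset = np.sort(np.argsort(d2)[:h])
    return new_subset, d2


def iterate_c_steps(X, h, rho=0.1, max_iter=50, seed=0):
    """Repeat concentration steps from a random start until the subset is stable."""
    rng = np.random.default_rng(seed)
    subset = np.sort(rng.choice(X.shape[0], size=h, replace=False))
    for _ in range(max_iter):
        new_subset, d2 = ridge_c_step(X, subset, rho)
        if np.array_equal(new_subset, subset):
            break
        subset = new_subset
    return subset, d2
```

Calling `iterate_c_steps(X, h=30)` on a 40-by-60 matrix returns the indices of the retained subset and the squared distances of all rows. With a single random start and a small `rho`, such a sketch can stall at its starting subset when the dimension exceeds the sample size, which is one reason a carefully designed estimator and starting strategy matter in that setting.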