| As the command center of power grid operation, the electric power dispatching automation system plays an important role in power generation, transmission, transformation and distribution. Once the dispatching control system becomes abnormal, the stable operation of the power grid is greatly affected. The business data of the power dispatching automation system are large in volume, have numerous feature dimensions and follow complex distribution patterns. The current method, which sets a threshold for each single state, ignores the relationships between indicators and cannot adapt to such complex distributions, so its outlier detection accuracy is low. Although data-driven outlier detection methods based on machine learning improve performance to a certain extent, existing methods still suffer from low detection accuracy and unstable detection performance when the data distribution is complex. Therefore, this paper studies outlier detection methods under the multi-pattern distribution of business data in the power dispatching automation system. The main work of this paper is as follows. Firstly, a supervised outlier detection method for the dispatching automation system that considers the clustered distribution of outliers is studied. To address the imbalanced class sizes and small clusters of multi-pattern data in labeled datasets, an over-sampling and imbalanced classification method based on nearest-neighbor search and clustering is proposed. The k nearest neighbors of a minority class sample in the feature space are found by Euclidean distance. The labels of these neighbors determine whether the sample is noise or belongs to a minority class cluster, and the neighbors of these neighbor samples are searched iteratively until no more minority class samples belonging to the cluster are found. Then, the noise is filtered out and the number of minority samples to be generated in each cluster is calculated. Based on this, SMOTE is used to synthesize new samples within each cluster to balance the dataset. The balanced dataset is classified by a random forest model, which can handle multiple feature types. Secondly, an unsupervised outlier detection method for the dispatching automation system that considers multi-pattern distribution is studied. To handle normal data that form clusters of different densities and abnormal data that contain multi-pattern outliers such as global, local and clustered outliers, an outlier detection method based on a density-distance decision graph is proposed, which combines local and global information. The local density of a sample is calculated by combining kernel density estimation with the local reachability distance, and the density ratio of the sample to its nearest neighbors is taken as the local outlier degree. Then, the density lifting distance is defined. The global outlier degree of each sample is measured by the weighted sum of the distances to the k nearest samples whose density is larger than its own. The density-distance decision graph is drawn from the local density ratio and the density lifting distance, and the product of these two measures is used as the final outlier score to detect both local outliers and clustered outliers. Finally, an outlier detection method for the dispatching automation system that is robust to parameter selection under multi-pattern data distribution is studied. A robust outlier detection method based on the rate of change of directed density is proposed to address the sensitivity of existing k-nearest-neighbor-based methods to parameter selection. The local density of samples is calculated from an extended neighbor set and kernel density estimation, and the directed density ratio is defined based on the local density ratio of a sample to its k-nearest neighbor samples and the vectors between the corresponding samples. Based on this, the local information of samples can be better measured under different local densities and distribution manifolds. Then, as the number of nearest neighbors k is increased, the change in the directed density ratio of each sample is calculated and the outlier scores are accumulated. The proposed method can adapt to different data distribution patterns and is robust to parameter selection. |
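
The cluster-aware over-sampling step of the first method can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`knn`, `find_minority_clusters`, `oversample`) are hypothetical, and several details that the abstract leaves open are filled with assumptions. A minority sample is treated as noise when none of its k nearest neighbors is a minority sample, two minority samples are linked when either one is among the other's k nearest neighbors, the number of samples to generate is shared evenly between clusters, and new samples are interpolated between two random members of the same cluster (a simplification of SMOTE's nearest-neighbor choice). The random forest classification step is omitted.

```python
import math
import random


def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] by Euclidean distance."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def find_minority_clusters(points, labels, k=2, minority=1):
    """Split minority samples into clusters and noise (rules are assumptions)."""
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    neighbours = {i: knn(points, i, k) for i in minority_idx}
    # Assumed noise rule: none of the k nearest neighbours is a minority sample.
    noise = [i for i in minority_idx
             if all(labels[j] != minority for j in neighbours[i])]
    kept = set(minority_idx) - set(noise)
    # Link two kept samples when either is among the other's k nearest neighbours.
    links = {i: set() for i in kept}
    for i in kept:
        for j in neighbours[i]:
            if j in kept:
                links[i].add(j)
                links[j].add(i)
    clusters, seen = [], set()
    for start in sorted(kept):
        if start in seen:
            continue
        seen.add(start)
        cluster, frontier = [], [start]
        while frontier:  # iterative neighbour-of-neighbour search
            cur = frontier.pop()
            cluster.append(cur)
            for nxt in links[cur] - seen:
                seen.add(nxt)
                frontier.append(nxt)
        clusters.append(sorted(cluster))
    return clusters, sorted(noise)


def oversample(points, labels, k=2, minority=1, seed=0):
    """Drop noise, then add interpolated minority samples cluster by cluster."""
    rng = random.Random(seed)
    clusters, noise = find_minority_clusters(points, labels, k, minority)
    keep = [i for i in range(len(points)) if i not in noise]
    new_points = [tuple(points[i]) for i in keep]
    new_labels = [labels[i] for i in keep]
    n_major = sum(1 for y in new_labels if y != minority)
    n_minor = len(new_labels) - n_major
    if not clusters:
        return new_points, new_labels
    # Assumed allocation: share the deficit evenly between clusters.
    base, extra = divmod(max(0, n_major - n_minor), len(clusters))
    for c, cluster in enumerate(clusters):
        for _ in range(base + (1 if c < extra else 0)):
            # Simplified SMOTE step: interpolate between two random cluster members.
            a = points[rng.choice(cluster)]
            b = points[rng.choice(cluster)]
            u = rng.random()
            new_points.append(tuple(x + u * (y - x) for x, y in zip(a, b)))
            new_labels.append(minority)
    return new_points, new_labels
```

Calling `oversample(points, labels, k=2)` on a list of coordinate tuples and a list of 0/1 labels returns the noise-filtered, balanced dataset, which could then be passed to a classifier such as a random forest. Because interpolation stays inside one cluster, synthetic samples do not fall in the gaps between separate minority clusters.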