| With the continuous development of the new energy industry worldwide, wind power plays an irreplaceable role as an important part of it. As a leader in the new energy industry, China has been increasing its investment in wind power year by year, and, driven by favorable government policies and market demand, the wind power industry is booming in China. The ensuing problem is that it is increasingly difficult to identify abnormal wind turbine data. Raw wind turbine SCADA (Supervisory Control And Data Acquisition) data contain a considerable proportion of records reflecting abnormal operation, and abnormal data arising from different causes have different distribution characteristics. Accurately identifying these abnormal data is the basis for subsequent wind turbine power prediction and generation performance evaluation. Based on the wind power data collected by the SCADA system, this paper proposes an abnormal data identification method based on probabilistic statistics, derived by analyzing the distribution characteristics of abnormal data points in the wind speed-power coordinate system and the causes of the abnormal data. (1) Wind speed and power data reflect the operation status and power generation performance of wind turbines. First, this paper introduces the wind power curve and summarizes the causes and categories of abnormal data by analyzing their distribution characteristics in the wind speed-power coordinate system, based on actual wind turbine operation data. The stratification method in the horizontal power direction is outlined, and the data points are divided into bins at a fixed interval. The advantages and disadvantages of parametric and non-parametric models in probabilistic statistics are introduced, and an abnormal data identification method that combines the advantages of both is proposed. (2) The wind turbine operation data are stratified at equal intervals in the 
horizontal direction, and the distribution characteristics of the abnormal data in the horizontal direction are fully preserved. On this basis, a non-parametric model, diffusion kernel density estimation, is proposed to construct a probability density model for the data points, converting the original data point distribution, which could previously only be inspected manually, into a probability density curve. It is also compared with several other models, and the superiority of the proposed method is demonstrated by two testing methods. (3) The probability density curve is fitted with a parametric model, the Mixture Weibull distribution, and the spatial distribution characteristics of complex abnormal data in each power bin are accurately described by the weight, shape and scale parameters of the Weibull distribution model. Abnormal data are identified by cross-analyzing the Mixture Weibull model parameters, the distribution of the various types of abnormal data across the horizontal power bins, and the probability distribution curve of each horizontal power bin. The average confidence interval method is then introduced to clean the abnormal data, and the serration smoothing method is used to solve the serration problem that arises when traditional data cleaning methods reject abnormal data using hard threshold intervals. |
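
The pipeline summarized in the abstract (equal-interval power binning, a per-bin density estimate, and a Mixture Weibull fit) can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function names `bin_by_power`, `gaussian_kde` and `mixture_weibull_pdf` are hypothetical, a plain Gaussian kernel with Silverman's bandwidth rule stands in for the diffusion kernel density estimator, and the parameter fitting, confidence interval cleaning and serration smoothing steps are omitted.

```python
import math
from collections import defaultdict
from statistics import stdev


def bin_by_power(points, bin_width):
    """Group (wind_speed, power) points into equal-width horizontal power bins.

    Returns a dict mapping bin index -> list of wind speeds in that bin.
    """
    bins = defaultdict(list)
    for speed, power in points:
        bins[int(power // bin_width)].append(speed)
    return dict(bins)


def gaussian_kde(samples, bandwidth=None):
    """Return a density function over wind speed for one power bin.

    Assumption: a Gaussian kernel replaces the diffusion estimator
    named in the abstract. The default bandwidth needs >= 2 samples.
    """
    n = len(samples)
    if bandwidth is None:
        # Silverman's rule of thumb (an assumption, not the paper's choice).
        bandwidth = 1.06 * stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def mixture_weibull_pdf(x, components):
    """Evaluate a Mixture Weibull density at x.

    components: iterable of (weight, shape, scale) triples whose
    weights are assumed to sum to 1.
    """
    if x <= 0:
        return 0.0  # boundary simplified: no density at or below zero
    return sum(
        w * (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))
        for w, k, lam in components
    )
```

In such a sketch, each bin returned by `bin_by_power` would be passed to `gaussian_kde`, and the resulting curve would then be compared against `mixture_weibull_pdf` for candidate weight, shape and scale parameters.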