As society's demand for electric energy keeps growing, electric power resources must be produced and distributed efficiently. However, the high degree of openness of the Advanced Metering Infrastructure (AMI) gives users opportunities to steal electricity. This causes incalculable non-technical losses to the power grid, disrupts the order of electricity consumption, and can even threaten public security. Traditional machine learning methods are inefficient: faced with imbalanced data, they have difficulty accurately mining the characteristics of electricity theft data, so they cannot meet the requirements of identifying electricity theft users. Improving machine learning models to detect electricity theft users has therefore become an urgent problem. To address the problems in identifying electricity theft users, this thesis carries out the following research.

To handle outliers and missing values in the raw electricity consumption data, this thesis uses the 3σ principle to identify abnormal consumption values and Newton interpolation to fill in missing values. Min-max normalization is then applied so that the data meet the input requirements of the convolutional neural network.

Because neither oversampling nor undersampling alone meets the requirements of electricity theft identification, this thesis handles the imbalanced data with a mixed sampling algorithm. First, the oversampling rate of the SMOTE algorithm is updated dynamically according to the error rate of a Random Forest (RF), yielding the E-SMOTE algorithm. Next, the detection performance of the RF is used to dynamically adjust the mixed sampling process of E-SMOTE and Tomek Links, with the AUC (Area Under the Curve) metric as the stopping criterion for the iterations. The resulting mixed sampling algorithm based on E-SMOTE and Tomek Links repeatedly performs E-SMOTE oversampling and Tomek Links undersampling until the dataset is balanced. Its effectiveness is verified through visualization and through a performance comparison with six sampling algorithms.

To improve electricity theft detection performance, this thesis constructs an improved C-RF electricity theft detection model based on mixed sampling. After the consumption dataset has been balanced with the mixed sampling algorithm, the softmax classifier of a one-dimensional convolutional neural network (1D CNN) is replaced with an RF model. The 1D CNN thus extracts deep features from the consumption data and the RF classifies them. The RF itself is also improved over the traditional model. A feature selection method based on the Gini index and the chi-square test selects the important consumption features. The mean Q statistic of each decision tree in the RF is also calculated so that strongly diverse trees are chosen for the ensemble. Comparison with three other models shows that the C-RF model identifies electricity theft users with high efficiency.
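The preprocessing step (3σ outlier identification, Newton interpolation of missing values, min-max normalization) can be illustrated with a minimal plain-Python sketch. It assumes a consumption series is a list of floats with `None` marking missing readings. It flags, but does not correct, values farther than 3σ from the mean of the observed values. Each gap is filled with a Newton divided-difference polynomial through up to `k` observed neighbours on each side, and the series is rescaled to [0, 1]. The helper names and the window `k=2` are hypothetical. The abstract does not say how flagged outliers are subsequently replaced.

```python
import statistics


def three_sigma_outliers(series):
    """Indices of observed values lying more than 3 sigma from the mean (3-sigma rule)."""
    observed = [v for v in series if v is not None]
    mu = statistics.fmean(observed)
    sigma = statistics.pstdev(observed)  # population standard deviation
    return [i for i, v in enumerate(series)
            if v is not None and abs(v - mu) > 3 * sigma]


def newton_interpolate(xs, ys, x):
    """Evaluate the Newton divided-difference polynomial through (xs, ys) at x."""
    n = len(xs)
    coef = list(ys)
    # Build divided differences in place: coef[i] becomes f[x0, ..., xi].
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    # Horner-style evaluation of the Newton form.
    result = coef[-1]
    for i in range(n - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result


def fill_missing(series, k=2):
    """Replace each None with a Newton interpolant built from up to k observed
    neighbours on each side (k is a hypothetical window size)."""
    known = [i for i, v in enumerate(series) if v is not None]
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = [j for j in known if j < i][-k:]
        right = [j for j in known if j > i][:k]
        xs = left + right
        if not xs:
            raise ValueError("series has no observed values")
        filled[i] = newton_interpolate(xs, [series[j] for j in xs], i)
    return filled


def min_max_normalize(series):
    """Scale values linearly to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


# Usage on toy data.
daily_kwh = [2.0, 3.0] * 9 + [2.0, 50.0]
print(three_sigma_outliers(daily_kwh))            # [19]
print(fill_missing([0.0, 1.0, None, 9.0, 16.0]))  # [0.0, 1.0, 4.0, 9.0, 16.0]
print(min_max_normalize([1.0, 3.0, 5.0]))         # [0.0, 0.5, 1.0]
```

A single outlier can only exceed 3σ when the series is long enough: with population standard deviation, n points allow a maximum z-score of (n - 1)/√n, which is why the toy series has 20 values.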
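The core of SMOTE, together with the kind of error-driven rate update that E-SMOTE adds, can be sketched in plain Python. The SMOTE part is the standard technique: each synthetic sample is interpolated between a minority sample and one of its k nearest minority neighbours. The update rule `rate * (1 + rf_error_rate)`, the cap at 1.0, and the helper names are hypothetical. The abstract only states that the oversampling rate follows the Random Forest error rate. Here that error rate is passed in as a number rather than computed from a trained forest.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Classic SMOTE: each synthetic sample lies on the segment between a random
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = rng or random.Random(0)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        others = [s for j, s in enumerate(minority) if j != idx]
        neighbours = sorted(others, key=lambda s: math.dist(base, s))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic


def update_oversampling_rate(rate, rf_error_rate, max_rate=1.0):
    """Hypothetical E-SMOTE update: raise the oversampling rate in proportion to
    the Random Forest error rate, capped at max_rate."""
    return min(max_rate, rate * (1.0 + rf_error_rate))


def e_smote_round(minority, rate, rf_error_rate, rng=None):
    """One hypothetical E-SMOTE round: update the rate, then synthesize
    round(rate * len(minority)) new minority samples."""
    rate = update_oversampling_rate(rate, rf_error_rate)
    return rate, smote(minority, round(rate * len(minority)), rng=rng)


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rate, new = e_smote_round(minority, rate=0.5, rf_error_rate=0.5, rng=random.Random(42))
print(rate, len(new))  # 0.75 3
```

Because every synthetic point is a convex combination of two minority samples, it stays inside the region spanned by the minority class. This is the property that distinguishes SMOTE from simple duplication.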
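Tomek Links undersampling removes borderline majority samples. Two samples of opposite classes form a Tomek link when each is the other's nearest neighbour. A minimal brute-force sketch follows. It assumes Euclidean distance and drops only the majority-class member of each link, a common default; removing both members is another variant. The function names are illustrative.

```python
import math


def nearest_neighbour(X, i):
    """Index of the sample closest to X[i], excluding i itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))


def tomek_links(X, y):
    """Pairs (i, j), i < j, with opposite labels that are mutual nearest neighbours."""
    nn = [nearest_neighbour(X, i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def tomek_undersample(X, y, majority_label):
    """Drop the majority-class member of every Tomek link."""
    drop = set()
    for i, j in tomek_links(X, y):
        drop.add(i if y[i] == majority_label else j)
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]


X = [[0.0], [1.0], [1.1], [5.0], [5.2], [9.0]]
y = [0, 0, 1, 0, 0, 0]
print(tomek_links(X, y))              # [(1, 2)]
print(tomek_undersample(X, y, 0)[1])  # [0, 1, 0, 0, 0]
```

The pair (3, 4) are also mutual nearest neighbours but share a label, so they are not a Tomek link and both are kept.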
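The iterative control of the mixed sampling process, with AUC as the stopping criterion, can be sketched as a loop that receives the resampling steps and the evaluator as callables. In practice `oversample` would be E-SMOTE and `undersample` Tomek-link removal. `evaluate` would train an RF on the resampled data and score it by AUC on a held-out split. In the usage below all three are stubs. Several details are assumptions, not given in the abstract: the tolerance `tol`, the round limit, the balance target, and the rule of discarding a round that does not raise AUC. The `auc` helper computes ROC AUC via the Mann-Whitney statistic.

```python
from collections import Counter


def auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic (ties count 1/2); labels are 0/1."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def class_ratio(y):
    """Minority count divided by majority count (1.0 means balanced)."""
    counts = Counter(y)
    return min(counts.values()) / max(counts.values())


def hybrid_resample(X, y, oversample, undersample, evaluate,
                    max_rounds=10, tol=1e-4, target_ratio=1.0):
    """Alternate oversampling and undersampling, keeping a round only if the
    AUC returned by `evaluate` improves by more than tol."""
    best_auc = evaluate(X, y)
    for _ in range(max_rounds):
        X_new, y_new = undersample(*oversample(X, y))
        new_auc = evaluate(X_new, y_new)
        if new_auc <= best_auc + tol:
            break  # AUC stopped improving: keep the previous dataset
        X, y, best_auc = X_new, y_new, new_auc
        if class_ratio(y) >= target_ratio:
            break  # classes are balanced
    return X, y, best_auc


# Usage with stubs: each round adds one positive; AUC is a lookup by #positives.
toy_auc = {1: 0.70, 2: 0.75, 3: 0.80, 4: 0.78}
X, y = [[0.0]] * 4 + [[1.0]], [0, 0, 0, 0, 1]
X, y, best = hybrid_resample(
    X, y,
    oversample=lambda X, y: (X + [[1.0]], y + [1]),
    undersample=lambda X, y: (X, y),
    evaluate=lambda X, y: toy_auc[sum(y)],
)
print(sum(y), best)                               # 3 0.8
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In the stubbed run the fourth round lowers AUC from 0.80 to 0.78, so the loop stops and returns the dataset from the third round even though it is not yet fully balanced.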
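To make the C-RF idea concrete, the forward pass of a small 1D convolutional feature extractor (convolution, ReLU, max pooling, flattening) can be written in plain Python. In the C-RF model, feature vectors like these are what the RF receives in place of the softmax layer. This is a sketch only. The kernels below are hand-set, hypothetical values, whereas in the thesis model they would be learned by training the 1D CNN. The layer sizes are likewise not taken from the thesis.

```python
def conv1d(signal, kernel, bias=0.0):
    """Stride-1, no-padding 1D convolution (cross-correlation, as CNN layers compute it)."""
    k = len(kernel)
    return [sum(w * signal[t + j] for j, w in enumerate(kernel)) + bias
            for t in range(len(signal) - k + 1)]


def relu(xs):
    return [max(0.0, x) for x in xs]


def max_pool(xs, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


def extract_features(signal, kernels, pool=2):
    """Conv -> ReLU -> max-pool for each kernel, then flatten into one vector.
    In a C-RF-style model this vector is classified by a Random Forest
    instead of a softmax layer."""
    features = []
    for kernel in kernels:
        features.extend(max_pool(relu(conv1d(signal, kernel)), pool))
    return features


# Hypothetical hand-set kernels; a trained 1D CNN would learn them instead.
kernels = [[-1.0, 1.0],  # responds to rising consumption
           [1.0, 1.0]]   # local sum (smoothing)
print(extract_features([1.0, 2.0, 3.0, 4.0, 5.0], kernels))  # [1.0, 1.0, 5.0, 9.0]
```

Swapping softmax for an RF changes only the last stage: the convolutional layers still produce a fixed-length vector per consumer, and an ensemble of trees, rather than a linear layer, maps that vector to "theft" or "normal".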
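The chi-square half of the feature selection step can be sketched by scoring each non-negative consumption feature against the class label. The formulation treats feature values as frequencies, mirroring scikit-learn's `sklearn.feature_selection.chi2`. The abstract does not state how the thesis combines these scores with the RF's Gini importance. The intersection-of-top-k rule in `select_features` is therefore a hypothetical choice. The Gini importances are passed in, for example from a fitted forest's `feature_importances_`.

```python
def chi2_scores(X, y):
    """Chi-square statistic of each non-negative feature against the class label,
    treating feature values as frequencies."""
    n = len(y)
    classes = sorted(set(y))
    scores = []
    for f in range(len(X[0])):
        total = sum(row[f] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[f] for row, label in zip(X, y) if label == c)
            expected = total * y.count(c) / n
            if expected > 0:  # skip empty cells instead of dividing by zero
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def top_k(scores, k):
    """Indices of the k highest scores."""
    return set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])


def select_features(gini_importance, chi2, k):
    """Hypothetical combination rule: keep features ranked in the top k by
    both the RF Gini importance and the chi-square score."""
    return sorted(top_k(gini_importance, k) & top_k(chi2, k))


X = [[1, 0, 5], [2, 0, 5], [0, 3, 5], [0, 4, 5]]
y = [0, 0, 1, 1]
scores = chi2_scores(X, y)
print(scores)                                       # [3.0, 7.0, 0.0]
print(select_features([0.5, 0.1, 0.4], scores, 2))  # [0]
```

Feature 2 has the same value for every consumer, so its observed and expected class totals coincide and its chi-square score is zero.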
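Tree diversity via the Q statistic can be sketched from each tree's per-sample correctness on a validation set. For two trees, Yule's Q = (N11·N00 - N01·N10) / (N11·N00 + N01·N10). Here N11 counts samples both trees classify correctly, N00 samples both misclassify, and N10 and N01 samples that only the first or only the second tree gets right. Averaging each tree's Q over all other trees and keeping the trees with the lowest mean favours diverse members. Two choices are assumptions of this sketch: keeping a fixed number `n_keep` of trees, and treating an undefined Q (zero denominator) as 1.0.

```python
def q_statistic(correct_a, correct_b):
    """Yule's Q between two classifiers from per-sample correctness flags.
    Q near 1: they err on the same samples; Q near 0 or negative: more diverse."""
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1
        elif not a and not b:
            n00 += 1
        elif a:
            n10 += 1
        else:
            n01 += 1
    denom = n11 * n00 + n01 * n10
    if denom == 0:
        return 1.0  # assumption: treat an undefined Q as "no diversity"
    return (n11 * n00 - n01 * n10) / denom


def mean_q(correctness):
    """Average Q of each tree against every other tree in the forest."""
    t = len(correctness)
    return [sum(q_statistic(correctness[i], correctness[j])
                for j in range(t) if j != i) / (t - 1)
            for i in range(t)]


def select_diverse_trees(correctness, n_keep):
    """Indices of the n_keep trees with the lowest mean Q (most diverse)."""
    means = mean_q(correctness)
    return sorted(sorted(range(len(means)), key=lambda i: means[i])[:n_keep])


# 1 = tree classified the validation sample correctly, 0 = it did not.
t0, t1, t2 = [1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0]
print(mean_q([t0, t1, t2]))                   # [0.5, 0.5, 0.0]
print(select_diverse_trees([t0, t1, t2], 2))  # [0, 2]
```

Trees 0 and 1 make identical errors (Q = 1), so only one of them survives, while tree 2 errs on different samples and is kept.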