Research On Stealing User Behavior Based On XGBoost Algorithm

Posted on: 2019-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Sun
Full Text: PDF
GTID: 2382330548967872
Subject: Software engineering
Abstract/Summary:
Electrical energy is an essential source of energy for social production and daily life. Driven by profit, however, some businesses and individuals steal electricity by illicit means, which causes large economic losses for electric power enterprises and also affects the stability of the power grid and normal social order. Detecting electricity theft has therefore always been a focus of the work of electric power enterprises. Traditional electricity theft detection relies only on simple statistical methods and manual on-site inspection, which consumes a large amount of manpower and material resources and yields low inspection accuracy. In recent years, the rapid development of big data technology has attracted wide attention from all sectors of society. With the deployment of new smart electricity meters and information acquisition systems, electric power enterprises are now able to collect large amounts of electricity consumption data. These data lay a solid foundation for the development of big data technology in the power sector and make it possible to use big data techniques to identify electricity-stealing users.

Existing work on identifying electricity-stealing users suffers from reliance on a single identification model, low operating efficiency, and little consideration of unbalanced data sets. To address these problems, this thesis uses a big data analysis method to construct a recognition model based on an ensemble algorithm, and identifies electricity-stealing users on two kinds of data sets, an unbalanced data set and a balanced data set. The purpose of this thesis is to improve identification accuracy and the running efficiency of the model.

First, the original data set is explored from three aspects: data structure, data quality and data characteristics. The results of the exploration are presented as tables and charts in order to find the valuable information in the data set and lay the foundation for the subsequent work. Second, missing values are handled on the basis of the data exploration to ensure the completeness and accuracy of the data set. Third, to obtain the key information for identifying electricity-stealing users, features are extracted from three aspects: basic attribute features, statistical features at different time scales, and similarity features at different time scales, which yields a 411-dimensional feature matrix. Fourth, an electricity-stealing recognition model is constructed based on the XGBoost (eXtreme Gradient Boosting) algorithm, and the user features are fed into the model for classification. To guard against overfitting, the experiments use 5-fold cross-validation. Finally, to evaluate the recognition performance, recognition models based on the decision tree and random forest algorithms are constructed for comparison, and electricity-stealing users are identified with the same features. The confusion matrix and ROC curve are used to evaluate the experimental results. The experimental results show that the recognition model based on XGBoost outperforms the models based on the other two algorithms in both recognition rate and running efficiency on both data sets.
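The abstract names the confusion matrix and the ROC curve as evaluation tools but gives no implementation details. As a rough illustration only, the sketch below computes both for a binary classifier in plain Python. The function names, the 0.5 decision threshold and the toy labels and scores are assumptions made for this example and are not taken from the thesis; in practice the scores would come from a trained model such as XGBoost.

```python
from typing import List, Tuple


def confusion_matrix(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (tn, fp, fn, tp) for binary labels, where 1 marks a stealing user."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """Area under the ROC curve, computed by pairwise comparison.

    This equals the probability that a randomly chosen positive is scored
    higher than a randomly chosen negative; ties count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical, deliberately unbalanced toy data: 6 normal users, 2 stealing users.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.2, 0.7, 0.35]  # made-up model scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed threshold of 0.5

tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
recall = tp / (tp + fn)        # share of stealing users that were caught
auc = roc_auc(y_true, scores)  # threshold-independent ranking quality
print(f"tn={tn} fp={fp} fn={fn} tp={tp} recall={recall:.2f} auc={auc:.3f}")
```

On an unbalanced data set, overall accuracy can look high even when most stealing users are missed, which is one reason to look at recall and the ROC curve as well as the raw recognition rate.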
Keywords/Search Tags: Electricity Stealing Users, Classification and Identification, Classification algorithm, XGBoost