Research On Data Mining And Short Term Electric Load Forecasting Based On Hadoop

Posted on:2022-03-13

Degree:Master

Type:Thesis

Country:China

Candidate:M X Zhang

Full Text:PDF

GTID:2492306539480134

Subject:Electrical engineering

Abstract/Summary:


In the context of the rapid development of smart grids, the volume of power data is growing rapidly and the variety of data types is increasing, and traditional centralized processing is increasingly unable to cope. To better support the storage, mining, and analysis of power data, distributed electric power big data has received widespread attention. Data mining and load forecasting, two of its application directions, complement each other: data mining provides the conditions for improving the efficiency and accuracy of load forecasting, and improved load forecasting in turn demonstrates the value of data mining. Combining research on data mining and load forecasting algorithms with the Hadoop and Spark big data platforms, this thesis analyzes the load characteristics and patterns in the data of the "Smart Manufacturing" Electric Power AI Competition, and proposes Mahout-based user clustering and a Spark-based multi-algorithm fusion forecasting model to improve the performance of medium-term load forecasting. The main research contents of this thesis are as follows:

(1) Starting from the statistical characteristics and patterns of the data, the load curves of different users are analyzed statistically. First, the variation of load is analyzed from a time-series perspective, including daily, weekly, and monthly load characteristics. The influence of weather, temperature, and date on load characteristics is then studied to identify what users have in common and how they differ in seasonality, periodicity, and influencing factors. A quantification method is given for special data such as weather data and date data.

(2) A medium-term load forecasting model based on Xgboost is built on the Spark platform. Comparison on example data verifies that Xgboost outperforms RF and GBDT in medium-term load forecasting. To further improve model performance, an Xgboost medium-term load forecasting model based on feature selection on the Spark platform is proposed. Random forest feature importance is used first to remove redundant features and then to select key features, and several selections of key features are compared. The results show that the model using key features significantly improves performance at the cost of a slight reduction in prediction accuracy.

(3) To further improve prediction accuracy, this thesis uses the K-Means clustering algorithm from Mahout's data mining library to propose a medium-term load forecasting method based on ARIMA-GBDT-Xgboost multi-model fusion. K-Means clustering is used to obtain user clusters with different load characteristics. A suitable forecasting algorithm is selected according to the load characteristics of each user cluster, and the results of the individual algorithms are then recombined to obtain the final forecast. Experiments show that this method better captures the load characteristics of different users and further improves prediction accuracy, indicating the effectiveness of the method. The processing efficiency of Spark in cluster mode is also compared with stand-alone mode, verifying the advantages of cluster mode for big data processing.

The thesis studies in depth the Hadoop core frameworks HDFS and MapReduce, as well as the Spark and Mahout frameworks in the Hadoop ecosystem. It fully considers the advantages and disadvantages of different frameworks in the big data ecosystem and uses the complementary strengths of the three frameworks to reduce the time and cost of experiments. Mahout is chosen as the data mining framework because data mining does not demand high real-time performance and Mahout is implemented on MapReduce principles, which places low demands on machine hardware. Spark is chosen as the medium-term load forecasting platform for three reasons: first, it computes in memory and is therefore fast; second, it is easy to use, because it provides APIs in multiple languages, so developers can work in a language they are familiar with and get started quickly; third, Spark provides the machine learning library MLlib, with many built-in machine learning algorithms, so that developers can focus on implementing their requirements.
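
The user clustering step described in (3) can be illustrated with a minimal single-machine sketch. This is not the thesis's Mahout implementation, which runs K-Means as MapReduce jobs on Hadoop; it is a plain Python version of the same idea, assuming each user is represented by one daily load curve. The user names, the load values, the peak normalization, and the farthest-point initialization are all assumptions made for illustration and do not come from the thesis.

```python
import math

def normalize(curve):
    """Scale a curve by its peak so clustering compares shape, not magnitude."""
    peak = max(curve)
    return [v / peak for v in curve] if peak > 0 else list(curve)

def distance(a, b):
    """Euclidean distance between two equally long curves."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def init_centroids(points, k):
    """Deterministic farthest-point initialization (an illustrative choice)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids

def kmeans(points, k, max_iterations=100):
    """Standard K-Means: assign each point to its nearest centroid, then update."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: distance(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical daily load curves (six readings per day, arbitrary units).
daily_load = {
    "user_a": [20, 30, 90, 100, 80, 30],   # daytime peak
    "user_b": [10, 15, 45, 50, 42, 14],    # daytime peak, smaller consumer
    "user_c": [100, 90, 30, 20, 40, 95],   # night-time peak
    "user_d": [48, 50, 12, 10, 22, 45],    # night-time peak, smaller consumer
}

users = list(daily_load)
points = [normalize(daily_load[u]) for u in users]
labels, centroids = kmeans(points, k=2)

# Group user names by cluster label.
clusters = {}
for user, label in zip(users, labels):
    clusters.setdefault(label, []).append(user)
print(clusters)
```

With this toy data the two daytime-peaking users fall into one cluster and the two night-peaking users into the other, regardless of their absolute consumption. In the workflow the abstract describes, each such cluster would then be assigned whichever forecasting algorithm suits its load characteristics.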

Keywords/Search Tags:

Power big data, Medium term load forecasting, Hadoop, Xgboost, Data mining, Spark


Related items

1	Short-term Load Forecasting Methods Based On Big Data Analysis Technology
2	Identifying Bad Data And Power Load Forecasting Of Power System Based On Spark
3	Research On Power System Short-Term Load Forecasting Based On Data Mining
4	Research Of Electric Power Load Forecasting Based On Data Mining Technology
5	Design And Implementation Of Load Forecasting System For Power Dispatching Based On Spark
6	Research On Techniques Of Medium And Long Term Load Forecasting For Power Grid Planning
7	Medium And Long-term Wind Power Forecasting Method Based On Resource Feature Mining
8	Study On Data Mining Application In Short-term Load Forecasting
9	Research On Short Term Power Load Forecasting Method Based On Data Mining Technology In Nanjing Area
10	Data Mining For Short-term Load Forecasting