
Research on Data Mining and Short-Term Electric Load Forecasting Based on Hadoop

Posted on: 2022-03-13 | Degree: Master | Type: Thesis
Country: China | Candidate: M X Zhang
GTID: 2492306539480134 | Subject: Electrical engineering
Abstract/Summary:
With the rapid development of smart grids, the volume of power data is growing quickly and the variety of data types keeps increasing, while traditional centralized processing is becoming less and less adequate. To better support the storage, mining, and analysis of power data, distributed processing of electric power big data has received widespread attention. Data mining and load forecasting are two complementary application directions within it: data mining creates the conditions for more efficient and accurate load forecasting, and better load forecasting in turn demonstrates the value of data mining. Combining research on data mining and load forecasting algorithms with the Hadoop and Spark big data platforms, this thesis analyzes the load characteristics and patterns in the data of the "Smart Manufacturing" Electric Power AI Competition, and proposes Mahout-based user clustering and a Spark-based multi-algorithm fusion forecasting model to improve the performance of medium-term load forecasting. The main research contents are as follows:

(1) Starting from the statistical characteristics and patterns of the data, the load curves of different users are analyzed statistically. First, the variation of load is analyzed as a time series, covering daily, weekly, and monthly load characteristics; then the influence of weather, temperature, and date on load is studied. This yields the commonalities and differences among users in terms of seasonality, periodicity, and influencing factors, and a quantification method is given for special data such as weather data and date data.

(2) A medium-term load forecasting model based on Xgboost is built on the Spark platform, and comparative experiments verify that Xgboost outperforms RF (random forest) and GBDT in medium-term load forecasting. To improve the model further, an Xgboost medium-term load forecasting model with feature selection on the Spark platform is proposed. Random forest feature importance is used first to remove redundant features and then to select key features, and several key-feature subsets are compared. The results show that the model using key features can significantly improve performance at the cost of a slight reduction in prediction accuracy.

(3) To further improve prediction accuracy, this thesis combines the K-Means clustering algorithm from Mahout's data mining library to propose a medium-term load forecasting method based on ARIMA-GBDT-Xgboost multi-model fusion. K-Means clustering divides users into clusters with different load characteristics; a suitable forecasting algorithm is selected for each cluster according to its load characteristics, and the results of the individual algorithms are then reconstructed to obtain the final forecast. Experiments show that this method better captures the load characteristics of different users and further improves prediction accuracy, demonstrating its effectiveness. The thesis also compares the processing efficiency of Spark in cluster mode with that of stand-alone mode, confirming the advantages of cluster mode for big data processing.

In addition, the thesis studies the Hadoop core frameworks HDFS and MapReduce in depth, as well as the Spark and Mahout frameworks in the Hadoop ecosystem. It weighs the strengths and weaknesses of the different frameworks in the big data ecosystem and exploits the complementary strengths of the three to reduce experimental time and cost. Mahout is chosen as the data mining framework because data mining does not require high real-time performance, and Mahout, being implemented on the MapReduce principle, has low hardware requirements. Spark is chosen as the medium-term load forecasting platform for three reasons: first, it computes in memory, so computation is fast; second, it is easy to use, since it provides APIs in multiple languages, letting developers work in a familiar language and get started quickly; third, Spark provides the machine learning library MLlib with many built-in algorithms, so developers can focus on their own requirements.
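Item (1) mentions a quantification method for weather and date data without spelling it out. As a minimal sketch, assuming an ordinal weather scale and binary weekend/holiday flags (the `WEATHER_CODES` mapping, its labels, and both function names are hypothetical, not the thesis's scheme), date and weather records could be turned into numeric features and a weekly load profile computed like this:

```python
from datetime import date

# Hypothetical ordinal weather scale; the thesis's actual scheme is not given.
WEATHER_CODES = {"sunny": 0, "cloudy": 1, "overcast": 2, "rain": 3, "snow": 4}


def encode_day(d: date, holidays: set[date], weather: str,
               max_temp: float, min_temp: float) -> dict[str, float]:
    """Turn one calendar day and its weather record into numeric features."""
    return {
        "month": d.month,
        "day_of_week": d.weekday(),                 # Monday = 0 ... Sunday = 6
        "is_weekend": 1 if d.weekday() >= 5 else 0,
        "is_holiday": 1 if d in holidays else 0,
        "weather": WEATHER_CODES.get(weather, -1),  # -1 marks an unknown label
        "max_temp": max_temp,
        "min_temp": min_temp,
        "temp_range": max_temp - min_temp,
    }


def weekly_profile(daily_loads: dict[date, float]) -> list[float]:
    """Mean daily load for each weekday (index 0 = Monday); 0.0 if no data."""
    sums = [0.0] * 7
    counts = [0] * 7
    for d, load in daily_loads.items():
        sums[d.weekday()] += load
        counts[d.weekday()] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


features = encode_day(date(2016, 1, 2), {date(2016, 1, 1)}, "rain", 8.0, 2.0)
print(features["day_of_week"], features["is_weekend"], features["weather"])  # 5 1 3
```

The same idea extends to monthly profiles by grouping on `d.month` instead of `d.weekday()`.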
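Item (2) first removes redundant features and then keeps key features using random forest importance. The sketch below assumes the importance scores are already computed (in Spark ML, a fitted random-forest regression model exposes them through its `featureImportances` attribute). The threshold-then-top-k rule, the function name, and the feature names and scores in the example are illustrative assumptions, not values from the thesis:

```python
def select_features(importances: dict[str, float], redundancy_threshold: float,
                    k: int) -> list[str]:
    """Two-stage selection from precomputed random-forest importance scores.

    Stage 1 drops features scoring below `redundancy_threshold` (treated as
    redundant); stage 2 keeps the `k` highest-scoring remaining features.
    """
    kept = {name: s for name, s in importances.items() if s >= redundancy_threshold}
    return sorted(kept, key=kept.get, reverse=True)[:k]


# Hypothetical importance scores, for illustration only.
scores = {"load_lag_7d": 0.40, "max_temp": 0.25, "day_of_week": 0.20,
          "weather": 0.10, "month": 0.04, "min_temp": 0.01}
for k in (2, 3, 4):  # compare several key-feature subsets
    print(k, select_features(scores, 0.05, k))
```

Each subset would then be fed to the Xgboost model so that accuracy and runtime can be compared across subset sizes.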
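Item (3) groups users by load characteristics with K-Means, which the thesis runs through Mahout's MapReduce-based implementation. The following single-machine sketch only illustrates the clustering step on per-user daily load curves. Normalizing each curve by its peak, the Euclidean distance, the function names, and the toy curves are assumptions, not details from the abstract:

```python
import math
import random


def normalize(curve: list[float]) -> list[float]:
    """Scale a load curve by its peak so clustering compares shape, not size."""
    peak = max(curve)
    return [v / peak for v in curve] if peak > 0 else [0.0] * len(curve)


def kmeans(points: list[list[float]], k: int, iterations: int = 100,
           seed: int = 0) -> tuple[list[int], list[list[float]]]:
    """Lloyd's K-Means with Euclidean distance; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centers
    labels: list[int] = []
    for _ in range(iterations):
        # Assignment step: each curve joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member curves.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy 4-slot daily curves: two daytime-peak users and two night-peak users.
raw = [[1, 5, 10, 4], [2, 6, 12, 5], [10, 3, 1, 8], [20, 5, 2, 18]]
labels, _ = kmeans([normalize(c) for c in raw], k=2)
print(labels)
```

The two daytime-peak curves end up in one cluster and the two night-peak curves in the other; which cluster gets label 0 depends on the random initialization.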
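Item (3) also selects a suitable algorithm for each user cluster and then reconstructs the individual results into a final forecast. The abstract states neither the selection rule nor how reconstruction works. The sketch below assumes that each cluster's model is chosen by the lowest MAPE on a validation window, and that reconstruction means summing the chosen cluster forecasts into the total load. Model names follow the abstract (ARIMA, GBDT, Xgboost); the cluster names, function names, and numbers are made up:

```python
def mape(actual: list[float], forecast: list[float]) -> float:
    """Mean absolute percentage error in percent (assumes no zero actuals)."""
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def fuse(validation: dict[str, list[float]],
         val_forecasts: dict[str, dict[str, list[float]]],
         test_forecasts: dict[str, dict[str, list[float]]]
         ) -> tuple[dict[str, str], list[float]]:
    """Pick the lowest-MAPE model per cluster, then sum the chosen test forecasts."""
    chosen = {
        cluster: min(val_forecasts[cluster],
                     key=lambda m: mape(actual, val_forecasts[cluster][m]))
        for cluster, actual in validation.items()
    }
    # Reconstruction (assumed): total load = sum of the cluster-level forecasts.
    total = [sum(step) for step in zip(*(test_forecasts[c][m] for c, m in chosen.items()))]
    return chosen, total


validation = {"cluster_a": [100, 100], "cluster_b": [50, 50]}
val_forecasts = {"cluster_a": {"ARIMA": [90, 110], "Xgboost": [99, 101]},
                 "cluster_b": {"ARIMA": [50, 51], "GBDT": [40, 60]}}
test_forecasts = {"cluster_a": {"ARIMA": [95, 96], "Xgboost": [102, 103]},
                  "cluster_b": {"ARIMA": [48, 49], "GBDT": [45, 46]}}
chosen, total = fuse(validation, val_forecasts, test_forecasts)
print(chosen, total)  # {'cluster_a': 'Xgboost', 'cluster_b': 'ARIMA'} [150, 152]
```

Selecting per cluster rather than globally lets a smooth, seasonal cluster use a time-series model while a weather-sensitive cluster uses a tree ensemble.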
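The abstract notes that Mahout is built on the MapReduce principle. The sketch below illustrates only that principle. It is a single-process simulation of the map, shuffle, and reduce phases, not Hadoop's Java API, and it totals raw readings per user and day. The `user_id,date,load` record format and the function names are hypothetical:

```python
from collections import defaultdict
from typing import Iterable, Iterator


def mapper(line: str) -> Iterator[tuple[tuple[str, str], float]]:
    """Map phase: parse 'user_id,date,load' and emit ((user_id, date), load)."""
    user_id, day, load = line.strip().split(",")
    yield (user_id, day), float(load)


def reducer(key: tuple[str, str], values: Iterable[float]) -> tuple[tuple[str, str], float]:
    """Reduce phase: total all readings that share a (user_id, date) key."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[tuple[str, str], float]:
    """Run map, shuffle (group by key), and reduce in a single process."""
    groups: defaultdict[tuple[str, str], list[float]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)  # shuffle: collect values per key
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


records = ["u1,2016-01-01,1.5", "u1,2016-01-01,2.5", "u2,2016-01-01,4.0"]
print(run_job(records))
```

On a real cluster the map and reduce calls run in parallel on different nodes and the shuffle moves data over the network. This is why MapReduce tolerates modest hardware but adds latency, which suits offline data mining.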
Keywords/Search Tags: Power big data, Medium-term load forecasting, Hadoop, Xgboost, Data mining, Spark