Research On The Parallel Data Mining System Based On Power Big Data

Posted on: 2018-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: W C Lin
Full Text: PDF
GTID: 2382330542487908
Subject: Software engineering
Abstract/Summary:
With the rapid development of informatization in the power industry, analyzing and utilizing the massive data generated in all aspects of the industry, and exploring the commercial and social value this data contains, has become an important direction for the industry's development. Analyzing the production, operation, and management data of electric power enterprises, including electricity consumption characteristics, power load, fault prediction and maintenance, and electricity usage, helps improve distribution network planning and design, safety monitoring and maintenance strategy optimization, customer electricity-usage behavior analysis, and power management decision-making. This in turn advances the business development and management level of power enterprises, promotes the optimal allocation of power resources, and provides efficient service for customers.

Power big data is characterized by large data volume, many data types, low value density, and fast processing speed. The traditional business intelligence systems of power enterprises are limited in data processing capability, lack parallel data mining, and are restricted to structured data analysis, so they can no longer meet the data analysis needs of power enterprises. A new type of parallel data mining system is therefore needed to analyze and mine massive electric power data effectively and serve the power industry.

The main contents of this paper are as follows. The requirements of a parallel data mining system based on power big data are analyzed, and the overall design and hierarchical architecture design are carried out. A detailed design scheme is proposed for the unified model of parallel data analysis and mining services, parallel data preprocessing, and the parallel data analysis and mining workflow. By combining the modules of the data processing part of the analysis and mining layer, a complete data mining process is carried out. The system integrates various parallel computing platforms and frameworks, has a built-in data mining process designer, provides pre-designed operators, and is delivered to users as a service. Users can perform data analysis and mining without writing any code or installing a platform client locally, and can use workflow design, data display, log management, process management, and other functions, which brings great convenience to power data analysis professionals.

Based on the parallel data mining system, this paper designs improved data mining algorithms and applies them to power data, providing alternative methods for solving feature selection and clustering problems. For feature selection in data preprocessing, a new algorithm is proposed. It uses joint mutual information and joint information entropy as the evaluation measure of feature relevance, and mutual information as the evaluation measure of feature importance. The method also introduces clustering into feature selection, so that important features are selected while redundancy is eliminated. For clustering analysis, considering the characteristics of different hierarchical clustering methods, this paper combines top-down and bottom-up hierarchical clustering and puts forward a definition of density, thereby proposing a density-based hierarchical clustering method that overcomes the shortcomings of single-direction hierarchical clustering, improves clustering quality, and simplifies parameter selection.
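
The feature selection idea in the abstract can be illustrated with a small sketch. The abstract does not give the algorithm's details, so everything below is an assumption made for illustration: "importance" is read as the mutual information between a feature and the class label, "relevance" between two features is read as their mutual information divided by their joint entropy, and a simple greedy redundancy filter stands in for the clustering step the thesis describes. The feature names, the toy data, and the threshold values are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete symbols."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def joint_entropy(xs, ys):
    """Entropy of the paired variable (X, Y)."""
    return entropy(list(zip(xs, ys)))


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - joint_entropy(xs, ys)


def redundancy(xs, ys):
    """Assumed relevance measure between two features: I(X; Y) / H(X, Y)."""
    h = joint_entropy(xs, ys)
    return 0.0 if h == 0 else mutual_information(xs, ys) / h


def select_features(features, labels, max_redundancy=0.9, min_importance=0.0):
    """Greedy filter (a stand-in for the clustering step, not the thesis's method).

    Features are visited in order of decreasing importance; a feature is kept
    only if it is informative about the label and not too redundant with any
    feature that has already been kept.
    """
    ranked = sorted(
        features.items(),
        key=lambda item: mutual_information(item[1], labels),
        reverse=True,
    )
    selected = []
    for name, column in ranked:
        if mutual_information(column, labels) <= min_importance:
            continue  # carries no information about the label
        if any(redundancy(column, features[s]) > max_redundancy for s in selected):
            continue  # nearly duplicates an already selected feature
        selected.append(name)
    return selected


# Hypothetical, discretized toy data: four samples, binary class label.
labels = [0, 0, 1, 1]
features = {
    "peak_load": [0, 0, 1, 1],       # fully determines the label
    "peak_load_copy": [0, 0, 1, 1],  # duplicate of peak_load
    "temp_band": [0, 0, 1, 0],       # partially informative
    "weekday": [0, 1, 0, 1],         # independent of the label
}

print(select_features(features, labels))
```

On this toy data the sketch keeps `peak_load` and `temp_band`: the duplicate column is rejected as redundant, and `weekday` is rejected because its mutual information with the label is zero. Real power data would first need discretization, and the thresholds would have to be tuned.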
Keywords/Search Tags:power big data, parallel data mining, visualization, feature selection, clustering