Research And Development On Big Data Application Of The State Grid Audit System

Posted on: 2018-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z Xu
GTID: 2359330518455520
Subject: Computer technology
Abstract/Summary:
With the rapid adoption of information technology in power systems, the volume of data acquired every day is growing quickly, and data from diverse sources can reach the petabyte (PB) scale. To meet these challenges, it is imperative to study big data applications in power systems and to develop big data analysis platforms oriented to power systems. This paper first reviews one of the time-consuming data calculations in the current State Grid auditing system, then implements the corresponding computations in a pilot Hadoop development environment to verify the viability of big data applications in the State Grid auditing system, and on that basis proposes a big data solution to optimize the calculations in the auditing system. The pilot environment consists of a 15-node Hadoop cluster; data are transferred by Sqoop into a Hive data warehouse on the Hadoop Distributed File System (HDFS) in the cluster. A mass data query test is conducted by running the same set of specified queries over the same large dataset with Hive QL and with Spark SQL, within the distributed processing framework of MapReduce, with the aim of facilitating massive data query and analysis. The test results show that the Hadoop distributed architecture scales well enough to meet the rapidly growing data processing needs of the State Grid auditing system. They also show that the advantage becomes more pronounced as the data volume grows, and that Spark SQL queries are more efficient than Hive queries.

As a key class of algorithms in data analysis and data mining, clustering analysis has been widely used in many fields. In pursuit of a holistic optimization of auditing across thinking, content, objectives, and technology application, and of turning traditional verification auditing into risk-based preventive auditing, clustering algorithms have broad scope for application. With data volumes ever growing, K-means, the most widely used partitional clustering algorithm in practice, and Hadoop, a widely used parallel computing framework today, are both very attractive to researchers and developers. It therefore makes sense to find a better way to implement K-means using the parallelism of the Hadoop platform. This paper summarizes the principles of the K-means algorithm and the MapReduce distributed computing model and puts forth a Java implementation of K-means on Hadoop MapReduce. Through algorithm correctness validation, cluster speedup evaluation, and cluster expansion-rate verification, this paper confirms that the improved K-means algorithm, besides being highly efficient and scalable, can make effective use of the powerful parallel computing capability of the Hadoop platform, and can thus be used in developing a more intelligent State Grid auditing system in the future.
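
To illustrate how K-means maps onto the MapReduce model described in the abstract, the following is a minimal single-process sketch in Python. It is not the thesis's Java implementation on Hadoop, and it does not reproduce the improvements the thesis reports. All function names (map_assign, combine, reduce_centroid, kmeans_iteration, kmeans) are hypothetical and chosen only for this illustration. The map step assigns each point to its nearest centroid, a combiner-style merge accumulates per-cluster coordinate sums and counts, and the reduce step turns each sum into a new centroid.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def map_assign(point, centroids):
    """Map step: emit (index of nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def combine(a, b):
    """Merge two partial (coordinate sum, count) pairs for the same cluster."""
    (sum_a, n_a), (sum_b, n_b) = a, b
    return tuple(x + y for x, y in zip(sum_a, sum_b)), n_a + n_b


def reduce_centroid(partial):
    """Reduce step: convert a (coordinate sum, count) pair into a mean."""
    total, n = partial
    return tuple(x / n for x in total)


def kmeans_iteration(points, centroids):
    """One K-means pass expressed as map, combine, and reduce."""
    grouped = {}
    for p in points:
        key, value = map_assign(p, centroids)
        grouped[key] = combine(grouped[key], value) if key in grouped else value
    # Assumption of this sketch: a cluster that receives no points
    # keeps its previous centroid.
    return [
        reduce_centroid(grouped[i]) if i in grouped else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Repeat iterations until no centroid moves more than tol."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if all(dist(a, b) <= tol for a, b in zip(updated, centroids)):
            return updated
        centroids = updated
    return centroids
```

In a real Hadoop job, each iteration would typically be one MapReduce job, with the current centroids distributed to every mapper and the reducer output feeding the next iteration. Because sums and counts can be merged in any order, the combine step can run on the map side to reduce the data shuffled between nodes.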
Keywords/Search Tags:Power Big Data, Intelligent grid, distributed storage, parallel computing, Auditing, clustering, K-means