
Research And Implementation Of Distributed ETL Data Integration System In Electric Power Industry

Posted on: 2016-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: J C Lin
GTID: 2308330482479946
Subject: Software engineering
Abstract/Summary:
After several years of development, the power industry has made progress in building its information systems. A number of systems now run stably, such as the electric power dispatching SCADA system, the power marketing SG186 system, the electric energy data acquisition system, the ERP system, and the geographic information system, and they have accumulated a large amount of useful data. However, these systems are independent of one another, so data cannot be shared between them. This scattered data seriously restricts the power company's decision-making.

To deal with these distributed and heterogeneous data sources, the power company needs to build a grid production management information platform that integrates several kinds of heterogeneous data sources and provides a visual data-sharing platform for decision-making and analysis. Data integration is the key problem for this platform. A data warehouse offers enterprises a way to integrate their existing data resources effectively.

The value of a data warehouse in decision-making depends on the quality of the data it holds, and ETL is essential to ensuring that quality. ETL hides complicated business logic and offers a consistent data interface to the analysis tools and applications built on the data warehouse.

Most traditional ETL tools on the market are expensive and complicated to operate, and they bundle many complex functions that are rarely used. Furthermore, they are based on a centralized architecture in which ETL design, execution, and management all take place on a single ETL server, which must therefore be a high-performance machine and carries a heavy load.

To address these defects, this thesis puts forward a distributed ETL model that distributes ETL design, management, and execution across different computers and lets multiple machines collaborate to design and execute ETL tasks. This reduces the hardware cost of ETL and speeds up data processing.

The core of this thesis is the distributed ETL model. The model defines the boundary of each ETL component, the tasks each component is responsible for, and the cooperative relationships between components. Its main components include a job designer based on the C/S (client/server) mode, an ETL job scheduling module, distributed computation management, and an ETL job execution engine. The research includes developing a distributed ETL tool according to this model and testing its performance. The method is applied in the data integration system of the GPMS platform. Finally, the thesis summarizes the work done in this research and points out aspects that can be improved in the future.
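
The abstract does not say how the job scheduling module spreads ETL work across machines. As a rough illustration of the load-balancing idea only, here is a minimal Python sketch that assumes a greedy "least-loaded worker first" policy. The function `assign_jobs`, the `Worker` class, the job names, the cost values, and the node names are all hypothetical and are not taken from the thesis.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Worker:
    # Workers are ordered by current load, then by name to break ties.
    load: int
    name: str
    jobs: list = field(default_factory=list, compare=False)


def assign_jobs(job_costs, worker_names):
    """Greedy assignment: each job goes to the worker with the smallest load.

    job_costs maps a (hypothetical) ETL job name to an estimated cost.
    Returns {worker_name: (total_load, [job names])}.
    """
    heap = [Worker(0, name) for name in worker_names]
    heapq.heapify(heap)
    # Place the most expensive jobs first so the greedy choice balances better.
    for job, cost in sorted(job_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        worker = heapq.heappop(heap)   # least-loaded worker right now
        worker.jobs.append(job)
        worker.load += cost
        heapq.heappush(heap, worker)
    return {w.name: (w.load, w.jobs) for w in heap}


if __name__ == "__main__":
    # Illustrative extraction jobs, one per source system named in the abstract.
    costs = {"scada_extract": 5, "sg186_extract": 4, "erp_extract": 3, "gis_extract": 2}
    for name, (load, jobs) in sorted(assign_jobs(costs, ["node-a", "node-b"]).items()):
        print(name, load, jobs)
```

With these made-up costs, both nodes end up with a total load of 7. A real scheduler would also need to track job dependencies and node failures, which this sketch leaves out.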
Keywords/Search Tags:ETL, distributed system, load balancing, fault-tolerant resumption, GPMS