ETL-based Solution For Integrating Data

Posted on: 2011-02-12 | Degree: Master | Type: Thesis
Country: China | Candidate: C Chang | Full Text: PDF
GTID: 2178360302474604 | Subject: Computer Science
Abstract/Summary:
With globalization, more and more companies are moving parts of their business to other countries, and they need modern distributed information systems to support that business. As a result, "information islands", systems that cannot communicate with each other, have become a problem in many corporate information system infrastructures. Compared with connecting such islands, data integration is the trickier problem, and real-time data integration is a further emerging trend. Data integration raises the following technical problems: 1) how to extract data from distributed systems whose source data come in different formats, and how to parse and transform those data into a common format; 2) how to clean the data so that they meet business requirements; 3) how to invoke the data integration process from both scheduled events and real-time events.

To integrate distributed data frequently and with high performance, an ETL (Extract-Transform-Load) based data integration model is proposed. The different ways of interacting with data sources were analyzed and abstracted into basic interaction solutions, and the different ways of parsing data were likewise abstracted into basic parsing solutions. Using the BeanFactory from Spring, wrappers for different data sources are constructed by combining an interaction solution with a parsing solution through dependency injection. In addition, business rules were studied and integrated into the model as plug-in transform solutions. On-demand requests and metadata mapping shield the differences between structured and unstructured data. Furthermore, performance issues were analyzed, and incremental integration was introduced so that data sources are extracted incrementally. Based on a practical project, traditional point-to-point integration is compared with this model to show that the model provides high performance and is easy to implement.
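
The wrapper construction and incremental extraction described above might look roughly like the following minimal sketch. It is an illustration under stated assumptions, not the thesis's actual design: every name in it (WrapperDemo, Interaction, Parser, Row, SourceWrapper, CsvParser) is hypothetical, plain constructor injection stands in for the wiring that Spring's BeanFactory would perform, and the per-row version number used to detect new data is an assumed change marker.

```java
import java.util.ArrayList;
import java.util.List;

// All names below are hypothetical; none of them come from the thesis.
public class WrapperDemo {

    /** How raw content is fetched from a source (file, HTTP, queue, ...). */
    interface Interaction {
        String fetch();
    }

    /** How raw content is parsed into the common record format. */
    interface Parser {
        List<Row> parse(String raw);
    }

    /** Common record format; version is an assumed, increasing change marker. */
    static final class Row {
        final String key;
        final String value;
        final long version;

        Row(String key, String value, long version) {
            this.key = key;
            this.value = value;
            this.version = version;
        }
    }

    /** A data-source wrapper assembled from an injected interaction and parser. */
    static final class SourceWrapper {
        private final Interaction interaction;
        private final Parser parser;
        private long watermark = -1; // highest version already extracted

        SourceWrapper(Interaction interaction, Parser parser) {
            this.interaction = interaction;
            this.parser = parser;
        }

        /** Returns only the rows that are newer than the previous extraction. */
        List<Row> extractIncrement() {
            List<Row> fresh = new ArrayList<>();
            long max = watermark;
            for (Row row : parser.parse(interaction.fetch())) {
                if (row.version > watermark) {
                    fresh.add(row);
                    max = Math.max(max, row.version);
                }
            }
            watermark = max;
            return fresh;
        }
    }

    /** One possible parsing solution: lines of the form key,value,version. */
    static final class CsvParser implements Parser {
        @Override
        public List<Row> parse(String raw) {
            List<Row> rows = new ArrayList<>();
            for (String line : raw.split("\n")) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String[] fields = line.split(",");
                rows.add(new Row(fields[0].trim(), fields[1].trim(),
                        Long.parseLong(fields[2].trim())));
            }
            return rows;
        }
    }

    public static void main(String[] args) {
        // An in-memory buffer plays the role of a remote data source.
        StringBuilder source = new StringBuilder("a,1,1\nb,2,2\n");
        SourceWrapper wrapper = new SourceWrapper(source::toString, new CsvParser());

        System.out.println(wrapper.extractIncrement().size()); // first run: every row
        source.append("c,3,3\n");
        System.out.println(wrapper.extractIncrement().size()); // second run: new rows only
    }
}
```

Because the wrapper depends only on the two interfaces, supporting another source format would mean supplying a different Parser or Interaction rather than writing a new wrapper, which is the kind of recombination the dependency-injection approach is meant to enable.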
Keywords/Search Tags: ETL, Data Integration, Data Feeder, Metadata Mapping, Business Rule Transform, On-Demand Request