This paper discusses the design and implementation of a general data extraction tool for data warehouses. With the development of society and the advance of technology, analysis and decision-making have become the lifeline of every industry. A data warehouse provides strong data support for analysis and decision-making by virtue of its data storage format and data organization structure. This software provides a solution that ensures the data warehouse receives high-quality data: it extracts raw data from the data sources and loads it into the data warehouse after integrating, converting, cleaning and optimizing it.

The first chapter explains the significance of this work and gives a brief analysis of data warehouse technology. Chapters 2 to 6 introduce the design ideas of the system and its implementation methods; the last chapter summarizes the paper and looks ahead to the future of data warehouse technology.

The system uses a three-tier architecture. We use COM technology to develop the middle-tier components and use MTS to manage them. Functional modules, such as the integration and conversion modules, are packaged as COM components, which makes the system easier to update, maintain and port.

In this paper we generalize the common data extraction methods into integration, conversion, cleaning and summarization, and propose data optimization techniques, such as data smoothing and data standardization, in order to support data mining more effectively.

This software can extract data from most structured or semi-structured data sources, such as relational databases, Excel files, formatted text files and XML documents. Extracting data from XML documents is a distinctive feature of the software. We propose a rule-driven method, based on the XML environment, for transforming XML data into a relational database (RDB), and on this basis implement data extraction from XML documents.

The system can package user-defined extraction work into an 'Extract Package'. These packages can be reused.
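The rule-driven transformation of XML data into relational tables can be illustrated with a small Python sketch. It assumes a hypothetical rule format (a target table name, a path that selects one XML element per row, and a mapping from each column to a child element or an `@attribute`), and it uses the standard-library `xml.etree.ElementTree` and `sqlite3` modules as stand-ins for the tool's actual rule engine and target database. A real rule set would also need to handle nested elements and column data types.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping rule (illustrative only): which XML elements become
# rows of which table, and which child element or attribute fills each column.
RULE = {
    "table": "orders",
    "row_path": "./order",      # ElementTree path selecting one element per row
    "columns": {                # column name -> child tag, or "@attr" for an attribute
        "id": "@id",
        "customer": "customer",
        "amount": "amount",
    },
}


def extract_value(elem, spec):
    """Return an attribute value (spec '@name') or the stripped text of a child element."""
    if spec.startswith("@"):
        return elem.get(spec[1:])
    child = elem.find(spec)
    return child.text.strip() if child is not None and child.text else None


def xml_to_table(xml_text, rule, conn):
    """Apply one mapping rule: create the target table and insert one row per matched element."""
    root = ET.fromstring(xml_text)
    cols = list(rule["columns"])
    col_defs = ", ".join(f'"{c}" TEXT' for c in cols)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{rule["table"]}" ({col_defs})')
    rows = [
        tuple(extract_value(e, rule["columns"][c]) for c in cols)
        for e in root.findall(rule["row_path"])
    ]
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(f'INSERT INTO "{rule["table"]}" VALUES ({placeholders})', rows)
    return len(rows)


if __name__ == "__main__":
    doc = """<orders>
      <order id="1"><customer>Alice</customer><amount>12.50</amount></order>
      <order id="2"><customer>Bob</customer><amount>7.00</amount></order>
    </orders>"""
    conn = sqlite3.connect(":memory:")
    print(xml_to_table(doc, RULE, conn))
    print(conn.execute("SELECT * FROM orders").fetchall())
```

Keeping the mapping in data rather than in code is what makes the approach rule-driven: supporting a new XML layout means writing a new rule, not changing the transformation routine.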
To improve execution efficiency, we adopt Microsoft DTS (Data Transformation Services) as the transformation tool, which speeds up data extraction.
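The 'Extract Package' idea, user-defined extraction work saved once and run repeatedly, can be sketched in Python as a named, ordered list of step functions. Everything here is a hypothetical illustration: the `ExtractPackage` class, the `clean`, `convert` and `standardize` steps and the `amount` field are not the tool's COM interfaces, min-max scaling is assumed as one common form of data standardization, and the sketch does not model the bulk transformation the tool delegates to DTS.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


@dataclass
class ExtractPackage:
    """Hypothetical reusable package: a named, ordered list of extraction steps."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def add(self, step: Step) -> "ExtractPackage":
        self.steps.append(step)
        return self  # allow chaining: pkg.add(a).add(b)

    def run(self, rows: List[Row]) -> List[Row]:
        # Steps return new lists, so the same package can be rerun on each new batch.
        for step in self.steps:
            rows = step(rows)
        return rows


def clean(rows: List[Row]) -> List[Row]:
    """Cleaning: trim string fields and drop rows whose 'amount' is missing."""
    out = []
    for r in rows:
        r = {k: v.strip() if isinstance(v, str) else v for k, v in r.items()}
        if r.get("amount") not in (None, ""):
            out.append(r)
    return out


def convert(rows: List[Row]) -> List[Row]:
    """Conversion: cast 'amount' from text to float."""
    return [{**r, "amount": float(r["amount"])} for r in rows]


def standardize(rows: List[Row]) -> List[Row]:
    """Standardization (min-max, assumed here): rescale 'amount' into [0, 1]."""
    if not rows:
        return rows
    vals = [r["amount"] for r in rows]
    lo, hi = min(vals), max(vals)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [{**r, "amount_std": (r["amount"] - lo) / span} for r in rows]


if __name__ == "__main__":
    pkg = ExtractPackage("daily_orders").add(clean).add(convert).add(standardize)
    batch = [
        {"customer": " Alice ", "amount": "10"},
        {"customer": "Bob", "amount": "30"},
        {"customer": "Eve", "amount": ""},
    ]
    print(pkg.run(batch))
```

Because each step is an ordinary function over a batch of rows, further optimization steps such as data smoothing could be appended to a package without changing the package mechanism itself.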