Application Of ETL Component In Distributed Data Mining Engine Based On Hadoop

Posted on: 2017-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: C Chen
GTID: 2308330491451713
Subject: Data Warehouse and Decision Support System
Abstract/Summary:
As the volume of Internet data keeps growing, so does the demand for extracting value from the huge amounts of accumulated data, and distributed data mining engine systems have emerged in response. Such a system analyzes and processes data from multiple sources and, through data mining, provides users with information that supports decision-making. Looking at current practice with network data worldwide, Alibaba continually optimizes its recommendation and matching rules using the purchase data of its vast user base, and Baidu makes recommendations and serves relevant advertising based on massive amounts of data. Distributed data mining engine systems are now used in a wide range of scenarios and continue to develop.

This thesis describes the overall architecture and core technologies of a distributed data mining engine system: the data warehouse, data mining, and the entity manager. It mainly introduces the data warehouse and the entity manager's search system, and focuses on the design and implementation of the basic ETL component on the Hadoop platform. The ETL discussion covers data preprocessing, data file upload, data extraction, data transformation, and loading data into the warehouse, as well as coding, the data interface, and the problems the ETL component solves.
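The ETL stages named in the abstract can be illustrated with a minimal, in-memory sketch of how records flow from extraction through cleaning and transformation into a warehouse table. This is not the thesis's Hadoop implementation: it runs in plain Python with no HDFS or MapReduce, and the sample data, schema, function names (`extract`, `preprocess`, `transform`, `load`), and the dict-based "warehouse" are all hypothetical. Running preprocessing after parsing is also an assumption made for simplicity.

```python
import csv
import io

# Hypothetical raw input standing in for an uploaded source file.
# In the thesis setting such files would live on HDFS; this sketch does not touch HDFS.
RAW_CSV = """user_id,item,price
1,book, 12.50
2,,8.00
1,pen,1.20
"""

def extract(text):
    """Extract: parse delimited text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))

def preprocess(records):
    """Preprocess: strip whitespace and drop records with any empty field."""
    cleaned = []
    for rec in records:
        values = {k: (v or "").strip() for k, v in rec.items()}
        if all(values.values()):
            cleaned.append(values)
    return cleaned

def transform(records):
    """Transform: cast fields to the (hypothetical) warehouse schema types."""
    return [
        {"user_id": int(r["user_id"]), "item": r["item"], "price": float(r["price"])}
        for r in records
    ]

def load(rows, warehouse):
    """Load: append rows to an in-memory 'purchases' table (a dict of lists)."""
    warehouse.setdefault("purchases", []).extend(rows)
    return warehouse

warehouse = load(transform(preprocess(extract(RAW_CSV))), {})
print(warehouse["purchases"])
```

The record with an empty `item` field is dropped during preprocessing, so two rows reach the table. On a real Hadoop deployment each stage would typically run as a distributed job over files in HDFS; the sketch only shows how data passes from one stage to the next.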
Keywords/Search Tags: ETL, Hadoop, Entity, Data warehouse, Data mining