
Optimizing Multi-Join In Cloud Environment

Posted on: 2014-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: M X Zhou
Full Text: PDF
GTID: 2248330398976768
Subject: Computer application technology
Abstract/Summary:
The age of big data has accelerated the development of cloud computing, and cloud platforms have emerged in large numbers. Cloud computing technology has matured steadily and is now applied in daily life, industry, and research. MapReduce is a parallel programming model that is widely used for data processing in the cloud. How to make MapReduce support complex relational database processing has attracted considerable attention from both industry and academia.

Large-scale data analysis involves increasingly complex processing requirements, because joins and queries frequently span multiple data sets. Existing MapReduce-based multi-table joins are mostly implemented serially: the multi-table join is decomposed into a cascade of two-table joins. This approach produces a large amount of intermediate data and requires multiple rounds of data transmission. Improving the multi-table join has therefore become an active topic in MapReduce-based data processing research.

This paper first introduces related technologies, including cloud platforms, Hadoop, HDFS, and the MapReduce programming model. Based on an analysis of the multi-table join mechanism in the cloud environment and a study of concurrent multi-joins in MapReduce, it proposes a hierarchical multi-join model based on a two-dimensional Reducer matrix (TD-HMJ). TD-HMJ handles all "Key" attributes in a single Map phase. In the Reduce phase, TD-HMJ joins several groups of two or three tables by establishing a two-dimensional Reducer matrix, and completes the joins between groups through multiple Reduce processes. Experimental results show that TD-HMJ reduces data transmission, shortens multi-join time, and improves system efficiency.
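
The contrast drawn in the abstract can be illustrated with a small in-memory simulation. The sketch below is not the TD-HMJ algorithm, whose details the abstract does not give. It assumes a chain join R(a, b), S(b, c), T(c, d) and compares a cascade of two two-table joins with a generic one-round three-table join. In the one-round join, simulated reducers form a k x k grid indexed by the hashes of the two join keys. The relation layout, the function names `cascaded_join` and `grid_join`, and the parameter `k` are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def cascaded_join(R, S, T):
    """Two chained two-table joins: (R join S on b), then join T on c."""
    # Job 1: group R and S by b and materialize the intermediate result.
    by_b = defaultdict(lambda: ([], []))
    for a, b in R:
        by_b[b][0].append(a)
    for b, c in S:
        by_b[b][1].append(c)
    intermediate = [(a, b, c)
                    for b, (left, right) in by_b.items()
                    for a in left for c in right]

    # Job 2: group the intermediate result and T by c.
    by_c = defaultdict(lambda: ([], []))
    for a, b, c in intermediate:
        by_c[c][0].append((a, b))
    for c, d in T:
        by_c[c][1].append(d)
    result = [(a, b, c, d)
              for c, (left, right) in by_c.items()
              for (a, b) in left for d in right]
    return sorted(result), len(intermediate)


def grid_join(R, S, T, k=2):
    """One-round three-table join on a k x k grid of simulated reducers."""
    def cell(value):
        return hash(value) % k

    grid = defaultdict(lambda: {"R": [], "S": [], "T": []})
    shuffled = 0  # number of records sent to reducers
    for a, b in R:  # R carries only b: replicate along row cell(b)
        for col in range(k):
            grid[(cell(b), col)]["R"].append((a, b))
            shuffled += 1
    for b, c in S:  # S carries b and c: goes to exactly one reducer
        grid[(cell(b), cell(c))]["S"].append((b, c))
        shuffled += 1
    for c, d in T:  # T carries only c: replicate along column cell(c)
        for row in range(k):
            grid[(row, cell(c))]["T"].append((c, d))
            shuffled += 1

    # Each reducer joins its local fragments independently.
    result = []
    for parts in grid.values():
        for (a, b), (b2, c), (c2, d) in product(
                parts["R"], parts["S"], parts["T"]):
            if b == b2 and c == c2:
                result.append((a, b, c, d))
    return sorted(result), shuffled


if __name__ == "__main__":
    R = [(1, 10), (2, 10), (3, 20)]
    S = [(10, 100), (20, 200), (10, 200)]
    T = [(100, 7), (200, 8), (300, 9)]
    print(cascaded_join(R, S, T))
    print(grid_join(R, S, T, k=2))
```

Both functions return the same joined tuples. The cascade also reports how many intermediate tuples the first job had to materialize, while the grid variant reports how many records were shuffled, including the k-fold replication of R and T. Which cost dominates depends on the data, so the sketch only illustrates the trade-off and is not evidence for the thesis's experimental results.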
Keywords/Search Tags: MapReduce, mass data, cloud computing, multi-join