
Application Research Of The Performance Optimization For Map Reduce Model In Hadoop

Posted on: 2016-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: R R Lu
Full Text: PDF
GTID: 2308330473964457
Subject: Computer application technology
Abstract/Summary:
The rapid development of the Internet has driven geometric growth in data, with data volumes rising from the GB scale to the TB and even PB scale. This growth signals the arrival of the big data era, and its impact should not be underestimated. At the same time, it has become harder for users to extract useful information from this flood of data. Calculating similarity over users’ data can alleviate the problem to some extent by extracting relatively popular and useful data.

Hadoop is currently the most widely used open-source cloud computing platform, and the MapReduce parallel programming model is one of the key technologies of cloud computing. In practice, however, the large amount of temporary data and the uneven distribution of Reduce tasks during MapReduce computation lead to low utilization of system resources.

Addressing the application of the MapReduce model, this thesis implements similarity calculation over massive data using MapReduce’s distributed programming advantages, and improves the model’s performance in two respects: I/O operations and load balancing strategy. To relieve the disk blocking and network congestion triggered by large numbers of I/O operations, the thesis uses the Stripe and SStripe algorithms to aggregate intermediate results locally after the Map stage. With an improved load balancing algorithm based on the Balance strategy, the MapReduce model distributes the locally aggregated results evenly, reducing the network overhead caused by uneven load across Reduce nodes. The thesis sets up a Hadoop-based experimental platform and, through a series of comparative experiments, validates the feasibility and effectiveness of the optimized I/O algorithm and the improved load balancing strategy in the similarity calculation process.
Keywords/Search Tags: Performance Optimization, MapReduce, Hadoop, Similarity Calculation
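
The abstract names two optimizations, local aggregation of Map output and balanced assignment of work to Reduce nodes, without giving their details. The sketch below is an illustration only, not the thesis's implementation. It assumes that the Stripe algorithm resembles the well-known "stripes" pattern for co-occurrence counting, and it substitutes a simple greedy largest-first assignment for the Balance strategy, which the abstract does not specify. All function names (`map_with_stripes`, `reduce_stripes`, `assign_keys_to_reducers`) are hypothetical, and the code simulates the phases in plain Python rather than using any Hadoop API.

```python
from collections import Counter, defaultdict
import heapq


def map_with_stripes(records):
    """Map phase with local aggregation (assumed 'stripes' pattern).

    Each record is a list of items that occur together (e.g. one user's items).
    Instead of emitting one ((a, b), 1) pair per co-occurrence, the mapper
    accumulates counts in memory and emits a single stripe per item.
    """
    stripes = defaultdict(Counter)
    for items in records:
        unique = sorted(set(items))
        for a in unique:
            for b in unique:
                if a != b:
                    stripes[a][b] += 1
    return dict(stripes)


def reduce_stripes(mapper_outputs):
    """Reduce phase: element-wise sum of all stripes that share a key."""
    merged = defaultdict(Counter)
    for stripes in mapper_outputs:
        for key, stripe in stripes.items():
            merged[key].update(stripe)
    return dict(merged)


def assign_keys_to_reducers(key_sizes, num_reducers):
    """Greedy balancing (an assumption, not the thesis's Balance strategy).

    Keys are visited from largest to smallest estimated size, and each one
    goes to the reducer with the smallest load so far.
    """
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    heapq.heapify(heap)
    assignment = {}
    for key, size in sorted(key_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, reducer = heapq.heappop(heap)
        assignment[key] = reducer
        heapq.heappush(heap, (load + size, reducer))
    return assignment
```

To try it, call `map_with_stripes` once per input split, estimate each key's size from the emitted stripes (for example, the sum of its counts), pass those sizes to `assign_keys_to_reducers`, and merge the mapper outputs with `reduce_stripes`. Emitting one stripe per item rather than one record per pair is what reduces the volume of intermediate data written to disk and sent over the network.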