Performance Optimization Mechanism Of Iterative Applications In Hadoop Environment

Posted on: 2015-01-28  Degree: Master  Type: Thesis
Country: China  Candidate: W H Ji  Full Text: PDF
GTID: 2268330431967291  Subject: Computer system architecture
Abstract/Summary:
Iterative algorithms are needed in many modeling processes over large datasets, such as data mining, web page ranking, and social network analysis. Such iterative applications typically have to process massive amounts of data. As a distributed computing framework for massive data processing, MapReduce has attracted wide attention for its simple programming model, high fault tolerance, and ease of implementation and scaling. However, running iterative applications through the standard MapReduce mechanism exposes several performance deficiencies:
(1) at the end of each iteration, a new model with a large volume of data is generated, which causes network congestion;
(2) static data are read repeatedly throughout the entire computation;
(3) data and control dependences exist;
(4) iterative applications that require threshold detection need an additional MapReduce task;
(5) iterative applications cannot be expressed easily with the conventional MapReduce programming interface.

To solve these problems, we propose a performance optimization mechanism designed specifically for iterative applications. It is based on an analysis of the execution strategy, scheduling mechanism, and programming model of MapReduce, and it takes full advantage of the characteristics of iterative computation. In this way we improve the traditional MapReduce mechanism from several angles so that it supports iterative applications more effectively.

The main work of this thesis includes the following aspects:
1. We compare several frameworks for applications with large datasets and explain why we choose MapReduce as the preferred platform for implementing iterative applications. Taking typical iterative applications as examples, we analyse the data flow and control flow of MapReduce when it processes iterative computations, and from this analysis we identify the related performance problems.
2. To reduce the delays caused by the globally linear execution strategy, and to relieve the pressure on network bandwidth caused by many-to-many transmission, we propose a locally linear execution strategy. We also design a loop scheduling algorithm specifically for the caching system, so that the cache is used to the greatest extent. Finally, to improve the speed and efficiency of the overall iterative computation, we replace the linear strategy with a parallel strategy.
3. We implement several typical iterative applications on the platform proposed in this thesis and, using the running time of the corresponding algorithms on the Hadoop platform as the baseline, verify that the proposed mechanism helps reduce the amount of intermediate data, ease network stress, and speed up iterative computation.
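As an illustration of deficiencies (2) and (4) above, the following is a minimal single-process sketch in Python, not the mechanism implemented in the thesis. It assumes a PageRank-style computation as the "typical iterative application"; the function names (`map_phase`, `reduce_phase`, `iterate`), the damping factor, and the tolerance are all hypothetical choices made for this example. The static graph is loaded once and reused across iterations instead of being re-read each time, and the threshold check runs inside the loop rather than as a separate job.

```python
from collections import defaultdict


def map_phase(graph, ranks):
    """Emit (target, contribution) pairs for every edge of the static graph."""
    for node, targets in graph.items():
        share = ranks[node] / len(targets)
        for target in targets:
            yield target, share


def reduce_phase(pairs, nodes, damping=0.85):
    """Sum the contributions per node and apply the damping factor."""
    sums = defaultdict(float)
    for key, value in pairs:
        sums[key] += value
    n = len(nodes)
    return {node: (1 - damping) / n + damping * sums[node] for node in nodes}


def iterate(graph, tolerance=1e-6, max_iterations=200):
    """Run map/reduce rounds until the ranks change by less than tolerance.

    Assumes every node has at least one outgoing edge.
    """
    # The static data (graph) stays in memory for all iterations,
    # standing in for a cache that avoids re-reading it every round.
    nodes = list(graph)
    ranks = {node: 1.0 / len(nodes) for node in nodes}
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        new_ranks = reduce_phase(map_phase(graph, ranks), nodes)
        # Threshold detection happens here, inside the loop,
        # rather than in an additional job launched after each round.
        delta = sum(abs(new_ranks[node] - ranks[node]) for node in nodes)
        ranks = new_ranks
        if delta < tolerance:
            break
    return ranks, iteration


if __name__ == "__main__":
    example_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    final_ranks, rounds = iterate(example_graph)
    print(rounds, final_ranks)
```

In plain Hadoop MapReduce, each round of this loop would be a separate job that re-reads the graph from the distributed file system, and the convergence test would typically be one more job; the sketch only shows what keeping both inside a single loop looks like.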
Keywords/Search Tags: Iterative Applications, Hadoop, Performance Optimization