
Research On MapReduce Job Performance Optimization In Electric Power Collection System

Posted on: 2017-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: K H Zhang
Full Text: PDF
GTID: 2272330485982058
Subject: Software engineering
Abstract/Summary:
The electricity information collection system computes daily metrics such as the collection success rate, the terminal online rate, the line-loss passing rate, and data-integrity checks. These jobs process large volumes of input data and must finish before a deadline, and Hadoop is an effective platform for them. However, Hadoop as deployed in the electricity information collection system suffers from several problems.

Hadoop was originally designed to process massive datasets with large batch jobs, whereas the collection system frequently runs a large number of short jobs, causing resources to be requested and released repeatedly. The system is also deployed on heterogeneous machines, where abnormal nodes, node overload, and data skew produce stragglers that lengthen job runtimes. Finally, the line-loss rate and the upload-code success rate depend on archive jobs that must finish within a deadline, yet Hadoop lacks deadline-aware job scheduling.

These problems seriously degrade job efficiency and user experience. This thesis addresses short jobs, stragglers, and deadline-constrained job scheduling from the perspectives of the computing framework, the task-execution process, and the job-scheduling mechanism. The main contributions are as follows.

(1) The thesis describes the Hadoop framework, the task-execution process, and the procedure for requesting and releasing resources, and explains why Hadoop handles short jobs poorly. A task performance model is established to estimate the running time of a task, and, based on a resource-reuse mechanism, the execution of short jobs is optimized to reduce job completion time.

(2) A straggler-recognition model is proposed based on the progress rate of each task.
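The thesis does not give the task performance model itself, but a minimal illustrative sketch of the idea is to fit task runtime as a linear function of input size from historical runs; all data, field names, and coefficients below are hypothetical:

```python
# Minimal sketch: estimate a task's runtime from its input size by fitting
# runtime ~ a * input_bytes + b with ordinary least squares on past runs.
# All numbers here are illustrative, not from the thesis.

def fit_linear(history):
    """Fit runtime = a * input_bytes + b by ordinary least squares."""
    n = len(history)
    sx = sum(x for x, _ in history)
    sy = sum(y for _, y in history)
    sxx = sum(x * x for x, _ in history)
    sxy = sum(x * y for x, y in history)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def estimate_runtime(model, input_bytes):
    """Predict runtime (seconds) for a task with the given input size."""
    a, b = model
    return a * input_bytes + b

# Historical (input bytes, runtime seconds) observations -- illustrative.
history = [(64e6, 20.0), (128e6, 38.0), (256e6, 75.0)]
model = fit_linear(history)
print(round(estimate_runtime(model, 192e6), 1))  # predicted seconds for 192 MB
```

A scheduler could use such an estimate to decide whether a job is "short" enough to run on reused containers instead of requesting fresh resources.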
Combining a task's progress with the amount of resources it has consumed, different strategies are applied to mitigate stragglers. In addition, a new task-scheduling algorithm is presented to ensure that cluster resources are shared fairly between jobs.

(3) For deadline-constrained jobs, the thesis uses a divide-and-conquer strategy to break the work into phases. If a phase of the job does not complete within its allotted time, a preemptive strategy is used to acquire resources.

With these optimizations in place, Hadoop was deployed on 32 nodes, and the runtimes of the line-loss rate, collection success rate, and terminal online rate jobs were significantly shortened, from 60-120 minutes to 20-40 minutes.
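Progress-rate straggler recognition of the kind contribution (2) builds on can be sketched as follows; the 0.5 slow-factor threshold and the sample numbers are assumptions for illustration, not the thesis's actual model:

```python
# Minimal sketch of progress-rate straggler recognition: a running task whose
# progress rate falls well below the average of its peers is flagged as a
# straggler. The slow_factor threshold of 0.5 is an illustrative assumption.

def progress_rate(progress, elapsed_seconds):
    """Progress score (0.0-1.0) divided by elapsed running time."""
    return progress / elapsed_seconds if elapsed_seconds > 0 else 0.0

def find_stragglers(tasks, slow_factor=0.5):
    """Return ids of tasks whose rate is below slow_factor * mean rate."""
    rates = {tid: progress_rate(p, t) for tid, (p, t) in tasks.items()}
    mean_rate = sum(rates.values()) / len(rates)
    return sorted(tid for tid, r in rates.items() if r < slow_factor * mean_rate)

# (progress score, elapsed seconds) per running task -- illustrative numbers.
tasks = {"t1": (0.9, 60), "t2": (0.85, 62), "t3": (0.2, 70), "t4": (0.88, 58)}
print(find_stragglers(tasks))  # -> ['t3']
```

Once a straggler is identified, the choice of remedy (speculative re-execution versus waiting) can then take the task's resource usage into account, as the abstract describes.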
Keywords/Search Tags:MapReduce, Short Jobs, Straggler, Task Scheduling, Performance Optimization