Research On An Approach For Handling Tasks With Cooperation Of Multicluster In Big Data

Posted on: 2017-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Wu
Full Text: PDF
GTID: 2308330482995748
Subject: Software engineering
Abstract/Summary:
With the advent of the big data era, analyzing and processing big data tasks has become an increasingly important problem in big data research. In recent years, many effective big data processing approaches have been proposed in domestic and international research. An important way to analyze and process big data tasks effectively and quickly is to allocate them with a task deployment strategy. A fast and efficient task deployment strategy lets the overall processing system allocate tasks reasonably and use resources efficiently, which ensures collaboration among the task processing nodes in a cloud computing environment and thereby provides good service to users. Notably, processing big data tasks on a single cluster can no longer meet the needs of mass data processing in large-scale, demanding applications. Using multiple clusters in collaboration to process big data tasks has become an effective alternative. In current research on big data task deployment strategies, load balancing and resource optimization have become hot issues. This paper addresses both load balancing and the optimization of inter-cluster communication resources across the overall cluster set. Balancing the load of the overall cluster set raises resource utilization and promotes collaboration among task processing nodes, and thus improves overall system performance. In addition, optimizing inter-cluster communication resources reduces the extra bandwidth consumed by data transmission in the multicluster and decreases data delay to some extent, so that tasks are processed quickly and efficiently.
Therefore, this paper focuses on the multi-objective combinatorial optimization problem of balancing load across the multicluster while minimizing the bandwidth cost of inter-cluster communication. To solve this problem, under the premise of maximizing the number of successfully deployed tasks, a load balancing rate for the overall cluster set is designed and used to measure the degree of load balance, and a prediction algorithm is used to calculate the communication bandwidth consumed in the multicluster under different task deployment schemes. Based on an architecture in which multiple clusters cooperate to process big data tasks, this paper proposes a novel approach for quickly obtaining the set of best task deployment schemes. The specific work is as follows:
(1) The research background and significance of this paper are detailed, the current state of domestic and international research on big data task processing is introduced, and the advantages and disadvantages of existing work are briefly analyzed.
(2) The definition and characteristics of big data are introduced, followed by a brief description of the Master-Slave model. The multi-objective combinatorial optimization problem and Pareto dominance theory are then introduced. Finally, particle swarm optimization and Cauchy mutation, both of which are employed in this paper, are introduced.
(3) The architecture and implementation of the proposed approach are described in detail. To process big data tasks quickly and efficiently, a novel heuristic approach, Handling Tasks with Cooperation of Multicluster (HCMC), is presented, together with its detailed design scheme and implementation. The proposed approach deploys big data tasks quickly and efficiently and thus improves task processing performance.
(4) The performance of HCMC is evaluated through simulation experiments. The simulation results show that, compared with previous work, HCMC not only effectively reduces the number of failed task deployments and increases the throughput of the multicluster, but also improves the performance of the multicluster's external services.
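The abstract names Pareto dominance and Cauchy mutation but gives no formulas, so the following is only a minimal Python sketch of those two standard building blocks, not the HCMC implementation. It assumes each candidate deployment scheme is scored by two objectives to be minimized: a load imbalance value and a bandwidth cost. The function names, the use of the population standard deviation as the imbalance measure (a stand-in for the thesis's own "load balancing rate", which is not defined here), and the `scale` parameter are all hypothetical.

```python
import math
import random
from statistics import pstdev
from typing import List, Sequence, Tuple

# (load imbalance, bandwidth cost); both objectives are minimized.
Score = Tuple[float, float]


def load_imbalance(cluster_loads: Sequence[float]) -> float:
    """Assumed imbalance metric: population standard deviation of the
    per-cluster loads (0.0 means perfectly balanced). This is NOT the
    thesis's load balancing rate, only an illustrative substitute."""
    return pstdev(cluster_loads)


def dominates(a: Score, b: Score) -> bool:
    """Pareto dominance for minimization: a is no worse than b in every
    objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: Sequence[Score]) -> List[Score]:
    """Return the scores that no other score dominates (input order kept)."""
    return [s for s in scores if not any(dominates(o, s) for o in scores)]


def cauchy_mutate(position: Sequence[float], scale: float,
                  rng: random.Random) -> List[float]:
    """Perturb each coordinate with Cauchy-distributed noise, sampled by
    the inverse-CDF method: scale * tan(pi * (u - 0.5)), u ~ U[0, 1).
    The heavy tails occasionally produce large jumps, which is the usual
    motivation for adding it to particle swarm optimization."""
    return [x + scale * math.tan(math.pi * (rng.random() - 0.5))
            for x in position]
```

For example, given the scores `(1, 5)`, `(2, 2)`, `(3, 3)`, and `(5, 1)`, the scheme scored `(3, 3)` is dominated by `(2, 2)` and is dropped, while the other three form the non-dominated set. How HCMC encodes deployment schemes as particles, or when it applies the mutation, is not stated in the abstract and is left open here.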
Keywords/Search Tags: Big Data, Multicluster, Load Balancing, Pareto Dominance, Cauchy Mutation