| Cloud computing has succeeded in the business environment, and the big data market at home and abroad has grown unusually fast. Facebook, Baidu, Alibaba, and other enterprises mine and analyze user behavior from their massive user data to guide business decisions. Traditional methods struggle to find the information users are interested in quickly within massive data, whereas distributed platforms handle this readily. Among the many distributed platforms, Hadoop is currently one of the most widely used and efficient open-source options. A Hadoop cluster connects a large number of machines to form tremendous computing power for data processing. Work on user data is expressed as jobs; each job is split into tasks, which are assigned to different machines for execution. Assigning tasks to the right machines speeds up processing and improves resource utilization, and this is the responsibility of the job scheduler. First, building on a study of the existing scheduling algorithms on the Hadoop platform and an analysis of them from the perspective of resource allocation, this paper proposes a job scheduling model based on classless slots (slots without categories) that focuses on improving resource utilization. The model is then applied to a new scheduling algorithm, the IS scheduling algorithm, which aims to improve the data locality of tasks and reduce the resource consumption caused by transmitting data blocks during scheduling. The paper further combines the IS scheduling algorithm with the basic genetic algorithm and proposes CHC scheduling, a genetic-algorithm-based scheduler for multi-user conditions, which improves the total scheduling time and the average time. 
Finally, a corresponding scheduler is designed and an experimental environment is built to verify the improvement that the CHC algorithm brings to scheduling performance and resource utilization. The main work of this paper is as follows. First, Hadoop's existing resource representation and allocation models are based on statically configured slots and strictly distinguish between map slots and reduce slots. This paper presents a classless slot scheduling model in which the number of slot resources in use changes dynamically during job scheduling and slot types are no longer distinguished, which improves load balancing and resource utilization. Second, data locality is an important consideration in Hadoop job scheduling, but it is not the emphasis of the classless slot model. Therefore, the scheduling algorithm based on master-slave node interaction (the IS algorithm) focuses on data locality and reduces the system overhead caused by excessive data transmission. Third, multi-user scheduling is key in practical production, which motivates a scheduler based on a genetic algorithm. To improve the job execution time and the average execution time, this paper combines the IS scheduling algorithm with the genetic algorithm and proposes CHC scheduling based on the genetic algorithm. With a new encoding scheme and a multi-objective scheduling function, it achieves very good task parallelism. Experimental verification shows that the CHC scheduler performs very well in both scheduling time and resource utilization. |