Research of Hadoop Job Scheduler Algorithm Based on Task Characteristics and Fair Strategy
|School||Huazhong University of Science and Technology|
|Course||Applied Computer Technology|
|Keywords||Hadoop; Job scheduler; Data locality; Time slice|
In recent years, Hadoop has developed rapidly in both research and practical application. Job scheduling is one of the core technologies of the Hadoop platform; its purpose is to schedule jobs accurately and allocate computing resources sensibly. The job scheduling algorithm directly affects the performance of the entire platform.

The existing Hadoop scheduling algorithms have certain defects. FIFO is designed for a single user and ignores the differences between jobs. The Capacity Scheduler cannot set up queues or select the right group for jobs automatically. The Fair Scheduler does not consider the current system load.

Based on the shortcomings of the existing algorithms, this paper proposes a new algorithm, named the similar time slice algorithm based on local data. The algorithm calculates the priority and the data locality of jobs, resolves the ordering between data locality and priority by setting an avoidance threshold so that both priority and data locality are satisfied, and uses round-robin to ensure that tasks run in parallel. Each time, it chooses the best job for the compute node. This not only keeps the running time of high-priority jobs short but also shortens the overall running time of the system. The paper then introduces the idea of the algorithm and its specific implementation goals. Finally, experiments on the new algorithm were carried out to verify its correctness. The experiments show that, compared with the existing Hadoop scheduling algorithms, the new scheduling algorithm effectively shortens job response time.
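
The abstract does not give the details of the avoidance threshold, so the following Python sketch is only one plausible reading of it, not the thesis's implementation. It assumes that a higher-priority job with no local data on the free node may be passed over a limited number of times before it is given a non-local task. The names `Job`, `pick_task`, `skips`, and `threshold` are hypothetical, and the round-robin time-slice part of the algorithm is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Job:
    name: str
    priority: int                  # assumption: larger value means more urgent
    pending: Dict[str, Set[str]]   # task id -> nodes that hold its input block
    skips: int = 0                 # times the job was passed over waiting for a local slot


def pick_task(jobs: List[Job], node: str, threshold: int) -> Optional[Tuple[str, str, bool]]:
    """Choose a task for a free slot on `node`.

    Returns (job name, task id, is_local), or None if no task is pending.
    """
    # Visit jobs from highest to lowest priority (ties keep input order).
    for job in sorted(jobs, key=lambda j: -j.priority):
        if not job.pending:
            continue
        # Prefer a task whose input data already lives on this node.
        local = sorted(t for t, nodes in job.pending.items() if node in nodes)
        if local:
            task = local[0]
            del job.pending[task]
            job.skips = 0
            return (job.name, task, True)
        # No local task: once the job has been passed over `threshold`
        # times, priority wins and it runs a non-local task.
        if job.skips >= threshold:
            task = sorted(job.pending)[0]
            del job.pending[task]
            job.skips = 0
            return (job.name, task, False)
        # Otherwise the job yields this slot to a lower-priority job.
        job.skips += 1
    return None
```

With a threshold of 1, a high-priority job whose data sits on another node yields the first free slot to a lower-priority job that has local data, and then takes the next slot even though the task is non-local. A larger threshold favors data locality, while a threshold of 0 reduces the sketch to strict priority order.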