I am new to Hadoop.
I have set up a 2-node cluster.
How do I run 2 jobs in parallel in Hadoop?
When I submit jobs, they run one by one in FIFO order. I need the jobs to run in parallel. How do I achieve that?
Thanks MRK
Hadoop can be configured with a number of schedulers and the default is the FIFO scheduler.
The FIFO scheduler behaves like this.
Scenario 1: If the cluster has a capacity of 10 map tasks and job1 needs 15 map tasks, then job1 takes the whole cluster. As job1 makes progress and slots become free that job1 no longer uses, job2 starts running on the cluster.
Scenario 2: If the cluster has a capacity of 10 map tasks and job1 needs only 6 map tasks, then job1 takes 6 slots and job2 takes the remaining 4 slots; job1 and job2 run in parallel.
To run jobs in parallel from the start, you can configure either the Fair Scheduler or the Capacity Scheduler, depending on your requirements. The mapreduce.jobtracker.taskscheduler property and the scheduler-specific parameters have to be set in mapred-site.xml for this to take effect.
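A minimal sketch of what that might look like in mapred-site.xml, assuming a Hadoop 1.x-style JobTracker with the Fair Scheduler contrib jar on its classpath (in 1.x the property is spelled mapred.jobtracker.taskScheduler; use org.apache.hadoop.mapred.CapacityTaskScheduler instead if you want the Capacity Scheduler):

    <!-- mapred-site.xml: replace the default FIFO scheduler with the Fair Scheduler -->
    <property>
      <name>mapred.jobtracker.taskScheduler</name>
      <value>org.apache.hadoop.mapred.FairScheduler</value>
    </property>

    <!-- Optional: point the Fair Scheduler at a pool allocation file (path is an example) -->
    <property>
      <name>mapred.fairscheduler.allocation.file</name>
      <value>/path/to/fair-scheduler.xml</value>
    </property>

The JobTracker has to be restarted after changing the scheduler for the setting to take effect.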
Edit: Updated the answer based on the comment from MRK.
You have "Map Task Capacity" and "Reduce Task Capacity". Whenever those are free they would pick the job in FIFO order. Your submitted jobs contains mapper and optionally reducer. If your jobs mapper (and/or reducer) count is smaller then the cluster's capacity it would take the next jobs mapper (and/or reducer).
If you don't like FIFO, you can always give priority to your submitted jobs.
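For example, a sketch of setting a job's priority through its configuration (the property name and values below are the classic mapred.* names; the priority can also be changed from the job client after submission):

    <!-- Per-job configuration: bump this job ahead of NORMAL-priority jobs in the FIFO queue -->
    <property>
      <name>mapred.job.priority</name>
      <!-- Accepted values: VERY_HIGH, HIGH, NORMAL (default), LOW, VERY_LOW -->
      <value>HIGH</value>
    </property>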
Edit:
Sorry about the slight misinformation; Praveen's answer is the right one. In addition to his answer, you can check the HOD scheduler as well.
With the default scheduler, only one job per user runs at a time. You can launch different jobs from different user ids and they will run in parallel; of course, as mentioned by others, you need enough slot capacity.