
How to get the names of the currently running Hadoop jobs?

Developer https://www.devze.com 2023-03-02 18:36 Source: Internet
I need to get the list of names of the jobs that are currently running, but hadoop job -list gives me a list of job IDs.


I've had to do this a number of times so I came up with the following command line that you can throw in a script somewhere and reuse. It prints the jobid followed by the job name.

hadoop job -list | egrep '^job' | awk '{print $1}' | xargs -n 1 -I {} sh -c "hadoop job -status {} | egrep '^tracking' | awk '{print \$3}'" | xargs -n 1 -I{} sh -c "echo -n {} | sed 's/.*jobid=//'; echo -n ' ';curl -s -XGET {} | grep 'Job Name' | sed 's/.* //' | sed 's/<br>//'"


If you use Hadoop YARN, don't use mapred job -list (or its deprecated version, hadoop job -list); just run:

yarn application -appStates RUNNING -list

This also prints the application/job name. For MapReduce applications, you can get the corresponding job ID by replacing the application prefix of the Application-Id with job.
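The prefix swap described above can be sketched as a small shell function. The function name app_id_to_job_id and the sample ID are made up for illustration; only the application-to-job prefix replacement comes from the answer itself.

```shell
# Convert a YARN application ID to the corresponding MapReduce job ID
# by swapping the leading "application_" prefix for "job_".
# (app_id_to_job_id is a hypothetical helper name, not a Hadoop command.)
app_id_to_job_id() {
  printf '%s\n' "$1" | sed 's/^application_/job_/'
}

# Hypothetical application ID, used only as sample input.
app_id_to_job_id application_1400000000000_0042
```

The resulting job ID can then be passed to mapred job -status.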


Modifying AnthonyF's script, you can use the following on YARN:

mapred job -list 2> /dev/null | egrep '^\sjob' | awk '{print $1}' | xargs -n 1 -I {} sh -c "mapred job -status {} 2>/dev/null | egrep 'Job File' | awk '{print \$3}'" | xargs -n 1 -I{} sh -c "hadoop fs -cat {} 2>/dev/null | egrep 'mapreduce.job.name' | sed 's/.*<value>//' | sed 's/<\/value>.*//'"


If you run $HADOOP_HOME/bin/hadoop job -status <jobid>, you will get a tracking URL in the output. Going to that URL gives you the tracking page, which shows the name:

Job Name: <job name here>

The -status command also gives the path of a job file, which can also be reached from the tracking URL. This file contains a mapred.job.name property, which holds the job name.

I didn't find a way to access the job name from the command line. Not to say there isn't... but not found by me. :)

The tracking URL and xml file are probably your best options for getting the job name.
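As a rough sketch of the XML route, the function below pulls the job name out of job XML read on stdin. The helper name job_name_from_xml and the sample property line are assumptions for illustration. It also assumes that each property element sits on a single line, and it accepts both mapred.job.name and the newer mapreduce.job.name key used elsewhere on this page.

```shell
# Print the value of the job-name property from job XML read on stdin.
# Accepts both mapred.job.name and mapreduce.job.name.
# Assumption: each <property> element is on a single line.
# (job_name_from_xml is a hypothetical helper name.)
job_name_from_xml() {
  sed -n -E 's:.*<name>mapred(uce)?\.job\.name</name><value>([^<]*)</value>.*:\2:p'
}

# Made-up sample line standing in for the real job file contents.
printf '%s\n' '<property><name>mapreduce.job.name</name><value>word count</value></property>' \
  | job_name_from_xml
```

With a real cluster, the input would instead come from running hadoop fs -cat on the job file path reported by the -status command.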


You can find the information in the JobTracker UI.

You can see:

Job ID
Priority
User
Name of the job
State of the job (whether it succeeded or failed)
Start Time
Finish Time
Map % Complete
Reduce % Complete, etc.



In case anyone is interested in an updated command to get the job name, here is a modified version of Pirooz's command:

mapred job -list 2> /dev/null | egrep '^job' | awk '{print $1}' | xargs -n 1 -I {} sh -c "mapred job -status {} 2>/dev/null | egrep 'Job File'" | awk '{print $3}' | xargs -n 1 -I{} sh -c "hadoop fs -cat {} 2>/dev/null" | egrep 'mapreduce.job.name' | awk -F"<value>" '{print $2}' | awk -F "</value>" '{print $1}'


I needed to look through the history, so I changed mapred job -list to mapred job -list all.

I ended up adding a -L to the curl command, so the block there was:

curl -s -L -XGET {}

This allows for redirection, such as if the job is retired and in the job history. I also found that it's JobName in the history HTML, so I changed the grep:

grep 'Job.*Name' 

Plus of course changing hadoop to mapred. Here's the full command:

mapred job -list all | egrep '^job' | awk '{print $1}' | xargs -n 1 -I {} sh -c "mapred job -status {} | egrep '^tracking' | awk '{print \$3}'" | xargs -n 1 -I{} sh -c "echo -n {} | sed 's/.*jobid=//'; echo -n ' ';curl -s -L -XGET {} | grep 'Job.*Name' | sed 's/.* //' | sed 's/<br>//'"

(I also changed around the first grep so that I was only looking at a certain username....YMMV)
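To see why the looser grep pattern helps, here is a small sketch of the extraction stage in isolation. The helper name job_name_from_html and the two sample HTML lines are invented for illustration, and the real tracking-page markup may differ between Hadoop versions.

```shell
# Extract the job name from tracking-page HTML read on stdin.
# 'Job.*Name' matches both "Job Name" and "JobName".
# (job_name_from_html is a hypothetical helper name.)
job_name_from_html() {
  grep 'Job.*Name' | sed 's/.* //' | sed 's/<br>//'
}

# Made-up sample lines; actual page markup is an assumption.
printf '%s\n' '<b>Job Name:</b> wordcount<br>' | job_name_from_html
printf '%s\n' '<b>JobName:</b> wordcount<br>' | job_name_from_html
```

Because sed 's/.* //' keeps only the text after the last space, a job name that contains spaces would be cut down to its final word.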


By typing "jps" in your terminal. Note that jps lists the running Java processes (such as the Hadoop daemons), not the job names.
