I'm using org.apache.hadoop.mapreduce.Job to create, submit, and run an MR job (Cloudera CDH3, Hadoop 0.20.2). After it completes, a separate application needs to grab the job's counters and do some work with them, so that I don't have to re-run the entire MR job every time I test that code. I can get a RunningJob from a JobClient, but not an org.apache.hadoop.mapreduce.Job. RunningJob gives me Counters from the mapred package, while Job gives me Counters from the mapreduce package. I tried new Job(conf, "job_id"), but that just creates a blank Job in the DEFINE state, not FINISHED.
Here is how I do it:
package org.apache.hadoop.mapred;

import java.io.IOException;
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;

public class FinishedJobHelper {

    public static Counters getCounters(String jobTrackerHost, int jobTrackerPort,
                                       String jobIdentifier, int jobId) throws IOException {
        // Connect directly to the JobTracker over Hadoop RPC.
        InetSocketAddress link = new InetSocketAddress(jobTrackerHost, jobTrackerPort);
        JobSubmissionProtocol client = (JobSubmissionProtocol) RPC.getProxy(
                JobSubmissionProtocol.class, JobSubmissionProtocol.versionID,
                link, new Configuration());
        // Ask the JobTracker for the counters of the given (completed) job.
        return client.getJobCounters(new JobID(jobIdentifier, jobId));
    }
}
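A call site might look like the sketch below. The JobTracker host, port, and job id parts are placeholders you would replace with your own values; a job id of the form job_201108311234_0001 splits into the identifier "201108311234" and the sequence number 1.

```java
// Hypothetical values: substitute your JobTracker address and the id of the
// finished job. Counters here is org.apache.hadoop.mapred.Counters.
Counters counters = FinishedJobHelper.getCounters(
        "jobtracker.example.com", 8021, "201108311234", 1);
// Look up a built-in counter from the old (mapred) API,
// e.g. the number of map input records.
long mapInputRecords = counters.findCounter(
        "org.apache.hadoop.mapred.Task$Counter", "MAP_INPUT_RECORDS").getCounter();
```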
The class must live in the org.apache.hadoop.mapred package (don't change it), because JobSubmissionProtocol is a package-private interface. The problem with this method is that you can't retrieve jobs that the JobTracker has "retired". So I prefer not relying on it, and instead push the counters somewhere as soon as the job completes:
...
...
job.waitForCompletion(true);
// Get the counters after the job completes and push them elsewhere.
Counters counters = job.getCounters();
...
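One way to do that push is to flatten the counters into a properties file that the separate application can read later, so the job never has to be re-run just to inspect them. This is a sketch assuming job has just finished successfully; the output file name is a placeholder:

```java
import java.io.FileOutputStream;
import java.util.Properties;

import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.CounterGroup;
import org.apache.hadoop.mapreduce.Counters;

// In the new (mapreduce) API, Counters is Iterable<CounterGroup>
// and CounterGroup is Iterable<Counter>.
Counters counters = job.getCounters();
Properties props = new Properties();
for (CounterGroup group : counters) {
    for (Counter counter : group) {
        // Key each value as "<group>.<counter>" so it stays unambiguous.
        props.setProperty(group.getName() + "." + counter.getName(),
                Long.toString(counter.getValue()));
    }
}
// "job-counters.properties" is an illustrative path; write wherever
// your other application expects to find the counters.
props.store(new FileOutputStream("job-counters.properties"), job.getJobName());
```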
Hope this helps.