In Hadoop API version 0.20 and above, the Context object was introduced in place of JobConf.
I need to find out, using the Context object:
the partition id of the current Reducer
the output folder
Using the deprecated JobConf, I can find the partition id of the current Reducer like this:
public void configure(JobConf conf) {
    int current_partition = conf.getInt("mapred.task.partition", -1);
}
I think that with the Context object I need to do this inside the method
public void setup(Context c)
But how? And what about the output folder name?
If you want to get the partition, you can use context.getTaskAttemptID().getTaskID().getId(). The task id is created from the partition id. I list the relevant code here; you can check ReduceTaskImpl, TaskImpl and MRBuilderUtils yourself.
public TaskImpl(JobId jobId, TaskType taskType, int partition,
        EventHandler eventHandler, Path remoteJobConfFile, JobConf conf,
        TaskAttemptListener taskAttemptListener, OutputCommitter committer,
        Token jobToken,
        Credentials credentials, Clock clock,
        Map completedTasksFromPreviousRun, int startCount,
        MRAppMetrics metrics, AppContext appContext) {
    this.conf = conf;
    this.clock = clock;
    this.jobFile = remoteJobConfFile;
    ReadWriteLock readWriteLock = new ReentrantReadWriteLock();
    readLock = readWriteLock.readLock();
    writeLock = readWriteLock.writeLock();
    this.attempts = Collections.emptyMap();
    maxAttempts = getMaxAttempts();
    // the task id is built directly from the partition number
    taskId = MRBuilderUtils.newTaskId(jobId, partition, taskType);
    this.partition = partition;
    ...
}
public static TaskId newTaskId(JobId jobId, int id, TaskType taskType) {
    TaskId taskId = Records.newRecord(TaskId.class);
    taskId.setJobId(jobId);
    taskId.setId(id);
    taskId.setTaskType(taskType);
    return taskId;
}
You can also try running your Partitioner class on the first incoming key; that gives you the partition number of the current reducer.
"Output folder" is not a property of a reducer. Strictly speaking, it is a property of the OutputFormat, and nothing besides the OutputFormat knows whether there is an "output folder" at all; the output might, for example, go to an RDBMS table. For simple HDFS-based outputs it is a property of the whole map-reduce job, so it is usually accessible from the JobContext, i.e.
c.getConfiguration().get("mapred.output.dir")
would most likely give you the URL of your output directory.
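A sketch of that lookup, with a plain Map standing in for Hadoop's Configuration (so it runs without Hadoop on the classpath); newer Hadoop versions rename the key to mapreduce.output.fileoutputformat.outputdir, so checking both names is safer. The HDFS path here is a made-up placeholder:

```java
import java.util.HashMap;
import java.util.Map;

public class OutputDirLookup {
    // In a real reducer: context.getConfiguration().get("mapred.output.dir")
    // (or the newer key on Hadoop 2.x+). A Map stands in for Configuration.
    static String outputDir(Map<String, String> conf) {
        String dir = conf.get("mapreduce.output.fileoutputformat.outputdir");
        if (dir == null) {
            dir = conf.get("mapred.output.dir"); // legacy property name
        }
        return dir;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapred.output.dir", "hdfs://namenode/user/out"); // placeholder path
        System.out.println(outputDir(conf));
    }
}
```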