Hadoop: How to find out the partition_Id in reduce step using Context object

In Hadoop API ver. 0.20 and above, the Context object was introduced in place of JobConf.

I need to find out, using the Context object:

  1. the partition_id of the current Reducer

  2. the output folder

Using the now-obsolete JobConf, I can find the partition_id of the current Reducer like this:

public void configure(JobConf conf) {
  // "mapred.task.partition" holds the partition number of the current task
  int current_partition = conf.getInt("mapred.task.partition", -1);
}

I think that with the Context object I need to do this inside the method

public void setup(Context c)

But how? And what about the output folder name?


If you want to get the partition, you can use context.getTaskAttemptID().getTaskID().getId(). The task id is created from the partition id. I list the related code below; you can check the sources of ReduceTaskImpl, TaskImpl and MRBuilderUtils yourself.


public TaskImpl(JobId jobId, TaskType taskType, int partition,
    EventHandler eventHandler, Path remoteJobConfFile, JobConf conf,
    TaskAttemptListener taskAttemptListener, OutputCommitter committer,
    Token<JobTokenIdentifier> jobToken,
    Credentials credentials, Clock clock,
    Map<TaskId, TaskInfo> completedTasksFromPreviousRun, int startCount,
    MRAppMetrics metrics, AppContext appContext) {
  this.conf = conf;
  this.clock = clock;
  this.jobFile = remoteJobConfFile;
  ReadWriteLock readWriteLock = new ReentrantReadWriteLock();
  readLock = readWriteLock.readLock();
  writeLock = readWriteLock.writeLock();
  this.attempts = Collections.emptyMap();
  maxAttempts = getMaxAttempts();
  // The task id is built directly from the partition number
  taskId = MRBuilderUtils.newTaskId(jobId, partition, taskType);
  this.partition = partition;
  ...
}

public static TaskId newTaskId(JobId jobId, int id, TaskType taskType) {
  TaskId taskId = Records.newRecord(TaskId.class);
  taskId.setJobId(jobId);
  // "id" here is the partition passed in by TaskImpl
  taskId.setId(id);
  taskId.setTaskType(taskType);
  return taskId;
}
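
Putting this together, here is a minimal sketch of a reducer that reads its partition id in setup() (the class and field names are just for illustration):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class PartitionIdReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  private int currentPartition = -1;

  @Override
  protected void setup(Context context) {
    // The id of a reduce task's TaskID equals its partition number,
    // because TaskImpl passes the partition into MRBuilderUtils.newTaskId()
    currentPartition = context.getTaskAttemptID().getTaskID().getId();
  }
}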


Alternatively, you can run your Partitioner class over the first key of the incoming data; it will return the partition number of the current reducer, since every key routed to a reducer maps to the same partition. A sketch of that approach follows.
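
A minimal sketch, assuming the job uses the default HashPartitioner (the class and field names are illustrative):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class PartitionProbeReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  private final HashPartitioner<Text, IntWritable> partitioner =
      new HashPartitioner<Text, IntWritable>();
  private int currentPartition = -1;

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context) {
    if (currentPartition < 0) {
      // HashPartitioner ignores the value, so passing null is safe here;
      // the first key is enough because all keys sent to this reducer
      // share the same partition number
      currentPartition = partitioner.getPartition(
          key, null, context.getNumReduceTasks());
    }
    // ... actual reduce logic ...
  }
}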

"Output folder" is not a property of a reducer. Strictly speaking, it's property of OutputFormat and nobody besides it knows clearly whether it's "output folder" at all - for example, it might be output to RDBMS, in some sort of SQL table. For simple HDFS-based outputs, it's property of whole map-reduce job, so it's usually accessible from JobContext, i.e.

c.getConfiguration().get("mapred.output.dir")

would most likely yield the URL of your output directory.
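
For the new API, FileOutputFormat also exposes a typed accessor. A minimal sketch, assuming your job writes via FileOutputFormat (the class name is illustrative):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class OutputDirReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void setup(Context c) {
    // Context extends JobContext, so it can be passed
    // straight to the static accessor
    Path outputDir = FileOutputFormat.getOutputPath(c);
  }
}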
