Hadoop: How to find out the partition_Id in reduce step using Context object

In Hadoop API ver. 0.20 and above, the Context object was introduced in place of JobConf.

I need to find out, using the Context object:

  1. the partition_id of the current Reducer

  2. the output folder

Using the now-obsolete JobConf, I can find the partition_id of the current Reducer like this:

public void configure(JobConf conf) {
  // "mapred.task.partition" holds the partition number of the current task
  int current_partition = conf.getInt("mapred.task.partition", -1);
}

I think that with the Context object I need to do this inside the method

public void setup(Context c)

But how? And what about the output folder name?


If you want to get the partition, you can use context.getTaskAttemptID().getTaskID().getId(). The task id is created from the partition id. I list the related code below; you can check the sources of ReduceTaskImpl, TaskImpl and MRBuilderUtils yourself.


public TaskImpl(JobId jobId, TaskType taskType, int partition,
    EventHandler eventHandler, Path remoteJobConfFile, JobConf conf,
    TaskAttemptListener taskAttemptListener, OutputCommitter committer,
    Token<JobTokenIdentifier> jobToken,
    Credentials credentials, Clock clock,
    Map<TaskId, TaskInfo> completedTasksFromPreviousRun, int startCount,
    MRAppMetrics metrics, AppContext appContext) {
  this.conf = conf;
  this.clock = clock;
  this.jobFile = remoteJobConfFile;
  ReadWriteLock readWriteLock = new ReentrantReadWriteLock();
  readLock = readWriteLock.readLock();
  writeLock = readWriteLock.writeLock();
  this.attempts = Collections.emptyMap();
  maxAttempts = getMaxAttempts();
  // The task id is built directly from the partition number
  taskId = MRBuilderUtils.newTaskId(jobId, partition, taskType);
  this.partition = partition;
  ...
}

public static TaskId newTaskId(JobId jobId, int id, TaskType taskType) {
  TaskId taskId = Records.newRecord(TaskId.class);
  taskId.setJobId(jobId);
  // "id" here is the partition passed in by TaskImpl
  taskId.setId(id);
  taskId.setTaskType(taskType);
  return taskId;
}
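
Putting this together, here is a minimal sketch of a reducer that reads its partition id in setup() (the class and field names are just for illustration):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class PartitionIdReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  private int currentPartition = -1;

  @Override
  protected void setup(Context context) {
    // The id of a reduce task's TaskID equals its partition number,
    // because TaskImpl passes the partition into MRBuilderUtils.newTaskId()
    currentPartition = context.getTaskAttemptID().getTaskID().getId();
  }
}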


Alternatively, you can run your Partitioner class over the first key of the incoming data; it will return the partition number of the current reducer, since every key routed to a reducer maps to the same partition. A sketch of that approach follows.
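
A minimal sketch, assuming the job uses the default HashPartitioner (the class and field names are illustrative):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class PartitionProbeReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  private final HashPartitioner<Text, IntWritable> partitioner =
      new HashPartitioner<Text, IntWritable>();
  private int currentPartition = -1;

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context) {
    if (currentPartition < 0) {
      // HashPartitioner ignores the value, so passing null is safe here;
      // the first key is enough because all keys sent to this reducer
      // share the same partition number
      currentPartition = partitioner.getPartition(
          key, null, context.getNumReduceTasks());
    }
    // ... actual reduce logic ...
  }
}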

"Output folder" is not a property of a reducer. Strictly speaking, it's property of OutputFormat and nobody besides it knows clearly whether it's "output folder" at all - for example, it might be output to RDBMS, in some sort of SQL table. For simple HDFS-based outputs, it's property of whole map-reduce job, so it's usually accessible from JobContext, i.e.

c.getConfiguration().get("mapred.output.dir")

would most likely yield the URL of your output directory.
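
For the new API, FileOutputFormat also exposes a typed accessor. A minimal sketch, assuming your job writes via FileOutputFormat (the class name is illustrative):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class OutputDirReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void setup(Context c) {
    // Context extends JobContext, so it can be passed
    // straight to the static accessor
    Path outputDir = FileOutputFormat.getOutputPath(c);
  }
}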
