Partitioned Data Map/Reduce

Developer https://www.devze.com 2023-01-05 20:12 Source: web
I have written a custom partitioner for partitioning datasets. I want to partition two datasets using the same partitioner, and then, in the next MapReduce job, I want each mapper to handle the same partition from the two sources and perform some function on it, such as a join. How can I ensure that one mapper gets the splits that correspond to the same partition from both sources?

Any help would be highly appreciated.


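What you are describing is one variation of a map-side join. See Chapter 8 of Pro Hadoop or the org.apache.hadoop.mapred.join package. For this to work, both datasets need to be written with the same partitioner and the same number of partitions, and each partition should be sorted by key.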
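To make the idea concrete, here is a minimal plain-Java sketch of the two steps. It does not use the Hadoop API at all: the class name, the hash-based `partitionFor`, and the in-memory maps are all assumptions made up for illustration, and it assumes keys are unique within each dataset. In a real job, the pairing of partition i from both sources is what the `CompositeInputFormat` class in org.apache.hadoop.mapred.join is meant to handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: simulates "partition both datasets the same way,
// then let one worker join partition i of A with partition i of B".
class PartitionedJoinSketch {

    // Hypothetical stand-in for the custom partitioner. The same function
    // and the same partition count must be used for both datasets.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Step 1 (run once per dataset): split records into partitions.
    // TreeMap keeps each partition sorted by key.
    static List<TreeMap<String, String>> partition(Map<String, String> records,
                                                   int numPartitions) {
        List<TreeMap<String, String>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new TreeMap<>());
        }
        for (Map.Entry<String, String> e : records.entrySet()) {
            parts.get(partitionFor(e.getKey(), numPartitions))
                 .put(e.getKey(), e.getValue());
        }
        return parts;
    }

    // Step 2, one "mapper": sees the same partition from both sources
    // and emits an inner join on the key.
    static List<String> joinPartition(TreeMap<String, String> left,
                                      TreeMap<String, String> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : left.entrySet()) {
            String match = right.get(e.getKey());
            if (match != null) {
                out.add(e.getKey() + "\t" + e.getValue() + "\t" + match);
            }
        }
        return out;
    }

    // Drives both steps: partition i of A is only ever paired with partition i of B.
    static List<String> join(Map<String, String> a, Map<String, String> b,
                             int numPartitions) {
        List<TreeMap<String, String>> partsA = partition(a, numPartitions);
        List<TreeMap<String, String>> partsB = partition(b, numPartitions);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            out.addAll(joinPartition(partsA.get(i), partsB.get(i)));
        }
        return out;
    }
}
```

Because a given key always lands in the same partition number in both datasets, no partition ever needs to look at records outside its counterpart, which is why the join can happen on the map side without a shuffle.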
