What is the maximum number of records that a Hadoop reducer's reduce() call can take?

I have a mapper whose output is mapped to multiple different reducer instances by using my own Partitioner. My partitioner makes sure that a given key is always sent to a given reducer instance. What I am wondering is: if for some reason the input data is skewed and I get, say, a million records for a particular key (more precisely, more records than fit into memory), is there any way the reducer will still work fine? I mean, is the Iterable that Hadoop passes to the reducer a lazy loader?
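
(For reference, a partitioner that pins each key to a fixed reducer typically looks something like the sketch below; the class name and key/value types are assumptions, not the asker's actual code.)

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Routes every occurrence of the same key to the same reduce partition,
    // and therefore to the same reducer instance.
    public class KeyToFixedReducerPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            // Mask off the sign bit so the result is always a valid partition index.
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }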


The only practical limit on the number of values associated with a single reduce key is the free space on the local disks, on both the map and the reduce side. This can be managed by adding more nodes, and thus more map/reduce tasks, depending on how skewed your data is (see the driver sketch below).
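
For example, the number of reduce tasks can be raised in the job driver; this is a minimal sketch, and the job name, task count, and partitioner class are illustrative assumptions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SkewedJobDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "skewed-job");
            // More reduce tasks means more partitions: the heaviest key still
            // lands on a single reducer, but the remaining keys are spread
            // over more tasks (and, with more nodes, more local disks).
            job.setNumReduceTasks(32);
            job.setPartitionerClass(KeyToFixedReducerPartitioner.class);
            // ... set mapper, reducer, input/output formats and paths as usual ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }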

So yes, the Iterator behind the Iterable passed to your reducer loads the values lazily, from a combination of memory and disk.
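
To make the streaming behaviour concrete, a reducer written like the sketch below makes a single pass over the values and never holds them all in memory; the key/value types and the summing logic are illustrative assumptions.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class StreamingSumReducer extends Reducer<Text, IntWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            // Values are deserialized one at a time as the iterator advances;
            // do not copy them into a List, which would defeat the lazy loading
            // (and Hadoop reuses the same IntWritable instance on each step).
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }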
