I have a mapper whose output is mapped to multiple different reducer instances by using my own Partitioner. My partitioner makes sure that a given key is always sent to a given reducer instance. What I am wondering about is: if for some reason the input data is skewed and I get, say, a million records (more precisely, more records than can fit into memory) for a particular key, is there any way the reducer will still work fine? In other words, is the Iterable that Hadoop passes to the reducer a lazy loader?
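For context, the setup looks something like the following, a minimal sketch of the kind of custom partitioner described above (the key/value types and the hash-based routing are assumptions, not the asker's actual code):

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes every occurrence of the same key to the same reducer instance.
public class KeyBoundPartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        // Mask off the sign bit so the modulo result is always non-negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```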
The only practical limit on the number of values associated with a single key at a Reducer is free space on the local disks, on both the Map and Reduce side. You can manage this by adding more nodes, and thus more Map/Reduce tasks, depending on how skewed your data is.

So yes, the Iterator streams the values from a combination of memory and disk rather than loading them all at once.
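To illustrate, here is a hedged sketch of a reducer that consumes the values one at a time, so even millions of values for a single key are streamed rather than materialized in memory (the Text key/value types and the counting logic are assumptions for the example):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class CountingReducer extends Reducer<Text, Text, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        long count = 0;
        // The Iterable is lazy: each step of the loop pulls the next value,
        // which may come from spilled files on disk, so the full list of
        // values for this key is never held in memory at once.
        for (Text v : values) {
            count++;
        }
        context.write(key, new LongWritable(count));
    }
}
```

One related caveat worth knowing: the framework reuses the same value object across iterations, so if you need to keep values around you must copy them, and doing that for a very large key would of course reintroduce the memory problem.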