So using the regular MongoDB library in Ruby I have the following query to开发者_Python百科 find average filesize across a set of 5001 documents:
avg = 0
total = collection.count()
Rails.logger.info "#{total} asset creation stats in the system"
collection.find().each {|row| avg += (row["filesize"] * (1/total.to_f)) if row["filesize"]}
Its pretty simple, so I'm trying to do the same using map/reduce as a learning exercise. This is what I came up with:
map = 'function(){emit("filesizes", {size: this.filesize, num: 1});}'
reduce = 'function(k, vals){
var result = {size: 0, num: 0};
for(var x in vals) {
var new_total = result.num + vals[x].num;
result.num = new_total
result.size = result.size + (vals[x].size * (vals[x].num / new_total));
}
return result;
}'
@results = collection.map_reduce(map, reduce)
However the two queries come back with two different results!
What am I doing wrong?
You're weighting the results by doing the division in every reduce function.
Say you had [{size : 5, num : 1}, {size : 5, num : 1}, {size : 5, num : 1}]
. Your reduce would calculate:
result.size = 0 + (5*(1/1)) = 5
result.size = 5 + (5*(1/2)) = 7.25
result.size = 7.25 + (5*(1/3)) = 8.9
As you can see, this weights the results towards the earliest elements.
Fortunately, there's a simple solution. Just add a finalize function, which will be run once after the reduce step is finished.
精彩评论