Map/Reduce differences between Couchbase & Cloudant

Developer https://www.devze.com 2023-03-06 17:30 Source: Internet
I've been playing around with Couchbase Server and now just tried replicating my local db to Cloudant, but am getting conflicting results for my map/reduce function pair to build a set of unique tags with their associated projects...

// map.js
function(doc) {
  if (doc.tags) {
    for(var t in doc.tags) {
      emit(doc.tags[t], doc._id);
    }
  }
}

// reduce.js
function(key,values,rereduce) {
  if (!rereduce) {
    var res=[];
    for(var v in values) {
      res.push(values[v]);
    }
    return res;
  } else {
    return values.length;
  }
}

In Couchbase Server this returns JSON like:

{"rows":[
{"key":"3d","value":["project1","project3","project8","project10"]},
{"key":"agents","value":["project2"]},
{"key":"fabrication","value":["project3","project5"]}
]}

That's exactly what I wanted & expected. However, the same query on the Cloudant replica returns this:

{"rows":[
{"key":"3d","value":4},
{"key":"agents","value":1},
{"key":"fabrication","value":2}
]}

So it somehow only returns the length of the value array... Highly confusing & am grateful for any insights by some M&R ninjas... ;)


It looks like this is exactly the behavior you would expect given your reduce function. The key part is this:

else {
return values.length;
}

In Cloudant, rereduce is always called, since the reduce has to span multiple shards. In that case your function takes the else branch and returns values.length, which is only the length of the array it was handed, not the list of projects.
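
To see the effect locally, the sketch below runs the question's reduce function through a hypothetical two-stage driver. `runView` and its `partitions` argument are made up for illustration and are not part of Couchbase or Cloudant. How many partial results a real cluster combines depends on its sharding, so the numbers here are illustrative only.

```javascript
// The reduce function from the question, unchanged apart from the name.
function reduceTags(key, values, rereduce) {
  if (!rereduce) {
    var res = [];
    for (var v in values) {
      res.push(values[v]);
    }
    return res;
  } else {
    return values.length;
  }
}

// Hypothetical driver (an assumption, not a server API): reduce each
// partition, then optionally combine the partial results with rereduce.
// `partitions` is an array of arrays of emitted values for one key.
function runView(key, partitions, alwaysRereduce) {
  var partials = partitions.map(function (values) {
    return reduceTags(key, values, false); // key is not used by reduceTags
  });
  if (partials.length === 1 && !alwaysRereduce) {
    return partials[0]; // single pass: the reduce output is returned as-is
  }
  return reduceTags(null, partials, true); // combine the partial results
}

// One partition, no rereduce pass: the array of project ids survives.
var singleNode = runView("3d", [["project1", "project3", "project8", "project10"]], false);

// Values spread over several partitions and always rereduced: only a count survives.
var sharded = runView("3d", [["project1"], ["project3"], ["project8"], ["project10"]], true);

console.log(JSON.stringify(singleNode));
console.log(sharded);
```

The same function gives two different shapes of result depending only on how the values were partitioned, which is why a reduce function has to return something that can be fed back into itself.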


I prefer to reduce/re-reduce implicitly rather than depending on the rereduce parameter.

function(doc) { // map
  if (doc.tags) {
    for(var t in doc.tags) {
      emit(doc.tags[t], {id:doc._id, tag:doc.tags[t]});
    }
  }
}

Then reduce checks whether it is accumulating document ids from the identical tag, or whether it is just counting different tags.

function(keys, vals, rereduce) {
  var initial_tag = vals[0].tag;

  // Ids held by a value: a map value has `id`, a partial result has `ids`.
  function ids_of(v) { return v.ids ? v.ids : [ v.id ]; }
  // Number of documents a value stands for (it may already be a count).
  function count_of(v) { return typeof v === 'number' ? v : ids_of(v).length; }

  return vals.reduce(function(state, val) {
    if(initial_tag && typeof state !== 'number' && val.tag === initial_tag) {
      // Accumulate ids which produced this tag.
      return { tag: val.tag
             , ids: ids_of(state).concat(ids_of(val))
             };
    } else {
      // Different tags are being reduced together: just count.
      return count_of(state) + count_of(val);
    }
  })
}

(I didn't test this code, but you get the idea.) As long as the tag value is the same, it doesn't matter whether it's a reduce or a rereduce. Once different tags start reducing together, the function detects that because the tag value changes, and from that point on it just adds up counts.

I have used this trick before, although IMO it's rarely worth it.

Also, in your specific case, this is a dangerous reduce function. You are building a wide list to see all the docs that have a tag, and CouchDB likes tall lists, not fat lists. If you want to see all the docs that have a tag, you can just map them and query the view by key.

function(doc) { // map
  for(var a = 0; a < (doc.tags || []).length; a++) {
    emit(doc.tags[a], doc._id);
  }
}

Now you can query /db/_design/app/_view/docs_by_tag?key="3d" and you should get

{"total_rows":287,"offset":30,"rows":[
{"id":"project1","key":"3d","value":"project1"},
{"id":"project3","key":"3d","value":"project3"},
{"id":"project8","key":"3d","value":"project8"},
{"id":"project10","key":"3d","value":"project10"}
]}
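
One detail that is easy to miss when building that URL in code: the `key` parameter has to be a JSON value, so a string key keeps its double quotes, and those should be percent-encoded. A minimal sketch, where `viewUrl` and `idsFromRows` are hypothetical helper names and the `sample` literal simply mirrors the rows shown above:

```javascript
// Build the view URL for one tag. The key is JSON-encoded first
// ("3d" including the quotes), then percent-encoded for the query string.
function viewUrl(db, designDoc, view, tag) {
  return "/" + db + "/_design/" + designDoc + "/_view/" + view +
    "?key=" + encodeURIComponent(JSON.stringify(tag));
}

// Collect the document ids from a view response body.
function idsFromRows(response) {
  return response.rows.map(function (row) {
    return row.id;
  });
}

var url = viewUrl("db", "app", "docs_by_tag", "3d");
console.log(url); // /db/_design/app/_view/docs_by_tag?key=%223d%22

// Sample response copied from the rows above, not fetched from a server.
var sample = {"total_rows": 287, "offset": 30, "rows": [
  {"id": "project1", "key": "3d", "value": "project1"},
  {"id": "project3", "key": "3d", "value": "project3"},
  {"id": "project8", "key": "3d", "value": "project8"},
  {"id": "project10", "key": "3d", "value": "project10"}
]};
console.log(idsFromRows(sample));
```

This keeps the list tall: each project is its own row, and the client decides how many rows to collect.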