Map/Reduce on an array of hashes in CouchDB

Developer https://www.devze.com 2022-12-22 21:42 Source: Internet
I am looking for a map/reduce function to calculate the status in a Design Document. Below you can see an example document from my current database.

{
    "_id": "0238f1414f2f95a47266ca43709a6591",
    "_rev": "22-24a741981b4de71f33cc70c7e5744442",
    "status": "retrieved image urls",
    "term": "Lucas Winter",
    "urls": [
        {
            "status": "retrieved",
            "url": "http://...."
        },
        {
            "status": "retrieved",
            "url": "http://..."
        }
    ],
    "search_depth": 1,
    "possible_labels": {
        "gender": "male"
    },
    "couchrest-type": "SearchTerm"
}

I'd like to get rid of the status key and instead calculate it from the statuses of the urls. My current by_status view looks like this:

function(doc) {
    if (doc['status']) {
       emit(doc['status'], null);
    }
}

I tried some things but nothing actually works. Right now my Map Function looks like this:

function(doc) {
    if(doc.urls){
        emit(doc._id, doc.urls)
    }
}

And my Reduce Function

function(key, value, rereduce){ 
    var reduced_status = "retrieved"
    for(var url in value){
        if(url.status=="new"){
            reduced_status = "new";
        }
    }
    return reduced_status;
}

The result is that I get retrieved everywhere which is definitely not right.

I tried to narrow down the problem, and it seems that value is not the array I expected. When I use the following Reduce Function I get a length of 1 everywhere, which should be impossible because I have 12 documents in my database, each containing between 20 and 200 urls:

function(key, value, rereduce){ 
   return value.length;
}

(Screenshot: http://img.skitch.com/20100316-qeawxgd5pru8d5i6bprygcsmhf.jpg)

What am I doing wrong? (I know I'm asking you to write code for me, and I feel guilty about it, but right now I calculate the statuses in Ruby after getting the data from the database. It would be nice to get the right data from the database directly.)


The value argument of the reduce function is an array of the values emitted by the map function. In your case, value is an array of urls arrays. When you run a view in Futon, it sets group=true, so the reduce is run separately for every distinct key emitted by the map function. In your case, these keys are the document _ids. That is, the reduce function's value is an array whose elements are all the urls arrays belonging to one doc _id. Since doc _ids are unique, value ends up being an array with a single element: the urls array of the respective doc. That's why value.length is always 1 with your reduce function.
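To make this concrete, here is a rough sketch in plain JavaScript (runnable outside CouchDB) of what grouping by key does to the rows your map function emits. The groupReduce helper and the sample rows are made up for illustration and are not part of CouchDB; it also simplifies the first argument, which in CouchDB is an array of [key, id] pairs rather than a single key.

```javascript
// Hypothetical stand-in for a grouped view query: collect the emitted
// values per key, then call the reduce function once per distinct key.
function groupReduce(rows, reduce) {
    var groups = {};
    rows.forEach(function (row) {
        if (!groups[row.key]) {
            groups[row.key] = [];
        }
        groups[row.key].push(row.value);
    });
    var result = {};
    Object.keys(groups).forEach(function (key) {
        // Simplified: CouchDB passes an array of [key, id] pairs here.
        result[key] = reduce(key, groups[key], false);
    });
    return result;
}

// Rows as your map function emits them: key = doc._id, value = doc.urls.
// The ids and statuses are invented sample data.
var rows = [
    { key: "doc-a", value: [{ status: "retrieved" }, { status: "new" }] },
    { key: "doc-b", value: [{ status: "retrieved" }] }
];

// Your diagnostic reduce: value is [urlsArray], so its length is 1 per key,
// no matter how many urls the document contains.
var lengths = groupReduce(rows, function (key, value, rereduce) {
    return value.length;
});
console.log(lengths);
```

Because every _id occurs exactly once, each group holds exactly one urls array, which is the length of 1 you are seeing.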

But it can get worse: if you end up in a rereduce cycle, the reduce function's value is an array of values returned by previous calls to the reduce function. In your case, the reduce function would be called with value looking like ["retrieved","new","retrieved"], which does not lead to proper results because your loop expects url objects, not strings.
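If you did want to keep a reduce function, it would have to handle both shapes of input. The following is only a sketch under the assumptions of your example: statuses are either "new" or "retrieved", and the map function emits doc.urls as the value. The name reduceStatus is invented so the function can be called outside CouchDB; in a design document it would be anonymous.

```javascript
// Sketch of a reduce that copes with both the first pass and rereduce.
function reduceStatus(keys, values, rereduce) {
    for (var i = 0; i < values.length; i++) {
        if (rereduce) {
            // values are strings returned by earlier reduce calls
            if (values[i] === "new") {
                return "new";
            }
        } else {
            // values are the urls arrays emitted by the map function
            var urls = values[i];
            for (var j = 0; j < urls.length; j++) {
                if (urls[j].status === "new") {
                    return "new";
                }
            }
        }
    }
    return "retrieved";
}

// First pass: an array of urls arrays (invented sample data).
var firstPass = reduceStatus(null, [
    [{ status: "retrieved" }],
    [{ status: "retrieved" }, { status: "new" }]
], false);

// Rereduce: an array of earlier results.
var secondPass = reduceStatus(null, ["retrieved", "new", "retrieved"], true);

console.log(firstPass, secondPass);
```

Both calls return "new" here, since each input contains at least one "new" status.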

Usually, a reduce function is used to aggregate the data emitted by the map function, for example to count rows or to sum up values, which is not necessary in your case. You can read more about map-reduce in CouchDB here:

http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views

http://books.couchdb.org/relax/design-documents/views


doc.urls seems to be an array of objects, each with a status property and a url property. So your Reduce function should be something like:

function(key, value, rereduce){ 
    var reduced_status = "retrieved";
    for(var i=0; i<value.length; i++) {
        if(value[i].status=="new"){
            reduced_status = "new";
        }
    }
    return reduced_status;
}

edit: actually the function should return as soon as it finds status == "new".


Thanks, Alsciende, for pushing me towards the right solution. It turns out I really did not understand the reduce function, and I didn't need one at all.

Here is my Map Function which solves it for me.

function(doc) {
    if (doc.urls) {
        var reduced_status = "retrieved";
        for (var i = 0; i < doc.urls.length; i++) {
            if (doc.urls[i].status == "new") {
                reduced_status = "new";
                break;
            }
        }
        emit(reduced_status, null);
    }
}
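
For anyone who wants to try a map function like this outside CouchDB, a rough way is to stub emit and feed it a few sample documents. The emit stub, the name mapByStatus, and the sample documents below are invented for illustration; inside a design document the function stays anonymous and CouchDB supplies emit.

```javascript
// Stub for CouchDB's emit: records every emitted row in an array.
var emitted = [];
function emit(key, value) {
    emitted.push({ key: key, value: value });
}

// Same logic as the map function above, given a name so it can be called.
function mapByStatus(doc) {
    if (doc.urls) {
        var reduced_status = "retrieved";
        for (var i = 0; i < doc.urls.length; i++) {
            if (doc.urls[i].status == "new") {
                reduced_status = "new";
                break;
            }
        }
        emit(reduced_status, null);
    }
}

// Invented sample documents: one with a "new" url, one fully retrieved,
// and one without a urls array (which the map function skips).
var docs = [
    { _id: "a", urls: [{ status: "retrieved" }, { status: "new" }] },
    { _id: "b", urls: [{ status: "retrieved" }] },
    { _id: "c", term: "no urls here" }
];

docs.forEach(function (doc) {
    mapByStatus(doc);
});
console.log(emitted);
```

The view then has one row per document that has urls, keyed by the computed status, so it can be queried by key just like the old by_status view.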