
MongoDB map-reduce: "first/lowest" value?

Developer https://www.devze.com 2023-04-05 14:53 Source: web
I have documents like this:

{
        "_id" : "someid",
        "name" : "somename",
        "action" : "do something",
        "date" : ISODate("2011-08-19T09:00:00Z")
}

I want to map reduce them into something like this:

{
        "_id" : "someid",
        "value" : {
            "count" : 100,
            "name" : "somename",
            "action" : "do something",
            "date" : ISODate("2011-08-19T09:00:00Z"),
            "firstEncounteredDate" : ISODate("2011-07-01T08:00:00Z")
        }
}

I want to group the map-reduced documents by "name", "action", and "date". But every document should also have a "firstEncounteredDate" field containing the earliest "date" for its "name" and "action" pair (that is, the minimum is taken over the group keyed by "name" and "action" only).

If I group by name, action and date, firstEncounteredDate would always equal date. That's why I'd like to know whether there's any way to get the earliest date (grouped by "name" and "action" across the entire collection) while doing map-reduce.

How can I do this in map reduce?

Edit: more detail on firstEncounteredDate (courtesy of @beny23)


Seems like a two-pass map-reduce would fit the bill, somewhat akin to this example: http://cookbook.mongodb.org/patterns/unique_items_map_reduce/

In pass #1, group the original "name"x"action"x"date" documents by just "name" and "action", collecting the various "date" values into a "dates" array during reduce. Use a 'finalize' function to find the minimum of the collected dates.

Untested code:

// phase i map function : 

function () {
  emit( { "name": this.name, "action": this.action } , 
        { "count": 1, "dates": [ this.date ] } );
}

// phase i reduce function : 

function( key, values ) {
  var result = { count: 0, dates: [ ] };

  values.forEach( function( value ) {
    result.count += value.count;
    result.dates = result.dates.concat( value.dates );
  } );

  return result;
}

// phase i finalize function : 

function( key, reduced_value ) {
  var earliest = new Date( Math.min.apply( Math, reduced_value.dates ) );
  reduced_value.firstEncounteredDate = earliest ;
  return reduced_value;
}
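
One property worth checking before running pass #1 on real data: MongoDB may call the reduce function more than once for the same key and feed earlier reduce results back in as values, so the reduce output needs the same shape as the map output. The sketch below calls the pass #1 functions directly in plain JavaScript, with no MongoDB involved. The names phase1Reduce and phase1Finalize and the sample dates are made up for illustration.

```javascript
// Pass #1 reduce and finalize, given names so they can be called directly.
function phase1Reduce(key, values) {
  var result = { count: 0, dates: [] };
  values.forEach(function (value) {
    result.count += value.count;
    result.dates = result.dates.concat(value.dates);
  });
  return result;
}

function phase1Finalize(key, reducedValue) {
  // Date objects coerce to millisecond timestamps inside Math.min
  reducedValue.firstEncounteredDate =
    new Date(Math.min.apply(Math, reducedValue.dates));
  return reducedValue;
}

// Made-up values, as the pass #1 map function would emit them for one key.
var key = { name: "somename", action: "do something" };
var a = { count: 1, dates: [new Date("2011-08-19T09:00:00Z")] };
var b = { count: 1, dates: [new Date("2011-07-01T08:00:00Z")] };
var c = { count: 1, dates: [new Date("2011-08-19T09:00:00Z")] };

// Reducing in one step or in two steps should give the same totals.
var oneStep = phase1Reduce(key, [a, b, c]);
var twoStep = phase1Reduce(key, [phase1Reduce(key, [a, b]), c]);

var finalized = phase1Finalize(key, twoStep);
console.log(oneStep.count, twoStep.count);
console.log(finalized.firstEncounteredDate.toISOString());
```

Both reduce paths report a count of 3 here, and finalize picks the July date as the earliest one.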

In pass #2, use the documents generated in pass #1 as input. For each "name"x"action" document, emit a new "name"x"action"x"date" document for each collected date, along with the now determined minimum date common to that "name"x"action" pair. Group by "name"x"action"x"date", summing up the count for each individual date during reduce.

Equally untested code:

// phase ii map function : 

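function() {
  // pass #1 output documents look like { _id: { name, action }, value: { ... } }
  var doc = this;  // "this" no longer refers to the document inside the forEach callback
  doc.value.dates.forEach( function( d ) {
    emit( { "name": doc._id.name, "action": doc._id.action, "date" : d } ,
          { "count": 1, "firstEncounteredDate" : doc.value.firstEncounteredDate } );
  } );
}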

// phase ii reduce function : 

function( key, values ) {
  // note: values[i].firstEncounteredDate should all be identical, so ... 
  var result = { "count": 0, 
                 "firstEncounteredDate": values[0].firstEncounteredDate };

  values.forEach( function( value ) {
    result.count += value.count;
  } );

  return result;
}
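
To see what pass #2 produces without a database, the following sketch runs it in plain JavaScript against one hand-written pass #1 output document. It assumes the pass #1 results are stored in the usual map-reduce output shape, with the key under _id and the reduced data under value. The emit stand-in, the grouping loop, and the names phase2Map and phase2Reduce are illustrative helpers, not part of MongoDB.

```javascript
// Hypothetical pass #1 output: one document per name/action pair.
var pass1Output = [
  {
    _id: { name: "somename", action: "do something" },
    value: {
      count: 3,
      dates: [
        new Date("2011-08-19T09:00:00Z"),
        new Date("2011-07-01T08:00:00Z"),
        new Date("2011-08-19T09:00:00Z")
      ],
      firstEncounteredDate: new Date("2011-07-01T08:00:00Z")
    }
  }
];

// Stand-in for MongoDB's global emit(): collect values per serialized key.
var groups = {};
function emit(key, value) {
  var id = JSON.stringify(key);
  if (!groups[id]) groups[id] = { key: key, values: [] };
  groups[id].values.push(value);
}

function phase2Map() {
  var doc = this; // keep a reference; `this` differs inside the callback
  doc.value.dates.forEach(function (d) {
    emit({ name: doc._id.name, action: doc._id.action, date: d },
         { count: 1, firstEncounteredDate: doc.value.firstEncounteredDate });
  });
}

function phase2Reduce(key, values) {
  var result = { count: 0, firstEncounteredDate: values[0].firstEncounteredDate };
  values.forEach(function (value) { result.count += value.count; });
  return result;
}

pass1Output.forEach(function (doc) { phase2Map.call(doc); });

var results = Object.keys(groups).map(function (id) {
  var g = groups[id];
  // MongoDB does not call reduce for a key that has only one value
  var value = g.values.length > 1 ? phase2Reduce(g.key, g.values) : g.values[0];
  return { _id: g.key, value: value };
});

results.forEach(function (r) {
  console.log(r._id.date.toISOString(), r.value.count,
              r.value.firstEncounteredDate.toISOString());
});
```

The sample yields two result documents, one per distinct date, and both carry the same firstEncounteredDate.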

Pass #2 does not do a lot of heavy lifting, obviously -- it's mostly copying each document N times, one for each unique date. We could easily build a map of unique dates to their incidence counts during the reduce step of pass #1. (In fact, if we don't do this, there's no real point in having a "count" field in the values from pass #1.) But doing the second pass is a fairly effortless way of generating a full target collection containing the desired documents.
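
The map-of-dates idea mentioned above might look roughly like this. Instead of an array that grows with every input document, each value carries an object mapping a date to its incidence count, and finalize takes the minimum over the keys. This is a sketch under assumptions: the field name dateCounts, the choice of millisecond timestamps as keys, and the names altMap, altReduce and altFinalize are all invented here, and the emit stand-in only handles a single key.

```javascript
// Stand-in for MongoDB's global emit(); all sample documents share one key here.
var emitted = [];
function emit(key, value) { emitted.push(value); }

function altMap() {
  var dateCounts = {};
  dateCounts[this.date.getTime()] = 1; // key: milliseconds since the epoch
  emit({ name: this.name, action: this.action },
       { count: 1, dateCounts: dateCounts });
}

function altReduce(key, values) {
  var result = { count: 0, dateCounts: {} };
  values.forEach(function (value) {
    result.count += value.count;
    // Merge the per-date counts; works for map output and for earlier reduce output
    Object.keys(value.dateCounts).forEach(function (ms) {
      result.dateCounts[ms] = (result.dateCounts[ms] || 0) + value.dateCounts[ms];
    });
  });
  return result;
}

function altFinalize(key, reduced) {
  var timestamps = Object.keys(reduced.dateCounts).map(Number);
  reduced.firstEncounteredDate = new Date(Math.min.apply(Math, timestamps));
  return reduced;
}

// Made-up input documents for a single name/action pair.
var docs = [
  { name: "somename", action: "do something", date: new Date("2011-08-19T09:00:00Z") },
  { name: "somename", action: "do something", date: new Date("2011-07-01T08:00:00Z") },
  { name: "somename", action: "do something", date: new Date("2011-08-19T09:00:00Z") }
];
docs.forEach(function (doc) { altMap.call(doc); });

var summary = altFinalize(null, altReduce(null, emitted));
console.log(summary.count, summary.dateCounts,
            summary.firstEncounteredDate.toISOString());
```

With this layout, pass #2 would emit one document per key of dateCounts and use the stored count instead of 1.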
