开发者

MongoDB, return recent document for each user_id in collection

开发者 https://www.devze.com 2023-02-14 04:09 出处:网络
Looking for similar functionality to Postgres\' Dist开发者_JS百科inct On. Have a collection of documents {user_id, current_status, date}, where status is just text and date is a Date.Still in the ear

Looking for similar functionality to Postgres' Dist开发者_JS百科inct On.

Have a collection of documents {user_id, current_status, date}, where status is just text and date is a Date. Still in the early stages of wrapping my head around mongo and getting a feel for best way to do things.

Would mapreduce be the best solution here, map emits all, and reduce keeps a record of the latest one, or is there a built in solution without pulling out mr?


There is a distinct command, however I'm not sure that's what you need. Distinct is kind of a "query" command and with lots of users, you're probably going to want to roll up data not in real-time.

Map-Reduce is probably one way to go here.

Map Phase: Your key would simply be an ID. Your value would be something like the following {current_status:'blah',date:1234}.

Reduce Phase: Given an array of values, you would grab the most recent and return only it.

To make this work optimally you'll probably want to look at a new feature from 1.8.0. The "re-reduce" feature. Will allow you to process only new data instead of re-processing the whole status collection.

The other way to do this is to build a "most-recent" collection and tie the status insert to that collection. So when you insert a new status for the user, you update their "most-recent".

Depending on the importance of this feature, you could possibly do both things.


Current solution that seems to be working well.

map = function () {emit(this.user.id, this.created_at);}

//We call new date just in case somethings not being stored as a date and instead just a string, cause my date gathering/inserting function is kind of stupid atm

reduce = function(key, values) { return new Date(Math.max.apply(Math, values.map(function(x){return new Date(x)})))}


res = db.statuses.mapReduce(map,reduce);


Another way to achieve the same result would be to use the group command, which is a kind of a mr-shortcut that lets you aggregate on a specific key or set of keys. In your case it would read like this:

db.coll.group({ key : { user_id: true },
                reduce : function(obj, prev) { 
                           if (new Date(obj.date) < prev.date) { 
                             prev.status = obj.status; 
                             prev.date = obj.date; 
                           }
                         },
              initial : { status : "" }
}) 

However, unless you have a rather small fixed amount of users I strongly believe that a better solution would be, as previously suggested, to keep a separate collection containing only the latest status-message for each user.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号