We need to be able to quickly perform queries across the set of a user's friends and friends of friends. This would be relatively straightforward in a relational database, but I'm somewhat stuc开发者_开发知识库k on the best way to accomplish it in MongoDB. We store the user IDs of a user's friends in an array in the user document, so the obvious solution is to do this:
- Pull all friend user IDs from user doc
- Pull all friend arrays from user docs of those friends (using an $in query across all friend IDs), combine application-side into one set, then combine that with first-level friend user IDs to get set of all friends and friends of friends
- Use that set to perform the final query (using $in) across all friends and friends of friends
While straightforward, this seems like a huge amount of back and forth, as compared to what we could do with a join in a relational database. Is there a more efficient way to do this in MongoDB, or is this a problem best suited for a RDBMS?
I asked Eliot Horowitz this very same question recently at MongoDB SV conference. He said the way he would structure it is to store each users friends as embedded documents within each user. For example, the structure might look like this:
{
_id : ObjectId("4e77bb3b8a3e000000004f7a"),
username : "alex",
friends : ["283956723823626626aa", "226567377578888888as", "8738783888aas88a8a88" ]
}
then you can have an index on user.friends
http://www.mongodb.org/display/DOCS/Indexes#Indexes-IndexingArrayElements
"When a document's stored value for a index key field is an array, MongoDB indexes each element of the array. See the Multikeys page for more information."
so to find all of "alex"'s friends I can just do:
db.user.find( { 'friends' : '4e77bb3b8a3e000000004f7a'});
this seems like a huge amount of back and forth, as compared to what we could do with a join in a relational database
This is all very relative. Your basic assumption on fetching "friends of friends of friends" is correct, it's a few hops and a couple of in-memory "distincts".
However, from the raw perspective of "total work done", this isn't very different from what you would have do with SQL. Yes it's a relatively simple SQL query, but the server itself still has to do basically the same amount of work, give or take some some network traffic.
Is there a more efficient way to do this in MongoDB, or is this a problem best suited for a RDBMS?
Is there a better way in MongoDB? Probably not. But doing "self-joins" in SQL doesn't easily scale across multiple servers. In fact, trying to do this across multiple servers basically devolves into a similar process to the MongoDB process.
Technically, this is a job best done by a Graph Database which is neither MongoDB nor an RDBMS.
For Graph Databases you can take a look at Trinity for .NET or NEO4J.
I believe that this is something that is better handled by an RDBMS (barring graph DBs) since you clearly need to perform a "join" operation. Although an RDBMS might implement it the same way, it could implement the join more efficiently and distribute the information more efficiently than MongoDB.
With that said, the overhead of performing the "join" query atomically might prove too costly if you have a large cluster of db nodes and a massive amount of users.
If you're not worried about consistency and atomicity of the query, and all you want is to prevent the back-and-forth between the application and the DB, you can write a JavaScript function which will perform the entire query on MongoDB, or use a MapReduce operation for more efficient distributed queries.
精彩评论