There has been a lot happening in this area, and while I'm aware of the obvious technical differences, I wanted to hear more about how developers have gotten around the bad bits and taken advantage of the different approach.
A brief summary to avoid boring responses that state the obvious:
- They are schemaless
- They are faster than SQL
I am particularly interested in:
- Cassandra
Interesting gotchas:
- They're all very different. Cassandra, MongoDB and CouchDB seem like basically the same thing, but they're not. Expect to have to dig into details of your implementation.
- You will need to learn how to map-reduce. (with the DB of your choice)
- You will have to restructure your data. The first thing you'll typically encounter is that true "children" data will now be stored with the parent. This will make a ton of programmatic sense, but you'll have to fight your old urges to 'normalize'.
- You will have to restructure your data a little more when you realize that your "core" data is not the data you thought it was.
- You will probably write a few more for loops than you're used to. Neither good nor bad, this is really a side effect of realizing that some things you did with sum(field) don't work quite the same, because you now have all of the data in your current process.
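To make the "children stored with the parent" and "more for loops" points concrete, here is a minimal sketch in plain Python. The order document, its field names, and order_total are all made up for illustration; no particular database or driver is assumed, the dict just stands in for a document you have already fetched.

```python
# Hypothetical order document: the line items ("children") live inside
# the parent instead of in a separate, normalized table.
order = {
    "_id": 1001,
    "customer": "alice",
    "items": [
        {"sku": "A-1", "qty": 2, "price": 9.50},
        {"sku": "B-7", "qty": 1, "price": 20.00},
    ],
}


def order_total(doc):
    """Do in application code what SUM(qty * price) did in SQL."""
    total = 0.0
    for item in doc["items"]:
        total += item["qty"] * item["price"]
    return total


print(order_total(order))
```

The whole order comes back in one read, so the aggregation is just a loop over data that is already in your process.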
Things you'll find with MongoDB specifically:
- You will spend a lot less time on administration. DB & "table" creation are now free & automatic. This is a little trippy at first.
- You will probably have to extend a bunch of your server monitoring software. Things like Nagios or Scout don't have this tracking built in (yet).
- You will look at indexes with a much more careful eye. It's easy to get careless with indexes in SQL. Useless indexes in MongoDB can quickly degrade performance.
The biggest gotcha: You will have to "think different".
We've been trapped in a relational world for years. We're used to thinking of data in relational terms. Most of us naturally normalize data as we're designing tables. We like to optimize for "all possible queries" very early on.
NoSQL is very different. You'll start denormalizing early. You'll focus exclusively on your most frequently run queries. You'll quickly realize that most slow queries don't really need real-time answers. Those will become map-reduce jobs that update precomputed data.
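As a rough illustration of a slow query turned into a map-reduce job, here is a sketch in plain Python. It only mimics the shape of the technique (map emits key/value pairs, reduce folds the values for each key); it is not the API of MongoDB, CouchDB, or Cassandra, and the event data, map_event, reduce_values, and run_job are hypothetical names.

```python
from collections import defaultdict

# Hypothetical raw events; in practice these would be documents in the store.
events = [
    {"page": "/home", "ms": 120},
    {"page": "/home", "ms": 80},
    {"page": "/about", "ms": 200},
]


def map_event(event):
    """Map step: emit one (key, value) pair per event."""
    yield event["page"], {"count": 1, "total_ms": event["ms"]}


def reduce_values(values):
    """Reduce step: fold all values emitted for one key into a summary."""
    summary = {"count": 0, "total_ms": 0}
    for value in values:
        summary["count"] += value["count"]
        summary["total_ms"] += value["total_ms"]
    return summary


def run_job(docs):
    """Group mapped values by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_event(doc):
            grouped[key].append(value)
    return {key: reduce_values(vals) for key, vals in grouped.items()}


# Run periodically and store the result; readers query the small summary
# instead of scanning every event in real time.
page_stats = run_job(events)
```

The point is the workflow rather than the code: the expensive scan runs in the background, and your frequent queries hit the precomputed result.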