MongoDb vs CouchDb: write speeds for geographically remote clients

I would like all of my users to be able to read and write to the datastore very quickly. It seems like MongoDb has blazing reads, but the writes seem like they could be very slow if the one master db needs to be located very far away from the client. Couchdb seems to have slow reads, but how about the writes in the case when the client is very far away from the master? With couchdb, we can have multiple masters, meaning we can always have a write node close to the client. Could couchdb actually be faster for writes than mongodb in the case when our user base is spread very far out geographically?

I would love to use mongoDb due to its blazing fast speed, but some of my users very far away from the only master will have a horrible experience. For worldwide types of systems, wouldn't couchDb be better? Isn't mongodb completely ruled out in the case where you have users all around the world? MongoDb, if you're listening, why don't you do some simple multi-master setups, where conflict resolution can be part of the update semantics? This seems to be the only thing standing between mongoDb and complete domination of the nosql market. Everything else is very impressive.

Disclosure: I am a MongoDB fan and user, and I have zero experience with CouchDB.

I have a heavy-duty app that is very read/write intensive. I'd say reads outnumber writes by a factor of around 30:1. The way mongo is designed, reads are always going to be much faster than writes. The trick (in my experience) is to make your writes so efficient that you can dedicate a higher percentage of your system resources to them.

When building a product on top of mongo, the key thing to remember is the _id field. This field is automatically generated and added to every document, and it looks something like 47cc67093475061e3d95369d. When you design your queries (finds), try to query on this field wherever possible. It does not encode where the object lives on disk (the value holds a creation timestamp plus, in the classic layout, machine/process identifiers and a counter), but _id is always indexed, so a find or update on this field is about as fast as a lookup gets. Consider this in the design of your system.
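To make the structure of that _id a bit more concrete: as far as I know, the first four bytes of an ObjectId are a big-endian Unix timestamp of when it was generated, and the remaining bytes exist to keep ids unique across machines and processes (their exact layout has changed between versions). Here is a minimal sketch in plain Python, with no driver involved; the function name `objectid_created_at` is my own invention.

```python
from datetime import datetime, timezone


def objectid_created_at(oid_hex: str) -> datetime:
    """Return the creation time embedded in a 24-character hex ObjectId."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    raw = bytes.fromhex(oid_hex)
    # The first 4 bytes are a big-endian count of seconds since the Unix epoch.
    seconds = int.from_bytes(raw[:4], byteorder="big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = objectid_created_at("47cc67093475061e3d95369d")
print(created.isoformat())
```

Nothing in those bytes is a disk address. Lookups on _id are quick because the field always has an index.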

Example:

Two of the collections in my database are "users" and "posts". A user can create multiple posts. These two collections have to reference each other a lot in the implementation of my app.

In each post object I store the _id of the parent user. In each user object I store an array of the _ids of all the posts the user has authored.

Now on each user page I can generate a list of all the authored posts without a resource stressful query but with a direct look up of the _id. The bigger the mongo cluster the bigger the difference this is going to make.
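A rough sketch of that two-way referencing, using plain Python dicts as stand-ins for the two collections so it runs without a database. The field names `post_ids` and `author_id` are just ones I picked for illustration.

```python
# In-memory stand-ins for the two collections, keyed by _id.
users = {
    "u1": {"_id": "u1", "name": "alice", "post_ids": ["p1", "p2"]},
}
posts = {
    "p1": {"_id": "p1", "author_id": "u1", "title": "First post"},
    "p2": {"_id": "p2", "author_id": "u1", "title": "Second post"},
}


def posts_for_user(user_id):
    """Fetch a user's posts by direct _id lookup instead of scanning posts."""
    user = users[user_id]
    return [posts[post_id] for post_id in user["post_ids"]]


def author_of(post_id):
    """Follow the back-reference from a post to its author."""
    return users[posts[post_id]["author_id"]]


titles = [p["title"] for p in posts_for_user("u1")]
print(titles)
```

Against a real collection, the equivalent of `posts_for_user` would be a single find with a filter along the lines of `{"_id": {"$in": post_ids}}`, which is served from the _id index.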

If you're at all familiar with oracle's rowids you may recognise the idea of a fast direct lookup, although in mongo the _id is an indexed key rather than a physical location.

I was scared last year when we decided to finally ditch MySQL for mongo, but I can tell you the following about my experience:
- Data porting is always horrible, but it went as well as I could have imagined.
- Mongo is probably the best documented NoSQL DB out there, and the Open Source community is fantastic.
- When they say fast and scalable they're not kidding, it flies.
- Schema design is very easy and much more natural and orderly than key/value type db's, in my opinion.
- The whole system seems designed for minimal user complexity; adding nodes etc. is a breeze.

Ok, seriously I swear mongo didn't pay me to write this (I wish) but apologies for the love fest.

Whatever your choice, best of luck.

Here is a detailed article from 10gen that gives examples of when you should choose MongoDB or CouchDB, with reasons as well.

http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

Edit

The above link was removed, but can be viewed here: http://web.archive.org/web/20120614072025/http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

Your question, as of now, is full of speculation and guessing.

...why can't we opt out of consistency for certain writes, so long as we're sure that the person that wrote the data will be able to read it consistently, whereas others will observe eventual consistency

What if those writes affect other writes? What if those writes would prevent other people from doing stuff? It's hard to tell the possible side effects, since you didn't give us any specifics.

My main suggestion to you is that you do some testing. Until you've tested it, speculation about bottlenecks is a complete waste of time. You don't need remote machines for this: set up some local DBs, add some artificial lag, then run your tests.
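To give an idea of what "add some artificial lag" could look like, here is a small timing harness in Python. It is only a sketch: `local_write` is a placeholder that appends to a list, and you would swap in a real insert against your local MongoDB or CouchDB. The lag values are made-up numbers, not measurements.

```python
import statistics
import time


def with_lag(operation, lag_seconds):
    """Wrap an operation so every call pays a simulated network round trip."""
    def lagged(*args, **kwargs):
        time.sleep(lag_seconds)
        return operation(*args, **kwargs)
    return lagged


def time_calls(operation, payloads):
    """Return the per-call wall-clock durations in seconds."""
    durations = []
    for payload in payloads:
        start = time.perf_counter()
        operation(payload)
        durations.append(time.perf_counter() - start)
    return durations


store = []  # placeholder for a real database


def local_write(doc):
    store.append(doc)


docs = [{"n": i} for i in range(20)]
near = time_calls(with_lag(local_write, 0.002), docs)  # client near the master
far = time_calls(with_lag(local_write, 0.020), docs)   # client far from the master
print(f"near median: {statistics.median(near) * 1000:.1f} ms")
print(f"far median:  {statistics.median(far) * 1000:.1f} ms")
```

Sleeping in the client is the crudest possible way to fake distance, but it is enough to see how much of your total write time is the round trip and how much is the database itself.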

This way you can test the different options you've got and see where MongoDB is better and where CouchDB excels. Then you can either pick one of them and live with its drawbacks, or you can try to tweak your database model itself and do more tests.

Nobody here will be able to give you a general solution to your specific problem (well, unless you give us all your code and pay us for working on it :P ). Databases aren't easy, especially if you need to scale them under certain requirements.

For worldwide types of systems, wouldn't couchDb be better? Isn't mongodb completely ruled out in the case where you have users all around the world?

MongoDB supports sharding, and each shard has its own master, so you don't need a single master for the whole data set. In fact, it looks like you have a ready shard key (region).
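To illustrate the idea of region as a shard key, here is a toy routing function in Python. This is not how you configure MongoDB; in a real cluster the router does this for you based on the shard key ranges. The shard names and the `REGION_TO_SHARD` table are hypothetical.

```python
# Hypothetical mapping from the shard key value (region) to a shard
# whose master lives in that region's data center.
REGION_TO_SHARD = {
    "eu": "shard-eu",
    "us": "shard-us",
    "asia": "shard-asia",
}


def shard_for(doc):
    """Pick the shard that owns a document, based on its region field."""
    region = doc["region"]
    try:
        return REGION_TO_SHARD[region]
    except KeyError:
        raise ValueError(f"no shard configured for region {region!r}") from None


writes = [
    {"user": "anna", "region": "eu"},
    {"user": "bob", "region": "us"},
    {"user": "chen", "region": "asia"},
]
placement = {doc["user"]: shard_for(doc) for doc in writes}
print(placement)
```

The point is that a user's writes only ever need to reach the master of their own region's shard, which you can place close to them.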

MongoDB also supports replica sets along with sharding. So if you need to run in multiple data centers (DCs) you put a master and one of the replicas in the same DC. In fact, they also suggest adding a 3rd node to a separate DC as a hot backup failover.

You will have to drill into the more detailed configuration of MongoDB, but you can definitely control where data is stored, and you can set priorities so that other replicas in the same DC are "next in line" for promotion to master.

At this point however, you're well into the details of MongoDB and you'll need to dig around and "play" quite a bit. However, you'll need lots of "play time" for any solution that's really going to handle masters across data centers.