NoSQL or Relational or Both

I am working on a project where I have to save a list of friends. After thinking a lot and searching the net, the best way to do it seems to be saving the user id and friend id in a table. But if the project expects to reach large scale, this method does not seem very good. Most large-scale companies like Google, Facebook, and Twitter have also moved their functions to NoSQL databases. So doesn't it seem that maybe we should start our project with one of these NoSQL databases?

But at the same time I have read that there is a lot of coding work in NoSQL, as many default services of relational databases are not provided there. (Correct me if I am wrong.)

Maybe one way to do it is to start off with relational, as it has very good functionality at small scale, and later move on to NoSQL. But for that you have to write very portable code. Can an ORM play a good role there or not?

I would like others' opinions on what the right approach is.

Stay away from ORMs in general and ActiveRecord in particular.
They usually create a 'development debt': they make the beginning of a project look easy.
Then, when you have invested in full integration with the ORM and 80% of the project is finished,
you begin to see all the edge cases where your ORM falls flat.

In addition to that, most ORMs generate suboptimal queries and never take advantage of engine-specific features.

As for SQL vs NoSQL: I would recommend starting with an SQL database, then, when the application grows, adding some caching strategy (memcached, or maybe redis). Only when that solution is exhausted, start looking for parts of your database logic that do not need to be relational.

NoSQL databases serve a list of very specific use cases and are not meant for your average application.
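
A minimal sketch of that "SQL first, cache second" step, assuming Python with an in-memory sqlite3 database and a plain dict standing in for memcached or redis. The friendships table and the fetch_friends / add_friend names are hypothetical, picked only to match the question:

```python
import sqlite3

# Hypothetical schema: one row per (user, friend) pair.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE friendships ("
    " user_id INTEGER NOT NULL,"
    " friend_id INTEGER NOT NULL,"
    " PRIMARY KEY (user_id, friend_id))"
)
conn.executemany(
    "INSERT INTO friendships (user_id, friend_id) VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 1)],
)

# Stand-in for memcached or redis: a plain dict keyed by user id.
cache = {}
stats = {"db_reads": 0}  # counts how often a read reaches the database


def fetch_friends(user_id):
    """Cache-aside read: try the cache first, fall back to SQL on a miss."""
    if user_id in cache:
        return list(cache[user_id])
    stats["db_reads"] += 1
    rows = conn.execute(
        "SELECT friend_id FROM friendships"
        " WHERE user_id = ? ORDER BY friend_id",
        (user_id,),
    ).fetchall()
    friends = [friend_id for (friend_id,) in rows]
    cache[user_id] = friends
    return list(friends)


def add_friend(user_id, friend_id):
    """Write to the database, then drop the stale cached list."""
    conn.execute(
        "INSERT OR IGNORE INTO friendships (user_id, friend_id) VALUES (?, ?)",
        (user_id, friend_id),
    )
    cache.pop(user_id, None)
```

Repeated calls to fetch_friends(1) reach the database only once until add_friend(1, ...) invalidates the entry. A real cache would also need expiry and a shared server, which this sketch leaves out.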

Use a SQL database.

When you start getting millions and millions of users a day, start using some sort of NoSQL database.

Edit: I see that others propose starting with SQL. I'd like to change my proposition and say: experiment with a small-scale project like a "small Twitter clone" or a "store with video tapes". Keep the database on many nodes and write scripts which will flood it with data. Do it with Riak/Cassandra and then with some SQL solution. You'll find out for yourself which is easier and quicker.

I would go with NoSQL (this is what I'm doing now; previously I used MySQL in large-scale projects). Why? It is much simpler to use, so you can pay more attention to other important things (NoSQL takes care of most data storage problems):

You don't have to define a schema, which also means you don't have to upgrade it. In MySQL I had long downtimes due to system upgrades. Adding a single column or index took a lot of time, and the tables had only a few million rows.
You get a running, distributed environment in a few minutes. In MySQL you have to manually split data between several machines (unless you keep everything on one, which is not a good idea).
You get much better performance. In my experience MySQL performance is really bad; it just does not work without memcached. Memcached is a distributed key-value store (a simple NoSQL database). Obviously, using memcached costs you additional time spent on optimizing queries.
You don't have to think about normalization / denormalization.
Queries are simple (at least in key-value stores). You just don't care about things like whether to use "where UserId = 12345" or "where UserId = '12345'" (in MySQL, depending on the column type, one of them will not use the index!).
If one NoSQL machine fails, you don't care about that in your application. The query will be executed on another replica (you don't have to implement this!).

There are also downsides to using NoSQL:

You don't get ACID. In most cases you just don't need that!
Also, there are more developers familiar with SQL solutions. On the other hand, NoSQL solutions are much simpler (at least in my experience), so you don't need a certified database administrator (a magician who solves your db problems, and only he knows why it works).
You can't do certain queries. For example, joins are not there, but if you don't normalize the data then joins are useless (and you save time, as you don't have to think about normalization).
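
To make the join point concrete, here is a rough sketch assuming Python, with sqlite3 for the normalized form and a dict of JSON strings standing in for a key-value store. The table names, the "user:1" key format and the sample users are all made up for illustration:

```python
import json
import sqlite3

# Normalized, relational form: names live in one table and links in
# another, so listing friend names needs a join.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE friendships (user_id INTEGER, friend_id INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO friendships VALUES (1, 2), (1, 3);
    """
)
joined = conn.execute(
    "SELECT u.name FROM friendships f JOIN users u ON u.id = f.friend_id"
    " WHERE f.user_id = ? ORDER BY u.name",
    (1,),
).fetchall()
names_via_join = [name for (name,) in joined]

# Denormalized, key-value form: the friend names are copied into the
# user's own record, so a single lookup by key returns everything.
kv_store = {}
kv_store["user:1"] = json.dumps({"name": "ann", "friends": ["bob", "cy"]})
names_via_lookup = json.loads(kv_store["user:1"])["friends"]
```

Both reads return the same names. The price of the second form is that bob's name is now stored in every record that lists him, so a rename has to touch all of those records.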

Great article: http://labs.mudynamics.com/2010/04/01/why-nosql-is-bad-for-startups/

My advice would be to start with NoSQL and stick with it. You should look at Dynamo-based databases like Riak and Cassandra. Also try CouchDB (Couchbase). This is for most of the data. For friend relationships, a graph database is a good option.

I don't think an ORM will help you much; the philosophy of a NoSQL database is quite different from that of a relational database. So once you start with relational, put a lot of effort into the schema, and use its goodies like foreign keys, you will have to work to move it to NoSQL. It is the same the other way round.

The big companies you mentioned use NoSQL because of the high throughput it offers and the simplicity of its database schemas, accepting that it lacks some advanced features that relational databases offer.

For their accounting they use relational databases, I'm sure of that :-)

In the end it depends on the complexity of your schema. If it is simple enough, try NoSQL: it is much easier to set up, and you won't need to define a schema at all. You just build your record (or document, as some of them call it) and save it. There is no need to alter tables if you change your mind about the table structure: just save your data. It is easy, which is why it is becoming so popular today.

But there is no referential integrity, and there are also some restrictions regarding transactions; not everything is supported. So if you need more in terms of database schema, data integrity, or transactions, go for a relational database.

I have used MongoDB, Riak and some other NoSQL solutions in production for some time now. I think the biggest upside of using a NoSQL solution from the start is the thought process: you are not constrained to the relational data model they teach at university, which makes you more inclined to fit the data to your needs as opposed to adjusting your application to fit the representation of your data.

That said, I think that scaling prematurely is probably not a good thing. If you are building a new web application (or some big data application), it usually takes some time until you reach a limit on anything that requires NoSQL (bandwidth, memory, performance...).

The best advice I can give you is to build your application with abstractions over the data model (not an ORM). For instance, if you want to fetch a friend list from a user_id, I would build an interface for the "friend storage" which has a method fetch_friends(user_id): friend_ids. By keeping a good abstraction you can swap out the underlying implementation when the need arises.
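
A minimal sketch of such a "friend storage" interface, assuming Python. Only fetch_friends(user_id) comes from the description above; the class names, add_friend, friend_count and the two backends (in-memory sqlite3, and a dict standing in for a real NoSQL client) are hypothetical:

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, List, Set


class FriendStorage(ABC):
    """Hypothetical interface: the application only ever talks to this."""

    @abstractmethod
    def add_friend(self, user_id: int, friend_id: int) -> None:
        ...

    @abstractmethod
    def fetch_friends(self, user_id: int) -> List[int]:
        ...


class SqlFriendStorage(FriendStorage):
    """Relational backend: one (user_id, friend_id) row per friendship."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE friendships ("
            " user_id INTEGER NOT NULL,"
            " friend_id INTEGER NOT NULL,"
            " PRIMARY KEY (user_id, friend_id))"
        )

    def add_friend(self, user_id: int, friend_id: int) -> None:
        self._conn.execute(
            "INSERT OR IGNORE INTO friendships VALUES (?, ?)",
            (user_id, friend_id),
        )

    def fetch_friends(self, user_id: int) -> List[int]:
        rows = self._conn.execute(
            "SELECT friend_id FROM friendships"
            " WHERE user_id = ? ORDER BY friend_id",
            (user_id,),
        )
        return [friend_id for (friend_id,) in rows]


class KeyValueFriendStorage(FriendStorage):
    """Key-value backend; a dict stands in for a real NoSQL client."""

    def __init__(self) -> None:
        self._store: Dict[int, Set[int]] = {}

    def add_friend(self, user_id: int, friend_id: int) -> None:
        self._store.setdefault(user_id, set()).add(friend_id)

    def fetch_friends(self, user_id: int) -> List[int]:
        return sorted(self._store.get(user_id, set()))


def friend_count(storage: FriendStorage, user_id: int) -> int:
    """Application code: depends on the interface, not on a backend."""
    return len(storage.fetch_friends(user_id))
```

Because friend_count only sees FriendStorage, swapping SqlFriendStorage() for KeyValueFriendStorage() does not change the calling code. Migrating the existing data is a separate job that this sketch does not cover.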

I used this method myself for a similar purpose, storing user data. I started off with MongoDB (my data was schema-less). When the load became too much for a single server (before MongoDB had proper sharding), I moved on to Riak, and when I needed better SLAs for the clients (Riak isn't all that reliable), I moved on to a proprietary solution. Each move required large overheads in development and integration time, but by then we had the resources to make such a move.

IMHO, if I were you, I would start off with MySQL so that I don't need to think about durability, consistency, or persistence, and swap it out when I hit some bump.

SQL vs. NoSQL review: http://www.sigmod.org/publications/sigmod-record/1012/pdfs/04.surveys.cattell.pdf

"Scalable RDBMSs thus have an advantage over the NoSQL data stores, because you have the convenience of the higher-level SQL language and ACID properties, but you only pay a price for those when they span nodes."

playOrm allows you to store relational data in a NoSQL store and still lets that data scale as the system grows, so you get the best of both worlds.