
DB structure for an autodeploy multi-application

I want to create an application similar to Basecamp or MailChimp. The customer registers themselves and the application is then set up for them automatically. The application will be developed using CakePHP.

My question is what is the best DB structure?

  • All customers separated by customer ID in one set of shared tables.
  • Every customer with their own DB + DB user.
  • An SQLite file for every customer, stored in their own folder.


There are different approaches to the implementation, and the right one depends on the nature of your application: what functionality is provided to each user, what per-user data is involved and what relationships that data holds, how much per-user data there is, and so on.

Approach 1: a single application database; multiple tables as per the application's functionality/structure, but each table holds data for all users. For example: comments, permissions, categories, etc.

pros: simple architecture, easy and quick retrievals and inserts

cons: the database operations might get expensive if the tables grow too large in size or involve complex indexes
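
To make approach 1 concrete, here is a minimal plain-PHP/PDO sketch (the real application would go through CakePHP's models, but PDO keeps the idea visible); the comments table, its columns and the credentials are assumptions for illustration only:

    <?php
    // Approach 1 sketch: one shared schema, every row tagged with the owning
    // customer. Table, column and credential names are illustrative only.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'app_user', 'secret', [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);

    $customerId = 42; // would come from the logged-in session in practice

    // Every query must be scoped by customer_id; forgetting this WHERE clause
    // would expose another customer's data.
    $stmt = $pdo->prepare(
        'SELECT id, body, created FROM comments
         WHERE customer_id = :cid
         ORDER BY created DESC'
    );
    $stmt->execute([':cid' => $customerId]);
    $comments = $stmt->fetchAll(PDO::FETCH_ASSOC);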

Approach 2: a single application database; multiple tables as per the application's functionality/structure, but each user gets their own table set, identified by, say, the user_id. For example, for user_id = 1, the tables might be comments_1, permissions_1, categories_1, etc.

pros: again a simple architecture; easy to identify which tables to query for a particular user; since each table contains data only for a specific user, there is at least one less WHERE clause (WHERE user_id = xx); smaller tables and therefore quicker retrievals; fewer chances of lock conflicts during busy hours

cons: requires more maintenance; adding new functionality that requires a new column or table means applying the schema change to every user's table set
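
To illustrate approach 2, a rough PHP/PDO sketch of querying a per-user table set (the table names are assumptions; note that identifiers cannot be bound as parameters, so the suffix has to be sanitized by hand, here with an int cast):

    <?php
    // Approach 2 sketch: one database, but each user has their own table set
    // suffixed with the user id (comments_1, comments_2, ...).
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'app_user', 'secret');

    $userId = 1;                           // from the authenticated session
    $table  = 'comments_' . (int) $userId; // the cast guards the dynamic name

    $stmt = $pdo->prepare("SELECT id, body, created FROM {$table} ORDER BY created DESC");
    $stmt->execute();
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

    // The maintenance cost shows up when the schema changes: a new column
    // has to be added to every user's copy of the table.
    $allUserIds = [1, 2, 3]; // in practice, fetched from the users table
    foreach ($allUserIds as $uid) {
        $pdo->exec('ALTER TABLE comments_' . (int) $uid . ' ADD COLUMN edited TINYINT(1) DEFAULT 0');
    }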

Approach 3: multiple application databases, one per user

pros: 100% isolation of data between users; easy to tweak the DB schema should customized functionality be required per user; easy to split databases across multiple servers for load balancing purposes;

cons: complex architecture; requires more maintenance; trickier to store common or shared data - the data might either be replicated to every user database OR a common database can be maintained.
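
A sketch of approach 3 in plain PHP/PDO, assuming a small central "registry" database that maps each customer to their own database and credentials (all names here are invented for the example):

    <?php
    // Approach 3 sketch: look up the customer's own database in a central
    // registry, then connect to it. Every table in that DB belongs to one
    // customer only, so no WHERE customer_id filter is needed afterwards.
    $registry = new PDO('mysql:host=localhost;dbname=app_registry', 'app_user', 'secret');

    $customerId = 42; // resolved from the logged-in user
    $stmt = $registry->prepare(
        'SELECT db_host, db_name, db_user, db_pass FROM tenants WHERE customer_id = :cid'
    );
    $stmt->execute([':cid' => $customerId]);
    $tenant = $stmt->fetch(PDO::FETCH_ASSOC);

    $tenantDb = new PDO(
        "mysql:host={$tenant['db_host']};dbname={$tenant['db_name']}",
        $tenant['db_user'],
        $tenant['db_pass']
    );
    $comments = $tenantDb->query('SELECT id, body, created FROM comments')->fetchAll(PDO::FETCH_ASSOC);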

I think if the schema is efficiently designed, such that a balance is maintained between quick SELECTs/INSERTs and the amount of data per table, the first approach should work nicely for 100-10,000 users. However, it will need a good deal of database tuning and smart indexes.
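
As a rough example of what "smart indexes" means for approach 1, a composite index that leads with the customer id keeps per-customer lookups cheap even when the shared table grows large (index, table and column names are assumptions):

    <?php
    // Composite index so that "WHERE customer_id = ? ORDER BY created" on a
    // large shared comments table can be served from the index.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'app_user', 'secret');
    $pdo->exec('CREATE INDEX idx_comments_customer_created ON comments (customer_id, created)');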

Of approaches 2 and 3, both work fine, but from my perspective approach 3 is better as it gives you more flexibility. The implementation might need some time, but it is not difficult to achieve.

Also, SQLite doesn't seem appropriate for an implementation like this; I would suggest a client/server relational database like MySQL.

Hope the above provides some insight into the implementation and helps you decide what works best for your application.


If you're going to get big (scalable), then SQLite is probably not your best bet. A true RDBMS is far more efficient. That being said, if you're truly going to scale, Cake may not be the most efficient option either. Those are decisions for you to make based on your business model. It's good to have aspirations, but it's rare to become a 10,000-pound gorilla... pun intended.

My company has an application that does marketing automation for dozens of clients; it uses a common DB for common functions and a separate DB per client for unique data. Yes, it works, and it's actually pretty efficient and does a good job separating data so no single DB gets out of hand. In fact, the shared DB has tables with millions of records. That being said, keeping track of your connections STINKS and is more often than not the cause of our errors. Drop just one session or instantiate something wrong and BOOM! It's toast. I often find myself having to fully qualify my queries to make things work, which just adds to the stress. I don't think I'd do it this way again.
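
For what it's worth, "fully qualifying" queries in a shared-DB-plus-per-client-DB setup usually looks something like the following sketch (database and table names are made up, and it assumes both databases live on the same MySQL server so one connection can reach them):

    <?php
    // One connection, every table qualified with its database name so shared
    // data and client-specific data can be joined in a single query.
    $pdo = new PDO('mysql:host=localhost', 'app_user', 'secret');

    $clientId = 42;
    $userId   = 7;
    $clientDb = 'client_' . (int) $clientId; // e.g. client_42

    $sql = "SELECT u.email, o.total
            FROM shared_app.users AS u
            JOIN {$clientDb}.orders AS o ON o.user_id = u.id
            WHERE u.id = :uid";
    $stmt = $pdo->prepare($sql);
    $stmt->execute([':uid' => $userId]);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);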

Also, from a sheer volume standpoint, having to find one database amongst thousands wouldn't be my idea of a good afternoon either. I dislike having to jump through 50 of them to find the data I need for troubleshooting.

With a single DB, one connection just works. From a dev standpoint, it's much easier. It's hard for me to say performance-wise what the benefits are, because our app suffers most from a terribly inefficient framework (legacy Symfony).


We are creating a similar application where people can sign up and create their own internal application. We are using MySQL and all the data is stored in the same database. We have structured the tables in such a way that, with the login credentials, all of a customer's data can be easily identified across the site and fetched as and when required.
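
Presumably that pattern looks roughly like this: the account id is captured once at login and every later query is scoped by it (the table names and session keys below are my own invention, not the poster's actual schema):

    <?php
    // Store the account id in the session once, after the password check...
    session_start();
    $_SESSION['account_id'] = 42; // looked up from the users table at login

    // ...then scope every subsequent query by it.
    $pdo  = new PDO('mysql:host=localhost;dbname=app', 'app_user', 'secret');
    $stmt = $pdo->prepare('SELECT id, name FROM projects WHERE account_id = :aid');
    $stmt->execute([':aid' => $_SESSION['account_id']]);
    $projects = $stmt->fetchAll(PDO::FETCH_ASSOC);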


I would recommend you take a look at some newer, innovative types of databases. Traditional SQL databases start to fall short once the amount of data grows past a certain point. This is why Google created their BigTable project (http://en.wikipedia.org/wiki/BigTable). It is also what is behind the NoSQL movement (http://en.wikipedia.org/wiki/NoSQL).

What I recommend specifically is MongoDB (http://en.wikipedia.org/wiki/MongoDB). It is a NoSQL database that stores information in an object-oriented fashion, in collections of JSON-like documents. It's a bit much to wrap your head around at first, but it works and it is insanely fast. I have a buddy who launched a brand new anime website using MongoDB and the Zend Framework, and his website is just as fast as anything Google has to offer, if not faster, and he runs on one dedicated server.

You can find MongoDB at http://www.mongodb.org/
Here is a guide for you on using it with CakePHP: http://mark-story.com/posts/view/using-mongodb-with-cakephp
The MongoDB website also has more information on this: http://www.mongodb.org/display/DOCS/PHP+Libraries,+Frameworks,+and+Tools
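
For a first taste, inserting and querying documents from plain PHP with the mongodb/mongodb library looks roughly like this (collection and field names are invented for the example, and this is separate from the CakePHP integration described in the guide above):

    <?php
    // composer require mongodb/mongodb
    require 'vendor/autoload.php';

    $client   = new MongoDB\Client('mongodb://localhost:27017');
    $comments = $client->app->comments; // database "app", collection "comments"

    // Documents are schemaless, JSON-like structures.
    $comments->insertOne([
        'customer_id' => 42,
        'body'        => 'First comment',
        'created'     => new MongoDB\BSON\UTCDateTime(),
    ]);

    // Fetch one customer's comments, newest first.
    foreach ($comments->find(['customer_id' => 42], ['sort' => ['created' => -1]]) as $doc) {
        echo $doc['body'], "\n";
    }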


I strongly recommend you consider a NoSQL design. NoSQL here means a scalable, non-relational data store, without joins and with lightweight semantics. A NoSQL approach can change the way you develop applications by introducing new models of, and points of view on, your data.

NoSQL DBs tend to use memory over disk as the first-class write location: Redis and Memcached are in-memory only, and even systems like Cassandra use memtables for writes with asynchronous flushing to disk, preventing inconsistent I/O performance from creating write-speed bottlenecks. And since NoSQL datastores typically emphasize horizontal scalability via partitioning, this puts them in an excellent position to take advantage of the elastic provisioning capability of the cloud. NoSQL and the cloud are a natural fit.

What options do you have?

NoSQL can give you better performance for certain scenarios:

  • Frequently written, rarely read data, like web hit counters or data from logging devices: Redis | MongoDB (see the Redis sketch after this list)

  • Frequently read, rarely written/updated data: Memcached for transient data caching; Cassandra | HBase for searching; Hadoop and Hive for data analysis

  • High-availability applications that demand minimal downtime do well with clustered, redundant data stores: Riak | Cassandra

  • Data synchronization across multiple locations: CouchDB

  • Transient data (web sessions & caches) does well in transient key-value data stores: Memcached

  • Big data arising from business or web analytics that may not follow any apparent schema: Hadoop
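
As one concrete example of the first case above, a page hit counter in Redis (using the phpredis extension; key names are arbitrary) is a single in-memory increment per request:

    <?php
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);

    // Each page view is one in-memory increment, no disk round trip.
    $redis->incr('hits:page:/pricing');

    // Read back occasionally, e.g. for a stats page.
    echo 'Pricing page views: ', (int) $redis->get('hits:page:/pricing'), "\n";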

A combination?

Perhaps your application fits best with a wise combination of different data stores. Check these topics out and choose what works for you.
