I have been given the task to design and develop a web application for a NGO (Non Govt Org) which runs primary schools in many towns and villages. The application will keep a record of all the schools, students, volunteers and the teachers of every school. Currently there are about 30 schools in the NGO's umbrella but they have very ambitious plan to increase the number very rapidly. We will host the app on Windows Azure using SQL Azure as the database. Now I am facing a tough task on how to design its database with minimum expenditure (as the NGO is funded completely by charities and donations). As you might know that the databases in SQL Azure is offered in specific sizes like 5,10,20 to 50 GB, it puts a restriction on the maximum size of each database. I have came out with following approaches:
1) For every school create a separate database of 5 or 10 GB size. Each database will have tables like 'student','subject','attendance' etc. The problem in this approach is that a lot of databases will have to be created. One for every school. This would drastically shoot up the cost. Also initially a large portion of 10 GB size will be under-utilized, but in future it may happen that 10GB would seem less for storing a school's data.
2) Keep a single database with tables like 'school','student','attendance' etc. This would keep the cost low initially but during course of time the database would start fill开发者_开发知识库ing up and may reach the max limit of 50GB as more schools are opened by the NGO. Also a single table for 'student' and especially 'attendance' will have huge number of records and would make the queries slow. Even if we add another database in future then how easy would it be to split the tables across several databases.
Keeping the limitations in mind we are unable to proceed further. Any approach or suggestion by you will be very helpful for us. Thanks in advance.
EDIT: Thanks a lot to the people answered my question. i got the point: 5O GB is a huge space and it would not get filled any time soon. But that brings me a question: Consider a situation when the number of schools grow up to 200, 300 or 1000 !! Then how should be my database design ? I suppose 50 GB would not be big at that situation.
I used to work for a company that makes school systems; although 50GB would be considered large for most of them, a few had databases that were much larger. Historical records are typically the issue here, specially if you will add additional features over time, such as lead import.
You described two scenarios: a linear shard and a scale up architecture. The linear shard implements a database per school. The scale up puts them all in the same database. There are additional options to consider with SQL Azure. See one of my blob posts about a white paper I published regarding various scalability models: http://geekswithblogs.net/hroggero/archive/2010/12/23/multitenant-design-for-sql-azure-white-paper-available.aspx
Also SQL Azure announced an upcoming feature called Data Federation. This is most likely for you. Here are two blog posts you may find relevant:
http://geekswithblogs.net/hroggero/archive/2011/07/23/preparing-for-data-federation-in-sql-azure.aspx
http://geekswithblogs.net/hroggero/archive/2011/09/07/sharding-library-for-sql-azure-data-federation.aspx
The last link discusses an open-source library, called the Enzo Shard, that I am building to assist developers in taking advantage of the future capabilities of SQL Azure Data Federation. The version that supports data federation is in Beta and allows parallel queries to be performed across federation members (i.e. databases).
Finally, don't miss the posts by Cihan (from Microsoft) that discusses this feature in greater details: http://blogs.msdn.com/b/cbiyikoglu/
In summary, the field of scalability in SQL Azure is evolving. However many capabilities will be coming providing significant data growth and performance oppotunities.
50 Gigabytes is an awful lot of data. School personnel and attendance is a pretty small problem. A properly designed database is unlikely to approach 50 gigabytes for decades, at least.
Even 60 schools should not generate that much data, even if you're tracking standardized testing data of some kind. If there is a secondary school from grades 6 to 12 (I'm using the U.S. for a reference) on the quarter system, with an average of 6 classes per student and 1000 students in the school, there will only be 24,000 class records per year. Not all 30 schools are going to be secondary schools. 50GB should be plenty. I worked with the a database containing enrollment, testing, student and teacher information for one of the biggest school districts in the United States. After 7+ years their database barely approached 30GB.
Also, check out the new Elastic Scale feature in Azure SQL DB : which can help you scale out instead of scaling up.
I would suggest you take a look at Azure Table Storage as well to keep your costs down without worrying about growing size. Obviously the challenge would be to design your application for Table Storage which is "Non-Relational" in nature.
You'll never hit 50GB with just names and a couple other strings/text. Even with all the schools in the same db you'll be good with 5GB. I've admin'd millions of rows of more complex data and never hit 50GB (unless there was a problem!) :)
精彩评论