I am developing a tool that may have more than a million rows of data to fill in.
Currently I have designed a single table with 36 columns. My question is: do I need to divide these into multiple tables, or keep a single one?
If single, what are the advantages and disadvantages?
If multiple, what are the advantages and disadvantages?
And which storage engine should I use for speed?
My concern is a large database that will receive at least 50,000 queries per day.
Any help?
Yes, you should normalize your database. A general rule of thumb is that if a column that isn't a foreign key contains duplicate values, the table should be normalized.
Normalization involves splitting your database into tables, and helps to:
- Avoid modification anomalies.
- Minimize impact of changes to the data structure.
- Make the data model more informative.
There is plenty of information about normalization on Wikipedia.
If you have a serious amount of data and don't normalize, you will eventually come to a point where you will need to redesign your database, and this is incredibly hard to do retrospectively, as it will involve not only changing any code that accesses the database, but also migrating all existing data to the new design.
There are cases where it might be better to avoid normalization for performance reasons, but you should have a good understanding of normalization before making this decision.
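The modification anomaly mentioned above is easy to demonstrate. Here is a small sketch using Python's built-in sqlite3 module (the table and column names are invented for illustration, not taken from the question): in the unnormalized table a duplicated value must be updated in every row it appears in, while in the normalized design it lives in exactly one place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Unnormalized: the supplier's city is repeated on every order row.
cur.execute("CREATE TABLE orders (id INTEGER, supplier TEXT, supplier_city TEXT)")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, "Acme", "Boston"), (2, "Acme", "Boston"), (3, "Globex", "Austin")])

# If Acme moves, every duplicated row must change -- miss one and the
# data contradicts itself (a modification anomaly).
cur.execute("UPDATE orders SET supplier_city = 'Chicago' WHERE supplier = 'Acme'")
n_denormalized = cur.rowcount  # 2 rows had to be touched

# Normalized: the city is stored once, referenced by a foreign key.
cur.execute("CREATE TABLE suppliers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.execute("CREATE TABLE orders2 (id INTEGER, supplier_id INTEGER REFERENCES suppliers(id))")
cur.execute("INSERT INTO suppliers VALUES (1, 'Acme', 'Boston')")
cur.execute("UPDATE suppliers SET city = 'Chicago' WHERE id = 1")
n_normalized = cur.rowcount  # exactly 1 row changes, no matter how many orders exist

print(n_denormalized, n_normalized)
```

The same UPDATE cost grows with the table in the first design but stays constant in the second, which is the practical payoff of removing the duplication.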
First and foremost, ask yourself whether you are repeating fields or attributes of fields. Does your one table contain relationships or attributes that should be separated? Follow third normal form. We need more info to help, but generally speaking, one table with thirty-six columns smells like a bad design.
If you want to store a million rows of the same kind, go for it. Any decent database will cope even with much bigger tables.
Design your database to best fit the data (as seen from your application), get it up, and optimize later. You will probably find that performance is not a problem.
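As a quick sanity check that a million rows is not inherently a problem, here is a sketch using Python's built-in sqlite3 module (the table and column names are invented; SQLite stands in for whichever engine you pick, and any mainstream engine behaves similarly for an indexed lookup):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")

# Load a million rows.
cur.executemany("INSERT INTO t VALUES (?, ?)",
                ((i, f"row-{i}") for i in range(1_000_000)))

# A primary-key lookup is an index seek: it does not scan the table,
# so it stays fast no matter how many rows are stored.
row = cur.execute("SELECT val FROM t WHERE id = 999999").fetchone()
print(row)
```

50,000 queries per day averages out to well under one query per second, so for a workload like the one described, correct indexing matters far more than the raw row count.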
You should model your database according to the data you want to store. This is called "normalization": essentially, each piece of information should be stored only once; otherwise, a table cell should point to another row or table containing the value. If, for example, you have a table containing phone numbers, and one column contains the area code, you will likely have more than one phone number with the same value in that column. Once this happens, you should set up a new table for area codes and link to its entries by referencing the primary key of the row in which the desired area code is stored.
So instead of
id | area code | number
---+-----------+---------
1 | 510 | 555-1234
2 | 510 | 555-1235
3 | 215 | 555-1236
4 | 215 | 555-1237
you would have
id | area code
---+-----------
1  | 510
2  | 215

id | number   | area code
---+----------+-----------
1  | 555-1234 | 1
2  | 555-1235 | 1
3  | 555-1236 | 2
4  | 555-1237 | 2
The more occurrences of the same value you have, the more likely you are to save memory and get better performance by organizing your data this way, especially when you're handling string values or binary data. Also, if an area code changes, all you need to do is update a single cell instead of performing an update operation on the whole table.
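The two-table design above can be sketched end to end with Python's built-in sqlite3 module (the table names `area_codes` and `numbers` are invented for illustration): the area code is stored once, a foreign key links each number to it, and a JOIN reassembles the original single-table view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The two normalized tables from the example above.
cur.execute("CREATE TABLE area_codes (id INTEGER PRIMARY KEY, code TEXT)")
cur.execute("""CREATE TABLE numbers (
                   id INTEGER PRIMARY KEY,
                   number TEXT,
                   area_code_id INTEGER REFERENCES area_codes(id))""")
cur.executemany("INSERT INTO area_codes VALUES (?, ?)", [(1, "510"), (2, "215")])
cur.executemany("INSERT INTO numbers VALUES (?, ?, ?)",
                [(1, "555-1234", 1), (2, "555-1235", 1),
                 (3, "555-1236", 2), (4, "555-1237", 2)])

# Changing an area code now touches exactly one row ...
cur.execute("UPDATE area_codes SET code = '650' WHERE code = '510'")

# ... and a JOIN reconstructs the original denormalized view.
rows = cur.execute("""SELECT a.code, n.number
                      FROM numbers AS n
                      JOIN area_codes AS a ON n.area_code_id = a.id
                      ORDER BY n.id""").fetchall()
print(rows)
```

Note that every number referencing the updated row now shows the new code, without the whole-table UPDATE the unnormalized design would require.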
Try this tutorial.
Correlation does not imply causation.
Just because a huge number of columns usually indicates a bad design, that doesn't mean a huge number of columns *is* a bad design.
If you have a normalized model, you store whatever number of columns you need in a single table.
It depends!
Does that one table contain a single 'entity'? i.e. Are all 36 columns attributes of a single thing, or are there several 'things' mixed together?
If mixed, then you should normalise (separate into distinct entities with relationships between them). You should aim for at least Third Normal Form (3NF).
A best practice is to normalise as much as you can; if you later identify a performance problem, then denormalise as little as you can.