Setup:
I was trying to understand the difference between identifying and non-identifying relationships when I found this great article on stackexchange. What's the difference between identifying and non-identifying relationships?
After reading a few comments it brought another question to mind about a problem I have been having.
Question:
Should I use multi-column primary keys on every child table and what are the advantages/disadvantages to doing so?
To better illustrate my question I have created an example below. I also included the comments that caused me to ask this question.
Example:
In my situation, I know the building_id
and I need to get bed.data
.
#1 - My current DB structure:
TABLE { FIELDS }
-----------------------------------------------------------------------
building { id, data }
floor { id, building_id, data }
room {id, floor_id, data }
bed {id, room_id, data }
This type of table structure would require me to use a few joins to get the data I need. Not a big deal but kind of a pain since I run into this situation a lot.
#2 - My interpretation of Bill Karwin's suggested DB structure (see article comments below):
TABLE { FIELDS }
-----------------------------------------------------------------------
building { id, data }
floor { id, building_id, data }
room {id, building_id, floor_id, data }
bed {id, building_id, floor_id, room_id, data }
This table structure seems to eliminate the need for joins in my situation. So what are the disadvantages to this table structure? I really like the idea of not doing so many join statements.
Comments From Article:
What's the difference between identifying and non-identifying relationships?
@hobodave: It's the "convention over configuration" argument. Some schools of thought are that every table should define its primary key for a single-column pseudokey named id that auto-generates its values. Application frameworks like Rails have popularized this as a default. They treat natural keys and multi-column keys as divergent from their conventions, needed when using "legacy" databases. Many other frameworks have followed this lead. – Bill Karwin Mar 10 '10 at 23:06
It seems like "properly" constructing identifying relationships would lead to obnoxiously huge primary keys. e.g. Building has Floor has Room has Bed. The PK for Bed would be (bed_id, floor_id, room_id, building_id). It seem's strange that I've never seen this in practice, nor heard it suggested as a way to do anything. That's开发者_如何转开发 a lot of redundant data in the PK. – hobodave Mar 10 '10 at 23:34
@hobodave: I have seen multi-column primary keys that are even larger. But I take your point. Consider that multi-column primary keys convey more information; you can query the Beds table for all beds in a specific building without doing any joins. – Bill Karwin Mar 11 '10 at 1:00
This data is normalized
TABLE { FIELDS }
-----------------------------------------------------------------------
building { id, data }
floor { id, building_id, data }
room {id, floor_id, data }
bed {id, room_id, data }
This table is not (bad idea)
TABLE { FIELDS }
-----------------------------------------------------------------------
building { id, data }
floor { id, building_id, data }
room {id, building_id, floor_id, data }
bed {id, building_id, floor_id, room_id, data }
- In the first (good) table you do not have unneeded duplicated data.
- Inserts in the first table will be much faster.
- The first tables will fit more easily in memory, speeding up your queries.
- InnoDB is optimized with model A in mind, not with model B.
- The latter (bad) table has duplicated data, if that gets out of sync, you will have a mess. DB A cannot is much harder to get out of sync, because the data is only listed once.
- If I want to combine data from the building, floor, room and bed I will need to combine all four tables in model A as well as model B, how are you saving time here.
- InnoDB stores indexed data in its own file, if you
select
only indexes, the tables themselves will never be accessed. So why are you duplicating the indexes? MySQL will never need to read the main table anyway. - InnoDB stores the PK in each an every secondary index, with a composite and thus long PK, you are slowing down every select that uses an index and balooning the filesize; for no gain what so ever.
- Do you have serious speed problem? If not, you are you denormalizing your tables?
- Don't even think about using MyISAM which suffers less from these issues, it is not optimized for multi-join databases and does not support referential intregrity or transactions and is a poor match for this workload.
- When using a composite key you can only ever use the rightmost-part of the key, i.e. you cannot use
floor_id
in tablebed
other than usingid+building_id+floor_id
, This means that you may have to use much more key-space than needed in Model A. Either that or you need to add an extra index (which will drag around a full copy of the PK).
In short
I see absolutly zero benefit and a whole lot of drawbacks in Model B, never use it!
I think it's pretty unlikely that your #2 is what Bill Karwin meant. Usually, "id" implies an automatic numeric sequence. I think it's more likely he meant something along these lines. Columns that make up primary keys are between asterisks.
TABLE { COLUMNS }
-----------------------------------------------------------------------
building { *building_id*, other columns }
floor { *building_id, floor_num*, other columns }
room { *building_id, floor_num, room_num*, other columns }
bed { *building_id, floor_num, room_num, bed_num* (?), other columns }
I'm not sure what other columns you might have for "bed", though. Twin, Full, Queen, King? That could make sense. If that's the case, then this table
bed { *building_id, floor_num, room_num, bed_num*, bed_size }
is far from "denormalized". In fact, it's in 5NF.
If you test the performance of these two schemas, you'll probably find that this one runs rings around your #1 most of the time. In the batch of queries I ran, it's about 30 times faster.
The first tables structure is normalized,classical structure. But unfortunately this one is not applicable for the big project. Because if your table building contains many data rows e.g. million depending on which cities or countries you use your join will be very slow. So in real projects denormalized tables are used which contain all aggregated info. You can work with such tables directly or use standalone servers like sphinx for searching data. Regarding primary key on three fields I think in this case this one is redundant. Because
- If you use innodb this key will be added to all secondary keys in this table.
- If you use interface for managing beds it will be convenient to use one field id for working with a specific rows than with three fields.
- If you want to guarantee uniqueness of row you can use UNIQUE KEY on these 3 fields.
精彩评论