I am redesigning a few databases into one encompassing database, and I have noticed the previous designer(s) of the old databases like to store categories in their own tables. For example, say that there is a table boats(bid: integer, bname: string, color: integer), and in the application there is a drop-down box allowing the user to specify the colour of the boat, then there is a table color(cid: integer, cname: string). I would have not included the color table, and just put the colours as strings in the boats table. I realize that this decreases redundant storage of colour names, but is the added run-time cost of joining the boat table with the colour table "worth it"? Also the drop-downs are populated with SELECT cname FROM color statements, while I would have defined a view on SELECT DISTINCT color FROM boats to populate the drop-downs.
The example is simple, but this happens multiple times in the system I am redesigning, even for categories with only two options. This has resulted in many tab开发者_如何学Cles with only 2 fields. Some only have 1 field (I haven't figured out what those are for yet, but I think they are only to populate the drop-downs, and the actual tables contain the values as well).
I would personally keep them in their own table if this were my DB.
If you get into a situation where you get the requirement that Boats a,b and c can only come in silver and black
then you will be thankful that you did. I've seen these types of requests bubble up down the road in a lot of projects.
If you are just concerned about the query complexity you could create a view that joins the information you need so you only need to query it once and with no JOIN
.
If you are worried about the performance implications of the JOIN then I would look at creating the appropriate indexes or possibly an indexed view.
Good luck!
When you know a column should have a limited set of values, you should tell the dbms to enforce that limited set. The three most common ways to deal with that kind of requirement are
- ignore it,
- set a foreign key reference to a table of colors, and
- use a CHECK() constraint against a list of colors.
Of those three, setting a foreign key to a table of colors tends to make life easiest.
I realize that this decreases redundant storage of colour names, but is the added run-time cost of joining the boat table with the colour table "worth it"?
This is a different issue. First, storing foreign key values is a form of data integrity, not a form of redundancy. Keys exist for two reasons: 1) to identify things in the real world, and 2) to be stored in other tables. (Because the thing the key identifies is relevant to the other table.)
Second, if you identify colors by assigning an arbitrary id number to them, you have to use a JOIN to get human-readable information. But colors, like many attributes, carry their identity with them. If you use the color's name itself ("red", "orange", etc.) or use a human-readable code for the name ("R", "O", etc.) you don't need a join. You do still need a table of colors (or a CHECK() constraint), because the column in boats has a limited set of values, and the dbms should enforce use of that limited set of values.
So you could do something like this.
create table boats (
boat_id integer primary key,
registered_name varchar(35) not null,
hull_color varchar(10) not null references hull_colors (color)
);
create table hull_colors (
color varchar(10) primary key
);
insert into hull_colors values ('red'),('orange'),('yellow') etc.
Both those tables are in 5NF.
It is generally better to have a normalized database.
However, in your example, you can use a Categories(ID, Type, Name)
table and store the colors as ( 3, "Color", "Blue" ), ( 4, "Color", "Red" ), ...
This way, you can store more categories in the same table and in the same time, store them separately. Populating a drop-down list will require a simple select of the form select ID, Name from Categories where Type = 'Color'
.
EDIT: Note that this solution violates the first rule of database normalization, as @Catcall said. A 3NF table would be Colors(ID, Name)
. This way, you can refer to a certain color using its ID
.
Using select distinct color from boats
to populate a drop-down list has many disadvantages, for example, what if the Boats table contains no records. Then, your select will return nothing and the drop-down control will not be populated with any value. Another problem is when you have fields containing 'Red'
and 'red'
or similar. See more details on Database Normalization here
It sounds like those are lookup tables so that if the end user want's to add an additional color then they can add it to the database and it will then propagate down the UI. This also gets into normalization. If there is only one place where colors is referenced then the lookup table is not really necessary. However if there are multiple tables where colors are referenced for different things then the lookup table will save you a huge headache down the road.
精彩评论