Normalization: Is it considered Normal Form compliant to split static, numeric values like a year into their own table?_问答_开发者

I am having an interesting discussion with another database designer about normalization. In this example, we have a GameTitles table and each record mu开发者_如何学Gost contain the year in which the game was released. He says 2NF mandates that everything must be normalized, so, to be compliant, the year field should be split off into a ReleaseYears table with its own primary key that is referenced by the GameTitles table. I say it should remain as a field on the GameTitles table itself.

My argument for this is that a year is just a non-primitive numeric value that is static by its very nature (ie, 2011 will always be 2011). Due to this, it serves as its own identifier and needs nothing to reference it since it is what it is. This also introduces additional maintenance since you now have to add a new year to the table just to reference it. If you prepopulate the table with a large range of years then you have extra records that potentially won't have references to them at all. This also increases database size since you now have an extra table, record overhead, and the additional primary key for the year itself. If you keep the year as a field on the GameTitles table, you eliminate all this additional maintenance and overhead.

Thoughts on this?

One can easily look up the definition of 2NF - and "everything normalized" is not it. If one replaces a single column c (here year) with a construct of a new table (id, c), where id is unique and replace c in the former table with id (surrogate key anyone?), one cannot fulfill more of the logical constraints of the normalization rules in any way ever (proof left to the reader - all functional dependencies from c are in 1:1 relationship to those with the replaced column id). If one suggests such a thing, he hasn't understood normalization.

Creating a separate table for any attribute has nothing to do with normalization. 2NF, 3NF, BCNF, 4NF, 5NF are all concerned with eliminating non-key dependencies. If you remove any single attribute to a new table and replace it with a foreign key attribute then the dependencies in the table are logically going to be just the same as before - so the revised version of the table is no more or less normalized than it was before.

As others have pointed out, the other database designer is simply wrong, but your reasoning is wrong as well. Assume you start with this table, which has a single candidate key, "game_title".

Table: game_titles

game_title                      year_first_released
--
The first game                  1998
The second game                 1999
Best game: the third one        2001
The fourth game                 2003
Forty-two, the end of games     2011

You evaluate whether it's in 2NF by asking yourself these questions.

Q: First of all, is it in 1NF?

A: Yes, it is.

Q: What are the prime attributes (attributes that are part of a candidate key)?

A: "game_title" is the only prime attribute.

Q: What are the non-prime attributes?

A: "year_first_released" is the only one.

Q: Is "year_first_released" functionally dependent on the whole of "game_title", or on just a part of it?

A: The sole candidate key, "game_title", is a single column; it doesn't even have parts. So "year_first_released" is functionally dependent on the whole of "game_title".

Voilà. You've found 2NF.

You can cut through some of the formal terms by asking first whether it's in 1NF, and then answering this question.

Q: Are there any composite candidate keys?

A: No.

Voilà. You've found 2NF again.

By definition, for a table to violate 2NF, it has to have at least one candidate key that has more than one column.

Here are your reasons for rejecting your friend's opinion.

A year is just a non-primitive numeric value.
A year is static by its very nature.
A year serves as its own identifier.
A table of years introduces additional maintenance.
A table of years might have extra rows that aren't referenced.
A table of years increases database size.

None of these reasons have anything at all to do with whether a table is in 2NF.

In designing a database, it's not wrong to consider maintenance issues, database size, unreferenced rows, range constraints, and so on. It's just wrong to call those things normalization.

Oh, and that two-column table that I provided above--it's in 5NF.

I tend to have separation between my logical and physical database designs.

In my logical design, the normalized form of a year column would be a year table.

However, in my physical design, I would de-normalize it. Unless you can demonstrate a data storage or retrieval anomaly that would be resolved by normalizing, or a business reason (for ex. track changes), I would leave the Year column in the table itself.

Now, if you had something akin to a ProductName VarChar (255) then it would be very well served to have it's own table because of the plethora of benefits (performance gains, further normalization, etc), but I don't particularly see those benefits in this case.

Here's an extreme example of a business requirement to track changes:

I had a table design that stored the Gender of a person. So a bit flag - 1 for Female, 0 for Male should be acceptable, right? In this specific app, tracking gender changes was a legitimate requirement. The gender flag not only got it's own table, but it instantly formed a Many-To-Many relationship with the Employee.