Should I always aim for 4th normal form when designing databases?
开发者_高级运维I feel that 3rd normal form is closer to my business domain.
For example I have a table with PartNumber
. In my business domain, it is the unique key, no two parts shall ever have the same number. However this is a VARCHAR, putting a primary key to a VARCHAR and then linking it as a foreign key to other table is a huge smell to me.
Should I put Auto Increments IDs everywhere instead? I don't really see the point right away and it complicates code a lot.
As a general rule aim for your database to be in at least Boyce-Codd Normal Form or ideally Fifth Normal Form. Consider 3NF only where there are dependencies that it's important to preserve that wouldn't otherwise be satisfied by 5NF.
4NF has nothing to do with whether you use varchar or integers in your keys and I'm not sure why you seem to think it does.
It depends on the database; using varchar fields as your PK wastes space in your DB storage and indices in MySQL (for example). Having a table and unique integer id for your part numbers may seem less pure, but such things are sometimes necessary in the face of performance.
(And further, it does allow you to update a product number if necessary in one spot, though I suspect this is likely not probable in your business logic.)
Whilst your question asks about 4NF, your particular situation doesn't. Using a VARCHAR field as a primary key isn't wrong, if it's relatively small then I would prefer to use that rather than a surrogate key.
Back to 4NF - from Wikipedia's example, this seems to be something that would, much of the time, just naturally fall into place. It certainly can't hurt to strive for 4NF - at the same time denormalization has its place.
Do you allow your users to edit part numbers? If so, this is a good argument against using it as the primary key. As a (general) rule of thumb PKs shouldn't be user editable because replicating a changed PK through the DB can become quite expensive.
Also - is it that no two parts shall ever have the same number, or no two active parts can have the same number? If it's the latter, do you need to keep a long enough history that you might end up with two part numbers that are the same?
It takes a few weeks to learn how to normalize data. It take months, maybe even years, to learn when to disregard the rules of normalization. Of course, if you disregard the rules of normalization, there will be consequences. If you learn normalization thoroughly, you will know that the consequences are.
Sometimes the consequences of not adhering to a given normal form are light, compared to the benefit of following some other design pattern. For example, when building an OLAP dataabase or a data warehouse, star schema or snowflake schema is often more productive than fully normalized.
For OLTP databases, I would tend to aim for Boyce-Codd normal form, and just deal with any modification anomalies that surface due to departure from 4th or 5th normal form. But it really depends on the case.
精彩评论