Please consider this setup for a database-backed application. (In my case the DB is MySQL and the app is Ruby on Rails 3, but I don't think that matters for this question.)
Let's say I have an app for a warehouse.
I have multiple items that would have categories and statuses.
For example, the table that holds parts would have a few statuses, such as in stock, discontinued, and backordered, and multiple categories, such as IT hardware, automotive, medical, etc.
I also have other tables that need statuses and categories, such as Vendor (approved, out of business, new) and Order (open, processed, shipped, canceled).
Etc.
Here is the question:
I think that if I wanted to properly normalize my DB, I would have tables called categories, categories_types, statuses, and statuses_types.
Then I would store all categories in the categories table, and every category of a certain type, such as all part categories, would have a foreign key to the category_type for parts, and so on. The same goes for statuses.
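To make the first layout concrete, here is a minimal sketch that uses plain Ruby structs as stand-ins for rows of the categories and category_types tables. The struct names, the sample rows (including the vendor category "preferred"), and the `categories_for` helper are hypothetical; in the database this lookup would be a join on the foreign key.

```ruby
# Plain-Ruby stand-ins for rows of the hypothetical category_types and
# categories tables (approach 1: one shared table plus a type table).
CategoryType = Struct.new(:id, :name)
Category     = Struct.new(:id, :name, :category_type_id)

CATEGORY_TYPES = [
  CategoryType.new(1, "parts"),
  CategoryType.new(2, "vendors")
]

CATEGORIES = [
  Category.new(1, "IT hardware", 1),
  Category.new(2, "automotive", 1),
  Category.new(3, "medical", 1),
  Category.new(4, "preferred", 2) # hypothetical vendor category
]

# Equivalent of joining categories to category_types and filtering by type name.
def categories_for(type_name)
  type = CATEGORY_TYPES.find { |t| t.name == type_name }
  return [] unless type

  CATEGORIES.select { |c| c.category_type_id == type.id }.map(&:name)
end

puts categories_for("parts").inspect
```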
This is the normalized way.
However, I often see people create separate tables for specific categories; for example, there would be tables called part_categories, vendor_categories, order_statuses, part_status. This is a less normalized DB, but I guess it might be clearer when you are dealing with a lot of tables.
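For contrast, a sketch of the second layout under the same assumptions: plain Ruby structs standing in for hypothetical part_categories and order_statuses tables. Because each table holds only one kind of value, no type column and no filtering are needed.

```ruby
# Approach 2: one lookup table per entity. Names mirror the question's
# part_categories and order_statuses; the rows are illustrative only.
PartCategory = Struct.new(:id, :name)
OrderStatus  = Struct.new(:id, :name)

PART_CATEGORIES = [
  PartCategory.new(1, "IT hardware"),
  PartCategory.new(2, "automotive"),
  PartCategory.new(3, "medical")
]

ORDER_STATUSES = [
  OrderStatus.new(1, "open"),
  OrderStatus.new(2, "processed"),
  OrderStatus.new(3, "shipped"),
  OrderStatus.new(4, "canceled")
]

# Each table already contains a single kind of value, so listing it is a
# plain scan with no type id to filter on.
puts PART_CATEGORIES.map(&:name).inspect
puts ORDER_STATUSES.map(&:name).inspect
```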
Which of these approaches is better? What are the pros and cons in your experience? I usually go with the first setup, but I see the second one so often that I'm beginning to doubt my approach.
Thank you.
I think this depends on how you want to interact with the data. The benefit of the second approach is that it's easy to see which categories and statuses are associated with a specific object (vendor, item, order). Keep in mind that if you use the first approach, you will probably need a type identifier in your categories and statuses tables to identify the kind of object (vendor, item, order) each row relates to.
The benefit of the first approach is that it's easier to add statuses and categories for new objects, and there is simplicity in having only two tables. The problem arises when you want to add information to a specific kind of category or status. For example, suppose order statuses need an effective_date, but item statuses should not have one. Once you reach this point, you'll either have to move to the second approach or add an effective_date column that will be null for the other statuses to which the attribute does not apply.
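A small sketch of that trade-off, using plain Ruby structs in place of tables. The effective_date values and struct names are hypothetical: in the shared table the attribute exists on every row and is simply nil where it does not apply, while with separate tables only order statuses carry it at all.

```ruby
require "date"

# Shared statuses table: every row has effective_date, which stays nil
# for kinds of status where it does not apply.
SharedStatus = Struct.new(:name, :kind, :effective_date)

shared = [
  SharedStatus.new("shipped", "order", Date.new(2011, 1, 1)), # illustrative date
  SharedStatus.new("in stock", "item", nil)                   # column unused for items
]

# Separate tables: only order statuses get the extra column.
OrderStatus = Struct.new(:name, :effective_date)
ItemStatus  = Struct.new(:name)

order_status = OrderStatus.new("shipped", Date.new(2011, 1, 1))
item_status  = ItemStatus.new("in stock")

puts shared.count { |s| s.effective_date.nil? } # rows carrying an unused column
puts item_status.respond_to?(:effective_date)   # the attribute does not exist here
```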
Keep in mind that another approach would be to not create statuses and categories tables at all, but to store the status and category values in the original tables. You can accomplish this with an enumerated type: an ENUM column in MySQL, or a list of constants in Rails. In MySQL an ENUM is stored internally as an integer index, but it resolves to a word value, like 'processed', 'shipped' or 'canceled'. The benefit is that if your statuses do not change often, you have one less join to do, and the database and the Ruby model are easier to read. In Ruby an ENUM can simply be a list of constants, each with a key (integer) and a value (string). You can use the integer to query and update the database and the word on the application side.
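A minimal sketch of the constants-based enum described above, in plain Ruby. The `OrderStatusEnum` module, its helper names, and the integer codes are hypothetical; the point is only that the integer goes to and from the database while the word stays on the application side.

```ruby
# A plain-Ruby "enum": integer keys are what a (hypothetical) integer status
# column would store; string values are what the application displays.
module OrderStatusEnum
  STATUSES = {
    1 => "open",
    2 => "processed",
    3 => "shipped",
    4 => "canceled"
  }.freeze

  # Integer from the database -> word for the application.
  def self.name_for(code)
    STATUSES.fetch(code)
  end

  # Word from the application -> integer for queries and updates.
  def self.code_for(name)
    STATUSES.key(name) or raise ArgumentError, "unknown status: #{name}"
  end
end

puts OrderStatusEnum.name_for(3)      # shipped
puts OrderStatusEnum.code_for("open") # 1
```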
I believe both approaches are legitimate; the path you take really depends on your needs. If you are set on storing the data in the database, analyze how you will be interacting with statuses and categories, because that may point you to a different approach. Which approach will be faster and easier to query? Which one will be easier to update or modify? How often do you read, and how often do you write? Finally, keep in mind that you are Agile! Either approach can be transformed into the other with a simple migration and some refactoring. The approach that is simplest for your application now may not be the best one to use in the future, and that's perfectly okay. That's what's so great about being Agile!
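To illustrate how mechanical the conversion is, here is a sketch of the data reshaping such a migration would perform, with arrays of hashes standing in for tables. The table names, the `kind` column, and the helper names are hypothetical (a column literally named `type` would trigger single-table inheritance in Rails, hence `kind`). A real migration would also create and drop tables and copy the rows in the database.

```ruby
# Rows as they might sit in a single shared statuses table (illustrative data).
shared_statuses = [
  { id: 1, kind: "order",  name: "open" },
  { id: 2, kind: "order",  name: "shipped" },
  { id: 3, kind: "vendor", name: "approved" },
  { id: 4, kind: "part",   name: "backordered" }
]

# Shared -> separate: one "table" per kind, dropping the now-redundant kind column.
def split_by_kind(rows)
  rows.group_by { |r| r[:kind] }.each_with_object({}) do |(kind, group), tables|
    tables["#{kind}_statuses"] = group.map { |r| r.reject { |key, _| key == :kind } }
  end
end

# Separate -> shared: merge the tables back, restoring kind from the table name.
def merge_tables(tables)
  tables.flat_map do |table_name, rows|
    kind = table_name.sub(/_statuses\z/, "")
    rows.map { |r| r.merge(kind: kind) }
  end
end

split = split_by_kind(shared_statuses)
puts split.keys.inspect
```

The round trip loses nothing, which is why switching between the two layouts later is mostly a matter of refactoring the code that reads them.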
In my experience, tables of enumerated names invariably evolve into their own full-fledged model eventually. Typically, it begins by adding boolean flags, or as mentioned in the answer above, referent types or valid date ranges.
From a relational perspective, neither approach (putting all status enums in one table, or breaking them into separate tables) is "more" normalized than the other. But from a type-theoretic standpoint, it makes more sense to put part_categories and vendor_categories in their own separate tables, if for no other reason than that it requires no code in the model to make sure you don't accidentally associate a vendor category with a part.
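A sketch of the extra model code the shared-table approach needs, with plain Ruby objects standing in for ActiveRecord models. The `kind` attribute, the sample rows, and the `valid?` check are hypothetical stand-ins for a type column and a Rails validation; with a separate part_categories table, the foreign key alone would rule out the bad case.

```ruby
# Shared categories table: the schema alone does not stop a part from
# referencing a vendor category, so the model needs an explicit check.
Category = Struct.new(:id, :name, :kind)

CATEGORIES = {
  1 => Category.new(1, "automotive", "part"),
  2 => Category.new(2, "preferred", "vendor") # hypothetical vendor category
}

class Part
  attr_reader :category_id

  def initialize(category_id)
    @category_id = category_id
  end

  # Stand-in for a model validation: the referenced category must exist
  # and must be a part category.
  def valid?
    category = CATEGORIES[category_id]
    !category.nil? && category.kind == "part"
  end
end

puts Part.new(1).valid? # true
puts Part.new(2).valid? # false: vendor category on a part
```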
If you do end up putting them all in the same table, Rails has a nice feature called polymorphic associations that will automate the type and the id columns for you. It's a reasonable compromise between the two approaches.
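A plain-Ruby sketch of the bookkeeping a polymorphic association handles: a `*_type` column stores a class name and a `*_id` column stores a row id, and together they resolve to exactly one record. The `Record`, `Vendor`, `Order`, and `statusable` names are hypothetical, and this only imitates the idea rather than showing how ActiveRecord implements it. In Rails 3 the model side would be declared along the lines of `belongs_to :statusable, :polymorphic => true`.

```ruby
# Minimal in-memory "tables": each subclass keeps its own id -> record map.
class Record
  def self.records
    @records ||= {}
  end

  def self.find(id)
    records.fetch(id)
  end

  attr_reader :id

  def initialize(id)
    @id = id
    self.class.records[id] = self
  end
end

class Vendor < Record; end
class Order < Record; end

# statusable_type / statusable_id mimic the two polymorphic columns.
Status = Struct.new(:name, :statusable_type, :statusable_id) do
  # Resolve the owning record from the class name and id.
  def statusable
    Object.const_get(statusable_type).find(statusable_id)
  end
end

vendor = Vendor.new(7)
order  = Order.new(7) # same id, different table: the type column disambiguates

s1 = Status.new("approved", "Vendor", 7)
s2 = Status.new("shipped", "Order", 7)

puts s1.statusable.class # Vendor
puts s2.statusable.class # Order
```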
Most importantly, I would contend that the enums will eventually take on a model life of their own, in which case you face the very messy job of finding all of them in the various tables and recasting them into their own tables. Tables are cheap; why be frugal with them?