I`m new to data warehousing, but I think my question can be relatively easy answered. I built a star schema, with a dimension table 'product'. This table has a column 'PropertyName' and a column 'PropertyValue'. The dimension therefore looks a little like this:
surrogate_key | natural_key (productID) | PropertyName | PropertyValue | ...
1 5 Size 20 ...
2 5 开发者_高级运维Color red
3 6 Size 20
4 6 Material wood
and so on.
In my fact table I always use the surrogate keys of the dimensions. Cause of the PropertyName and PropertyValue columns my natural key isn`t unique / identifying anymore, so I get way too much rows in my fact table.
My question now is, what should I do with the property columns? Would it be best, to put each property into separate dimensions, like dimension size, dimension color and so on? I got about 30 different properties. Or shall I create columns for each property in the fact table? Or make one dimension with all properties?
Thanks in advance for any help.
Your dimension table 'product' should look like this:
surrogate_key | natural_key (productID) | Color | Material | Size | ...
1 5 red wood 20 ...
2 6 red ...
If you have to many properties, try to group them in another dimension. For example Color and Material can be attributes of another dimension if you can have the same product with same id and same price in another color or material. Your fact table can identify product with two keys: product_id and colormaterial_id...
Reading recommendation: The Data Warehouse Toolkit, Ralph Kimball
Your design is called EAV
(entity-attribute-value) table.
It's a nice design for the sparse matrices (large number of properties with only few of them filled at the same time).
However, it has several drawbacks.
It cannot be indexed (and hence efficiently searched) on two or more properties at once. A query like this: "get all products made of wood and having size or 20" will be less efficient.
Implementing constraints involving several attributes at once is more complex
etc.
If it's not a problem for you, you can use EAV
design.
精彩评论