We're thinking about adding a weekly summary tabl开发者_C百科e to our little data warehouse. We have a classic time dimension down to the daily level (Year/Month/Day) with the appropriate Week/Quarter/etc. columns.
We'd like to have the time key in this new weekly summary table reference our time dimension. What's the best practice here—have the time key reference the id of the first day in the week it represents? Or the last day? Or something entirely different?
By convention, the fact tables with date period aggregations (week, month...) reference the DateKey of the last day of the period -- so, for this example you would reference the last day of the week.
Kind of logical too, the week must end in order to be aggregated.
It is important to clearly state (somewhere) that the grain of the fact table is one-week, so that report designers are aware of this.
Days are a good example of an entity best identified by natural keys — their representations in Gregorian calendar.
To identify a week or a month, it's best to use its first day. In Oracle
, you can easily retrieve it by a call to TRUNC
:
SELECT TRUNC(fact_date, 'month'), SUM(fact_value)
FROM fact
GROUP BY
TRUNC(fact_date, 'month')
In other systems it's a little bit more complex but quite easy too.
What about making a new dimension "Week"?
You can create a relation between time and week dimension, if you need.
Apropos an earlier answer I would actually expect to store data associated with an interim level of the time dimension hierarchy - when it relates to an atomic measurement for that interim time period - by attaching to the key associated with the first day of the period - this makes it much more straightforward when loading (esp with months - I guess weeks might always require some calculation) and also when reporting - nonetheless it is a convention and as long as you pick a common-sense option (and stick to it) you will be fine.
BTW do not create a week dimension - you should be using a rich time dimension with all the hierarchies available within it for year, quarter, month, week, day etc (bearing in mind there are often multiple, exclusive heirarchies) and in this instance only would also recommend a non-meaningless surrogate key in the form 20100920 - dates are immutable and in this format can easily be contained as int columns so there is little value in using a meaningless keys for dates (or in dim_time either) - if you have ever had to write queries to dereference data where meaningless SKs are used for the time dimension you know the (unnecessary) pain...
M
精彩评论