A source system tracks student attendance for a school district by reporting absence events. Attendance on any particular day can be determined by examining three datasets: school calendar, student enrollment, and absence.
On any given school day, the number of enrolled students in attendance is usually much larger than the number that are absent, so this approach reduces the number of records stored to track attendance significantly.
I am trying to determine the proper way to represent daily attendance in a dimensional model. The most obvious way is to create a factless table with a grain per school day per student, and an attendance dimension tha开发者_运维知识库t has values for both attendance and absence reasons. This is quite straightforward to work with OLAP, but the downside is the size of the fact table.
For example, for 30,000 students and 188 school days means that there are approximately 0.5 million records per year (if this doesn't seem large enough to be an issue, then consider an example in which attendance must be reported on per period rather than per day). Contrast this to a fact table that records only absences and the number is considerably smaller. However, if I do this, then I am not sure how to build cubes that aggregate daily attendance facts.
The specific OLAP technology being used is SQL Server Analysis Services 2008 R2. Any thoughts?
What if you used two fact tables: one for current period (last month for instance) and another for historical data (recording just absence) - I suppose users (teachers for instance) do not need information about attendance of some particular student of the second class of the day 5 months ago, but they might need this information for the last week/month.
精彩评论