Say I have a very long table (~35 million rows) called TimeCard with only 5 columns (tableID, CompanyID, UserID, ProjectID, DailyHoursWorked, entryDate). This is a pretty straight forward table that records employees' worked hours per day per project per company.
I now need to generate a report to find out the employees' total worked hours per month per project for any given company. Instead of performing the aggregation needed when the report runs, I want to build a table-like data structure that already have all the Company/Project/User data aggregated by month, so when the report runs, I can just query that data structure directly without performing any run-time aggregation since ~35million records can take a few mins.
So I have 2 different ways. One create an extra physical table with (CompanyID, UserID, ProjectID, MonthlyHoursWorked, Month) as my columns and just use trigger at the TimeCard table to modify the values at the extra table. Or I can create an Indexed View. So I tried both. I first tried the indexed view with the following code:
CREATE VIEW [dbo].[vw_myView] WITH SCHEMABINDING AS
SELECT
JobID,
ProjectID,
Sum(DailyHoursWorked) AS MonthTotal,
DATEADD( Month, DATEDIFF( Month, 0, entryDate), 0 ) AS entryMonth,
CompanyID,
COUNT_BIG(*) AS Counter
FROM
dbo.TimeCard
Group By DATEADD( Month, DATEDIFF( Month, 0, entryDate ), 0 ), JobID, ProjectID, CompanyID
Go
CREATE UNIQUE CLUSTERED INDEX [IX_someInd开发者_运维百科ex] ON [dbo].[vw_myView]
(
[CompanyID] ASC,
[entryMonth] ASC,
[UserID] ASC,
[ProjectID] ASC
)
The indexed view created correctly and totaling with ~5 million rows total.
However, every time if I clear the SQL cache, and run the following query: *select * from vw_myView where companyID = 1*, it takes almost 3 minutes. If I go with the extra table route as I mentioned above, with my cache cleared, it takes around 4 seconds.
My questions are, is Indexed View a bad choice for this particular scenario? In particular I am interested to know if the entire indexed view gets re-calculated/re-aggregated every time when the underlying table (TimeCard) is changed or when a query is run against it?
Thanks!
If you are not using either the Enterprise or Developer edition, then you need to use the with (noexpand)
hint:
select *
from vw_myView with (noexpand)
where companyID = 1
When the underlying data changes, the view will only update rows related to the changed data, not the entire table. This can have an adverse impact on an OLTP database with a high degree of inserts, but if usage is only moderate, should not pose a performance problem.
A tip from Microsoft:
As a general recommendation, any modifications or updates to the view or the base tables underlying the view should be performed in batches if possible, rather than singleton operations. This may reduce some overhead in the view maintenance.
I think you are on the right path with using an index View. However, have you put indexes on table you are querying from, TimeCard
for your aggregate columns. You need to make an Index of JobID, ProjectID, entryDate, CompanyID
(1 index). If you use 1 index for each column it will NOT solve your problems because the Query will have to use all 4 indexes together.
I do think using the trigger will be slow but in a different way. It will make your query faster but it will slow down every insert you do into TimeCard
. If you do decide to go with the Trigger then I would make sure I index that table as well or might also be slow, not 3 minutes slow, but still slow to sort and return data.
I would not used a view for this. I think the table populated by the trigger is the way to go. But don't forget to adjust the totals for updates and deletions as well as inserts.
I don't think, you need indexed view (I don't say, the indexed view is bad/good idea). I think, you need the index on column "CompanyID" and "EntryDate". After then you should use where condition "WHERE CompanyID = @CompanyID AND EntryDate >= @StartDate AND EntryDate <= @EndDate".
If the table is processed primairly by the "EntryDate", you can use a cluster index on "EntryDate" column.
After this, I think the select statement will much faster then now.
Did you consider partitioning the table. You can think of combination of list and hash partitioning table.
Well, the idea of an indexed view is definetly good, and if you can create a clustered index on it - perfect. It should be fast - much better than 3 minutes for a query!
On the other hand: if those chunks of information are only ever updated e.g. once a month or once weekly (or even every night), it might be better to just put those into a separate DailyTimeCard
table which gets filled/updated by e.g. an SSIS package regularly.
I wouldn't recommend using triggers to constantly update such a fact table either - if you really really need to have the most up to date data at every given second in the day, then stay with the indexed view.
But, your indexed view does quite a bit of heavy lifting - it sums, it groups by and so on. Keeping that up to date at all times, while your underlying TimeCard
table changes and gets updated, will cause some load on your system - hard to say how much - but it could be quite noticeable.
If you find a way to extract the information you need - group and sum once and then store that aggregated data into a separate fact table - you should have both: fast and quick queries on the DailyTimeCard
table, and the rest of your system should be less burdened with keeping the indexed view up to date all the time.
Maybe it's not the solution you're looking for - but just think about it for a bit. It might - or might not - work out for you!
精彩评论