Adding a seemingly perfectly in开发者_运维技巧dex is having an unexpectedly adverse affect on a query performance...
-- [Data] has a predictable structure and a simple clustered index of the primary key:
ALTER TABLE [dbo].[Data] ADD PRIMARY KEY CLUSTERED ( [ID] )
-- Joins on itself looking for a certain kind of "overlapping" records
SELECT DISTINCT
[Data].ID AS [ID]
FROM
dbo.[Data] AS [Data]
JOIN
dbo.[Data] AS [Compared] ON
[Data].[A] = [Compared].[A] AND
[Data].[B] = [Compared].[B] AND
[Data].[C] = [Compared].[C] AND
([Data].[D] = [Compared].[D] OR [Data].[E] = [Compared].[E]) AND
[Data].[F] <> [Compared].[F]
WHERE 1=1
AND [Data].[A] = @A
AND @CS <= [Data].[C] AND [Data].[C] < @CE -- Between a range
[Data] has about a quarter-million records so far, 10% to 50% of the data satisfies the where clause depending on @A, @CS, and @CE. As is, the query takes 1 second to return about 300 rows when querying 10%, and 30 seconds to return 3000 rows when querying 50% of the data.
Curiously, the estimated/actual execution plan indicates two parallel Clustered Index Scans, but the clustered index is only of the ID, which isn't part of the conditions of the query, only the output. ??
If I add this hand-crafted [IDX_A_B_C_D_E_F]
index which I fully expected to improve performance, the query slows down by a factor of 8 (8 seconds for 10% & 4 minutes for 50%). The estimated/actual execution plans show an Index Seek, which seems like the right thing to be doing, but why so slow??
CREATE UNIQUE INDEX [IDX_A_B_C_D_E_F]
ON [dbo].[Data] ([A], [B], [C], [D], [E], [F])
INCLUDE ([ID], [X], [Y], [Z]);
The Data Engine Tuning wizard suggests a similar index with no noticeable difference in performance from this one. Moving AND [Data].[F] <> [Compared].[F]
from the join condition to the where clause makes no difference in performance.
I need these and other indexes for other queries. I'm sure I could hint that the query should refer to the Clustered Index, since that's currently winning - but we all know it is not as optimized as it could be, and without a proper index, I can expect the performance will get much worse with additional data.
What gives?
== Edit ==
For Gail, here are the execution plans. Of course, the one that references the index is the one queried with the index available. This is a little different than my original description of the clustered index scan - I deleted the auto-gen PK index for testing and can't get it back(?), so this is without ANY indexes at all, hence the table scan. Different look to the query plan, but no noticeable change in performance. (Table Scan is the fast one)
execution plans http://www.imagechicken.com/uploads/1276732894073081600.png
Indexed.sqlplan
Nonindexed.sqlplan
It's doing the CI scan because the CI is the actual data. An index is just a placeholder to the actual data.
An index seek is definitely the incorrect thing to do on a 50% return query, and it's rare to see one used even on a 10% return rate. Usually if it's over a couple of percent, it's going to scan (that's why on smaller tables you can count on a scan to happened nearly every time).
I would suggest making sure the stats are up to date for that table, and possibly make sure that the index isn't in need of maintenance itself.
UPDATE STATS - http://msdn.microsoft.com/en-us/library/ms187348.aspx
精彩评论