SUM/GROUP performance and the Primary Key_问答_开发者

I have a table myTable with myGuid (uniqueidentifier), myValues (float), myGroup (integer) and a bunch of other fields which are not important right now. I want to do something as simple as:

SELECT SUM(myValues)
  FROM myTable
 WHERE myGuid IN (SELECT * FROM ##test)
 GROUP BY myGroup

##test is just a temporary table with a single field (guid_filter) containing a bunch of uniqueidentifiers.

Now here's the strange thing:

When I create myTable with myGuid as the Primary Key (which seems like the obvious thing to do), the query is slow (<EDIT>8-12s</EDIT>).
When I create myTable with myAutoInc, an integer auto-increment field, as the Primary Key, the query is fast (~2s), even though the WHERE clause still filters by myGuid. (myGuid has just a "normal" non-clustered index in this scenario.)

Is there any logical explanation to this? My (naive) assumption was that the first option is faster, since SQL Server can use the guid to look up myValues rather than having to go through guid -> myAutoInc -> myValues. So, the result was very surprising for me.

Here's the SHOWPLAN_TEXT output. Slow scenario (XML query plan): (EDIT: Updated, thanks to Remus for noticing that there was an unnecessary additional non-clustered index on myGuid)

|--Compute Scalar(DEFINE:([Expr1007]=CASE WHEN [Expr1015]=(0) THEN NULL ELSE [Expr1016] END))
     |--Stream Aggregate(GROUP BY:([myDB].[dbo].[myTable].[myGroup]) DEFINE:([Expr1015]=COUNT_BIG([myDB].[dbo].[myTable].[myValues]), [Expr1016]=SUM([myDB].[dbo].[myTable].[myValues])))
          |--Sort(DISTINCT ORDER BY:([myDB].[dbo].[myTable].[myGroup] ASC, [myDB].[dbo].[myTable].[myGuid] ASC))
               |--Nested Loops(Inner Join, OUTER REFERENCES:([tempdb].[dbo].[##test].[guid_filter], [Expr1014]) OPTIMIZED WITH UNORDERED PREFETCH)
                    |--Table Scan(OBJECT:([tempdb].[dbo].[##test]))
                    |--Clustered Index Seek(OBJECT:([myDB].[dbo].[myTable].[PK__myTable__2334397B]), SEEK:([myDB].[dbo].[myTable].[myGuid]=[tempdb].[dbo].[##test].[guid_filter]) ORDERED FORWARD)

Table 'myTable'. Scan count 0, logical reads 38046, physical reads 1, read-ahead reads 6914, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '##test'. Scan count 1, logical reads 23, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Fast scenario (XML query plan):

|--Compute Scalar(DEFINE:([Expr1007]=CASE WHEN [globalagg1009]=(0) THEN NULL ELSE [globalagg1011] END))
     |--Stream Aggregate(GROUP BY:([myDB].[dbo].[myTable].[myGroup]) DEFINE:([globalagg1009]=SUM([partialagg1008]), [globalagg1011]=SUM([partialagg1010])))
          |--Parallelism(Gather Streams, ORDER BY:([myDB].[dbo].[myTable].[myGroup] ASC))
               |--Stream Aggregate(GROUP BY:([myDB].[dbo].[myTable].[myGroup]) DEFINE:([partialagg1008]=COUNT_BIG([myDB].[dbo].[myTable].[myValues]), [partialagg1010]=SUM([myDB].[dbo].[myTable].[myValues])))
                    |--Sort(DISTINCT ORDER BY:([myDB].[dbo].[myTable].[myGroup] ASC, [myDB].[dbo].[myTable].[myAutoInc] ASC))
                         |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([myDB].[dbo].[myTable].[myGroup], [myDB].[dbo].[myTable].[myAutoInc]))
                              |--Nested Loops(Inner Join, OUTER REFERENCES:([myDB].[dbo].[myTable].[myAutoInc], [Expr1017]) OPTIMIZED WITH UNORDERED PREFETCH)
                                   |--Nested Loops(Inner Join, OUTER REFERENCES:([tempdb].[dbo].[##test].[guid_filter], [Expr1016]) OPTIMIZED WITH UNORDERED PREFETCH)
                                   |    |--Table Scan(OBJECT:([tempdb].[dbo].[##test]))
                                   |    |--Index Seek(OBJECT:([myDB].[dbo].[myTable].[myGuid]), SEEK:([myDB].[dbo].[myTable].[myGuid]=[tempdb].[dbo].[##test].[guid_filter]) ORDERED FORWARD)
                                   |--Clustered Index Seek(OBJECT:([myDB].[dbo].[myTable].[PK__myTable__2334397B]), SEEK:([myDB].[dbo].[myTable].[myAutoInc]=[myDB].[dbo].[myTable].[myAutoInc]) LOOKUP ORDERED FORWARD)

Table 'myTable'. Scan count 0, logical reads 66988, physical reads 48, read-ahead reads 2515, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '##test'. Scan count 5, logical reads 23, physical reads 0, read-ahead 开发者_开发百科reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Looking at the plan in the 'slow' case it shows that the query does a seek on the index [myDB].[dbo].[myTable].[myGuid] followed by a clustered index seek on [myDB].[dbo].[myTable].[PK__myTable__2334397B]. This only makes sense if you have created both a non-clustered index on myTable(myGuid) and also declared myTable(myGuid) as clustered index key (it appears so, judging from the typical 'PRIMARY KEY' declaration auto-generated name naming convention of the clustered index object 'PK_...').

Other than that, the plans are very similar and they're both quite bad in that they include a SORT. The difference in width of the autoInc column vs. the GUID involved in the potential larger width of the non-clustered index in the first case may explain the difference, but I doubt is the full story.

Please redo the test making sure that you have a clustered key on myGuid and that you do NOT have also a non-clustered index on the same key. The plan should include only one single seek on myTable, using the clustered index, to compare exactly the cases you wanted to compare.

Also, obviously, make sure you compare the same ##test content and the buffer pool cache is warmed up in both case identically. Run DBCC FREESYSTEMCACHE('All') before each test then run the query at least 5 times, negleting the first run (it will be the run that warms up the buffer pool).

Also, as Arthur already noted, having an order guarantee on ##test (ie. a clustered key) could speed up things as the nested loops can be replaced with a merge join, if ##test content is large enough. If the ##temp has only few rows, then the nested loop is better and order makes no difference.

A GUID is a poor choice of clustered index, simply because it's so big. Using an INTEGER field allows the database to pack much much more information into a 'page', so fewer pages need to be fetched from disk for any given query.

Also note that the cluster key(s) are stored in every non-clustered index (since that's what's used to locate the data), which compounds the problem.