(Note: This was formerly titled "How do a query plan's estimated rows/IO/CPU/executions, etc., influence performance, all other things being equal?", but I renamed it now that I know the answer.)
Below is a case where two identical SP calls vary only by the cached plan they are using. The graphical representation of the plan looks identical: it is a simple nested loops join in both cases (see below). However, in one case the SP uses a plan that was cached from an earlier call, when the temp table held many fewer rows, and that case is consistently slower than when the SP is recompiled. I am trying to understand what is happening under the hood so that I can better work around the performance problems I am having.
Here are the details, along with a script if you want to test this out yourself:
An SP is called that populates a temp table with either 10 rows or 400 rows. The temp table is then joined back to a permanent table with about 4 million rows, and the results are selected out.
Case 1: Recompile SP, Run the 10 row case first, and the 400 row case second.
Case 2: Recompile SP, Run the 400 row case first, and the 10 row case second.
The performance of the 400 row case is consistently worse in Case 1 than in Case 2 (250 ms vs. 125 ms), even though the execution plans appear to be the same. I understand that a plan is cached based on the first run of the SP after recompiling, but since both the 10 and 400 row cases use a very simple nested loops join, I didn't expect it to matter whether the 400 row case ran first or second. Apparently, though, the plan cached for the 10 row case is inefficient for the 400 row case.
I can see now that there is more to a cached plan than the operators and their order shown in the Management Studio graphical representation. The problem is, I don't understand what else is going on under the hood that causes the difference. I did look at the STATISTICS PROFILE output (see below) and saw differences in EstimateRows, EstimateIO, EstimateCPU, EstimateExecutions, and TotalSubtreeCost.
Can anyone explain what is going on? Is the estimated number of executions being used to allocate memory as if only 10 rows are expected, so that it continually has to reallocate memory to process the 400 rows? I'm just guessing.
Case 1 Execution Plan for 400 row case (based on cached plan for 10 row case):
Case 2 Execution Plan for 400 row case (recompiled):
-------------------------------------------------------------------------------------------------------------------------------
-- TEST CASE
-------------------------------------------------------------------------------------------------------------------------------
-- 1. SET UP
-- Create a table with 4 million rows
IF OBJECT_ID('A') IS NOT NULL DROP TABLE A;
CREATE TABLE A (ID INT PRIMARY KEY, Val1 VARCHAR(100), Val2 VARCHAR(100))
;WITH
Pass0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows
Pass1 AS (SELECT 1 AS C FROM Pass0 AS A, Pass0 AS B),--4 rows
Pass2 AS (SELECT 1 AS C FROM Pass1 AS A ,Pass1 AS B),--16 rows
Pass3 AS (SELECT 1 AS C FROM Pass2 AS A ,Pass2 AS B),--256 rows
Pass4 AS (SELECT 1 AS C FROM Pass3 AS A ,Pass3 AS B, Pass2 AS X, Pass1 AS Y),--65536 x 16 x 4 rows
Tally AS (SELECT ROW_NUMBER() OVER (ORDER BY C) AS Number FROM Pass4)
INSERT INTO A
SELECT Number, 'xxxxxxxxx10xxxxxxxx20xxxxxxxx30xxxxxxxx40', 'xxxxxxxxx10xxxxxxxx20xxxxxxxx30xxxxxxxx40'
FROM Tally
-- Create SP
IF OBJECT_ID('Proc1') IS NOT NULL DROP PROC Proc1
GO
CREATE PROC Proc1
(
@TempTableName AS VARCHAR(100)
)
AS
CREATE TABLE #T (ID INT PRIMARY KEY)
EXEC('INSERT INTO #T SELECT ID FROM ' + @TempTableName)
SET STATISTICS PROFILE ON
SELECT A.*
INTO #X
FROM #T AS T
JOIN A AS A ON A.ID = T.ID
SET STATISTICS PROFILE OFF
DROP TABLE #X
GO
-- Generate temp table data
IF OBJECT_ID('tempdb..##Temp400') IS NOT NULL DROP TABLE ##Temp400;
SELECT TOP 400 ID
INTO ##Temp400
FROM A
ORDER BY NEWID()
IF OBJECT_ID('tempdb..##Temp10') IS NOT NULL DROP TABLE ##Temp10;
SELECT TOP 10 ID
INTO ##Temp10
FROM A
WHERE ID NOT IN (SELECT ID FROM ##Temp400 AS T)
ORDER BY NEWID()
-- 2. RUN TEST WHILE MONITORING IN PROFILER
-- Execute 10 then 400
EXEC sp_recompile Proc1
CHECKPOINT
DBCC DROPCLEANBUFFERS
GO
EXEC Proc1 '##Temp10';
GO
EXEC Proc1 '##Temp400';
GO
-- Execute 400 then 10
EXEC sp_recompile Proc1
CHECKPOINT
DBCC DROPCLEANBUFFERS
GO
EXEC Proc1 '##Temp400';
GO
EXEC Proc1 '##Temp10';
p.s. This is based on a real case where the performance difference is much more pronounced than in the example given here. Since the optimal plan was highly dependent on the number of rows involved, no single plan fit all row counts well. My solution was to copy the main query in the SP multiple times inside a series of IF statements, forcing the SQL engine to give each case its own optimal plan (a sketch follows the skeleton below). It's the best solution I found, but it feels a bit clunky:
IF @RowCount < 1
[paste query here]
ELSE IF @RowCount < 50
[paste query here]
ELSE IF @RowCount < 200
[paste query here]
ELSE
[paste query here]
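For what it's worth, here is roughly what that pattern looks like applied to the test SP above. This is a hypothetical sketch: the thresholds are illustrative, and #X is created up front and loaded with INSERT...SELECT because T-SQL won't allow the same SELECT...INTO temp table to appear twice in one procedure, even in mutually exclusive branches.

-- Hypothetical sketch of the IF-branching workaround applied to Proc1.
-- Each branch is a literal copy of the same query; because each statement
-- is compiled separately the first time it is reached, each branch ends up
-- with a plan optimized for the cardinality it typically sees.
CREATE TABLE #X (ID INT PRIMARY KEY, Val1 VARCHAR(100), Val2 VARCHAR(100))
DECLARE @RowCount INT
SELECT @RowCount = COUNT(*) FROM #T

IF @RowCount < 50
    INSERT INTO #X SELECT A.* FROM #T AS T JOIN A AS A ON A.ID = T.ID
ELSE IF @RowCount < 200
    INSERT INTO #X SELECT A.* FROM #T AS T JOIN A AS A ON A.ID = T.ID
ELSE
    INSERT INTO #X SELECT A.* FROM #T AS T JOIN A AS A ON A.ID = T.ID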
I don't believe that estimated rows, I/O, and CPU affect anything at all as long as the execution plans are the same. In your case the execution plans look similar, but there is a difference: if you expand the Nested Loops operator, you will see that the two plans pass different arguments to it. One of them uses PREFETCH and the other does not. That is how your plans differ, and that is what affects your performance.
This looks relevant: http://blogs.msdn.com/b/craigfr/archive/2008/10/07/random-prefetching.aspx
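If you want to confirm the prefetch difference without eyeballing the graphical plans, one way (a minimal sketch, assuming only the Proc1 name from the test script) is to pull the cached showplan XML from the plan cache and search it for the WithUnorderedPrefetch attribute on the Nested Loops element:

-- Pull the cached XML plan(s) for Proc1 and inspect the Nested Loops
-- element for a WithUnorderedPrefetch="true" attribute.
SELECT st.text, qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%Proc1%'
  AND st.text NOT LIKE '%dm_exec_cached_plans%'

Comparing the XML captured in Case 1 and Case 2 should show the attribute present in one plan and absent in the other.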
Some of your ideas are flawed.
- There are no indexes on the temp table, so there will be a cost difference between accessing 400 rows and 10 rows
- This affects how it joins into A (it is more expensive to find 400 rows in A than 10)
- The number of rows affects the relative cost of the load into #X
- ORDER BY NEWID() changes the data distribution on each run, so the 10 rows may happen to lie close together in one of your runs
- The memory used for tables/indexes will be somewhat limited by NEWID() too
- (Missed that in the table creation, oops)

The plan shows a clustered index on ##Temp; this isn't shown in the code.
You should show your real code rather than contrived code, so that we can solve your actual problem instead of explaining "this is as expected" for a contrived example.