What influences a query plan's performance besides what is shown in Mgt Studio's graphical output?

(Note: This was formerly titled "How do a query plan's estimated executions/IO/CPU/Executions, etc, influence performance, all other things being equal?", but I renamed it now that I know the answer.)

Below is a case where two identical SP calls vary only in the cached plan they are using. The graphical representation of the plan looks identical: it is a simple nested loops join in both cases (see below). However, in one case the SP uses a plan that was cached from an earlier call, when the temp table held many fewer rows, and that case is consistently slower than when the SP is recompiled. I am trying to understand what is happening under the hood so that I can better work around the performance problems I am having.

Here are the details, along with a script if you want to test this out yourself:

An SP is called that populates a temp table with either 10 rows or 400 rows. The temp table is then joined back to a permanent table with about 4 million rows, and the results are selected out.

Case 1: Recompile SP, Run the 10 row case first, and the 400 row case second.

Case 2: Recompile SP, Run the 400 row case first, and the 10 row case second.

The performance of the 400-row case is consistently worse in Case 1 than in Case 2 (250 ms vs. 125 ms), even though the execution plans appear to be the same. I understand that a plan is cached based on the first run of the SP after recompiling, but both the 10-row and the 400-row cases use a very simple nested loops join, so I didn't expect it to matter whether the 400-row case ran first or second. But apparently the plan that is cached for the 10-row case is inefficient for the 400-row case.

I can see now that there is more to a cached plan than the operators and their order shown in the Management Studio graphical representation. The problem is, I don't understand what else is going on under the hood that causes the difference. I did look at the STATISTICS PROFILE output (see below) and saw differences in EstimateRows, EstimateIO, EstimateCPU, EstimateExecutions and/or TotalSubtreeCost.

Can anyone explain what is going on? Is the estimated number of executions being used to allocate memory as if only 10 rows are expected, so that it has to keep reallocating memory over and over to process 400 rows? I'm just guessing.
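
For reference, here is a minimal way to look at the cached plan directly (a sketch, assuming SQL Server 2005 or later; the LIKE filter on the statement text is just a convenient way to locate Proc1's plan). The XML it returns contains the per-operator estimates (EstimateRows, EstimateIO, etc.) that were fixed on the first execution after the recompile:

    -- Show the plan cached for Proc1, how many times it has been reused,
    -- and the plan XML with the estimates baked in at compile time.
    SELECT cp.usecounts,
           cp.objtype,
           st.text,
           qp.query_plan       -- open the XML to see EstimateRows / EstimateIO per operator
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    WHERE st.text LIKE '%Proc1%';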

Case 1 Execution Plan for 400 row case (based on cached plan for 10 row case):

(execution plan screenshot)

Case 2 Execution Plan for 400 row case (recompiled):

(execution plan screenshot)

(screenshot)

-------------------------------------------------------------------------------------------------------------------------------
-- TEST CASE
------------------------------------------------------------------------------------------------------------------------------- 


-- 1. SET UP
    -- Create a table with 4 million rows

    IF OBJECT_ID('A') IS NOT NULL DROP TABLE A;
    CREATE TABLE A (ID INT PRIMARY KEY, Val1 VARCHAR(100), Val2 VARCHAR(100))

    ;WITH  
        Pass0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows  
        Pass1 AS (SELECT 1 AS C FROM Pass0 AS A, Pass0 AS B),--4 rows  
        Pass2 AS (SELECT 1 AS C FROM Pass1 AS A ,Pass1 AS B),--16 rows 
        Pass3 AS (SELECT 1 AS C FROM Pass2 AS A ,Pass2 AS B),--256 rows 
        Pass4 AS (SELECT 1 AS C FROM Pass3 AS A ,Pass3 AS B, Pass2 AS X, Pass1 AS Y),--65536 x 16 x 4 rows 
        Tally AS (SELECT  ROW_NUMBER() OVER (ORDER BY C) AS Number FROM   Pass4)
    INSERT INTO A
    SELECT Number, 'xxxxxxxxx10xxxxxxxx20xxxxxxxx30xxxxxxxx40', 'xxxxxxxxx10xxxxxxxx20xxxxxxxx30xxxxxxxx40'
    FROM  Tally

    -- Create SP

    IF OBJECT_ID('Proc1') IS NOT NULL DROP PROC Proc1
    GO

    CREATE PROC Proc1
    (
        @TempTableName AS VARCHAR(100)
    )
    AS
        CREATE TABLE #T (ID INT PRIMARY KEY)
        EXEC('INSERT INTO #T SELECT ID FROM ' + @TempTableName)

        SET STATISTICS PROFILE ON

        SELECT A.*
        INTO #X
        FROM #T AS T
        JOIN A AS A ON A.ID = T.ID

        SET STATISTICS PROFILE OFF

        DROP TABLE #X

GO

    -- Generate temp table data 

    IF OBJECT_ID('tempdb..##Temp400') IS NOT NULL DROP TABLE ##Temp400;
    SELECT TOP 400 ID
    INTO ##Temp400
    FROM A
    ORDER BY NEWID()

    IF OBJECT_ID('tempdb..##Temp10') IS NOT NULL DROP TABLE ##Temp10;
    SELECT TOP 10 ID
    INTO ##Temp10
    FROM A
    WHERE ID NOT IN (SELECT ID FROM ##Temp400 AS T)
    ORDER BY NEWID()

-- 2. RUN TEST WHILE MONITORING IN PROFILER

    -- Execute 10 then 400
        EXEC sp_recompile Proc1
        CHECKPOINT
        DBCC DROPCLEANBUFFERS
        GO
        EXEC Proc1 '##Temp10';
        GO
        EXEC Proc1 '##Temp400';
        GO
    -- Execute 400 then 10
        EXEC sp_recompile Proc1
        CHECKPOINT
        DBCC DROPCLEANBUFFERS
        GO
        EXEC Proc1 '##Temp400';
        GO
        EXEC Proc1 '##Temp10';

P.S. This is based on a real case where the performance difference is much more pronounced than in the example given here. Since the optimal plan was highly dependent on the number of rows involved, no single plan fit all row counts well. My solution was to copy the main query multiple times within several IF statements in the SP, forcing the SQL engine to give each range its own optimal plan. It's the best solution I found, but it feels a bit clunky:

IF @RowCount < 1
    [paste query here]
ELSE IF @RowCount < 50
    [paste query here]
ELSE IF @RowCount < 200
    [paste query here]
ELSE
    [paste query here]
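
A possible alternative, sketched here only as an idea: add OPTION (RECOMPILE) to the one statement that is sensitive to the temp table size. That statement then gets compiled with the actual row count on every call, trading a small per-call compile cost for a plan that fits the current size:

    -- Hypothetical variant of the statement inside Proc1: only this statement
    -- is recompiled per call, so its plan always reflects the current size of #T.
    SELECT A.*
    INTO #X
    FROM #T AS T
    JOIN A AS A ON A.ID = T.ID
    OPTION (RECOMPILE)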


I don't believe that estimated rows, I/O and CPU affect anything at all as long as the execution plans are the same. In your case the execution plans are similar, but there is a difference: if you expand the nested loops operator, you will see that the two plans pass different arguments to it. One of them uses PREFETCH and the other does not. This is how your plans differ, and this is what affects your performance.

This looks relevant: http://blogs.msdn.com/b/craigfr/archive/2008/10/07/random-prefetching.aspx
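
If you want to check this without expanding the graphical plan node by node, one rough way (a sketch; as far as I know, showplan XML exposes the flag as WithOrderedPrefetch / WithUnorderedPrefetch attributes on the NestedLoops element):

    -- Capture the actual plan XML for the call and search it for the
    -- prefetch attributes on the NestedLoops element, e.g.
    -- <NestedLoops ... WithUnorderedPrefetch="true" ...>
    SET STATISTICS XML ON;
    EXEC Proc1 '##Temp400';
    SET STATISTICS XML OFF;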


Some of your ideas are flawed.

  • There are no indexes on the temp table, so there will be a cost difference between accessing 400 rows and accessing 10 rows
  • This affects how it JOINs into A (it is more expensive to find 400 rows in A than 10)
  • The number of rows affects the relative cost of the load into #X
  • ORDER BY NEWID() will change the data distribution on each run, so the 10 rows may happen to be close together in one of your runs
  • The memory used for tables/indexes will be somewhat limited by NEWID() too
  • The plan shows a clustered index on ##temp that isn't explicit in the code (missed it in the table creation: the PRIMARY KEY creates it, oops)

You should show your real code rather than contrived code, so we can solve the actual problem instead of just explaining that "this is as expected" for the contrived code.
