开发者

Questionable performance using IF EXISTS with inner existence checks

开发者 https://www.devze.com 2023-02-28 09:59 出处:网络
This is in a stored procedure..This if statement, then I do a little work.The @AsOfDate is a passed in variable of date datatype. The question I have is Why do I get better performance by removing the

This is in a stored procedure..This if statement, then I do a little work. The @AsOfDate is a passed in variable of date datatype. The question I have is Why do I get better performance by removing the inner-most exists, but ONLY when the entire statement is in an IF EXISTS?

The two tables:

  • dbo.TXXX_InventoryDetail -- 1.3 billion records..stats up to date
  • dbo.TXXX_InventoryFull -- 9.8 million records..stats up to date

Statement:

if exists (select 1
             from dbo.TXXX_InventoryDetail o
            where exists (select 1
                           from dbo.TXXX_InventoryFull i
                          where i.C001_AsOfDate= o.C001_AsOfDate
                            and i.C001_ProductID=o.C001_ProductID
                            and i.C001_StoreNumber=o.C001_StoreNumber
                            and i.C001_AsOfDate=@AsOfDate
                            and (i.C001_LastModelDate!=o.C001_LastModelDate
                                  or o.C001_InventoryQty!=o.C001_InventoryQty
                                  or i.C001_OnOrderQty!=o.C001_OnOrderQty
                                  or i.C001_TBOQty!=o.C001_TBOQty
                                  or i.C001_ModelQty!=o.C001_ModelQty
                                  or i.C001_TBOAdjustQty!=o.C001_TBOAdjustQty
                                  or i.C001_ReturnQtyPending!=o.C001_ReturnQtyPending
                                  or i.C001_ReturnQtyInProcess!=o.C001_ReturnQtyInProcess
                                  or i.C001_ReturnQtyDueOut!=o.C001_ReturnQtyDueOut))
              and o.C001_AsOfDate=@AsOfDate)

io output:

  • Table 'TXXX_InventoryFull'. Scan count 9240262, logical reads 29548864
  • Table 'T001_InventoryDetail'. Scan count 1, logical reads 17259

If I remove the second where exists and do a join:

 if exists (select 1
             from dbo.TXXX_InventoryDetail o,
                  dbo.TXXX_InventoryFull i
            where i.C001_AsOfDate= o.C001_AsOfDate
                            and i.C001_ProductID=o.C001_ProductID
          开发者_如何学Go                  and i.C001_StoreNumber=o.C001_StoreNumber
                            and i.C001_AsOfDate=@AsOfDate
                            and (i.C001_LastModelDate!=o.C001_LastModelDate
                                  or o.C001_InventoryQty!=o.C001_InventoryQty
                                  or i.C001_OnOrderQty!=o.C001_OnOrderQty
                                  or i.C001_TBOQty!=o.C001_TBOQty
                                  or i.C001_ModelQty!=o.C001_ModelQty
                                  or i.C001_TBOAdjustQty!=o.C001_TBOAdjustQty
                                  or i.C001_ReturnQtyPending!=o.C001_ReturnQtyPending
                                  or i.C001_ReturnQtyInProcess!=o.C001_ReturnQtyInProcess
                                  or i.C001_ReturnQtyDueOut!=o.C001_ReturnQtyDueOut)
              and o.C001_AsOfDate=@AsOfDate)

io output:

  • Table 'TXXX_InventoryDetail'. Scan count 0, logical reads 333952
  • Table 'TXXX_InventoryFull'. Scan count 1, logical reads 630

Now..the reason I think it is the if exists is that if I remove it and do a select count(*) like this:

select COUNT(*)
             from dbo.T001_InventoryDetail o
            where exists (select 1
                           from dbo.TXXX_InventoryFull i
                          where i.C001_AsOfDate= o.C001_AsOfDate
                            and i.C001_ProductID=o.C001_ProductID
                            and i.C001_StoreNumber=o.C001_StoreNumber
                            and i.C001_AsOfDate=@AsOfDate
                            and (i.C001_LastModelDate!=o.C001_LastModelDate
                                  or o.C001_InventoryQty!=o.C001_InventoryQty
                                  or i.C001_OnOrderQty!=o.C001_OnOrderQty
                                  or i.C001_TBOQty!=o.C001_TBOQty
                                  or i.C001_ModelQty!=o.C001_ModelQty
                                  or i.C001_TBOAdjustQty!=o.C001_TBOAdjustQty
                                  or i.C001_ReturnQtyPending!=o.C001_ReturnQtyPending
                                  or i.C001_ReturnQtyInProcess!=o.C001_ReturnQtyInProcess
                                  or i.C001_ReturnQtyDueOut!=o.C001_ReturnQtyDueOut))
              and o.C001_AsOfDate=@AsOfDate
  • TXXX_InventoryFull'. Scan count 41, logical reads 692
  • T001_InventoryDetail'. Scan count 65, logical reads 17477
  • Worktable'. Scan count 0, logical reads 0


It is generally said that one should avoid doing coordinated subqueries in the predicate, as these tend to force nested loop joins. When querying large datasets, especially where one's trying to discover a difference between the sets, it's important to allow the query optimizer to choose dynamically between hash, merge and nested loop algorhithms, which may not be possible if the query is structured using a coordinated subquery. Better to create these as derived tables in the FROM clause.


I have found similar issues using the EXISTS statement on a SQL 08 R2 server, where the exact same statement runs fine on SQL 08 and SQL 05.

I found that changing something like

WHILE EXISTS(SELECT * FROM X)

Would be super slow, but:

WHILE ISNULL((SELECT TOP 1 ID FROM X), 0) <> 0

Runs perfectly fast again.

To me, it seems like an R2 issue...


I would guess that the plan you get is quite different when you use the join. Perhaps the imbalance in the number of rows (very large outer table, smaller inner table) is giving the optimizer fits, but it can probably eliminate rows much easier with the join (you'll probably see additional loop operators with the worse query). Tough to really guess without seeing the plans or being able to reproduce, but you should always aim at eliminating the most rows as early in the plan as possible. Pulling back millions of rows through several operators / subqueries only to eliminate most of them later in the plan is almost certainly going to yield worse performance.

0

精彩评论

暂无评论...
验证码 换一张
取 消