开发者

SQL Server query performance - removing need for Hash Match (Inner Join)

开发者 https://www.devze.com 2023-03-31 13:14 出处:网络
I have the following query, which is doing very little and is an example of the kind of joins I am doing throughout the system.

I have the following query, which is doing very little and is an example of the kind of joins I am doing throughout the system.

select t1.PrimaryKeyId, t1.AdditionalColumnId
from TableOne t1
    join TableTwo t2 on t1.ForeignKeyId = t2.PrimaryKeyId
    join TableThree t3 on t1.PrimaryKeyId = t3.ForeignKeyId
    join TableFour t4 on t3.ForeignKeyId = t4.PrimaryKeyId
    join TableFive t5 on t4.ForeignKeyId = t5.PrimaryKeyId
where 
    t1.StatusId = 1
    and t开发者_如何学C5.TypeId = 68

There are indexes on all the join columns, however the performance is not great. Inspecting the query plan reveals a lot of Hash Match (Inner Joins) when really I want to see Nested Loop joins.

The number of records in each table is as follows:

select count(*) from TableOne

= 64393

select count(*) from TableTwo

= 87245

select count(*) from TableThree

= 97141

select count(*) from TableFour

= 116480

select count(*) from TableFive

= 62

What is the best way in which to improve the performance of this type of query?


First thoughts:

  1. Change to EXISTS (changes equi-join to semi-join)
  2. You need to have indexes on t1.StatusId, t5.TypeId and INCLUDE t1.AdditionalColumnID

I wouldn't worry about your join method yet...

Personally, I've never used a JOIN hint. They only work for the data, indexes and statistics you have at that point in time. As these change, your JOIN hint limits the optimiser

select t1.PrimaryKeyId, t1.AdditionalColumnId
from
    TableOne t1
where 
    t1.Status = 1
    AND EXISTS (SELECT *
        FROM
          TableThree t3
          join TableFour t4 on t3.ForeignKeyId = t4.PrimaryKeyId
          join TableFive t5 on t4.ForeignKeyId = t5.PrimaryKeyId
        WHERE
          t1.PrimaryKeyId = t3.ForeignKeyId
          AND
          t5.TypeId = 68)
    AND EXISTS (SELECT *
        FROM
          TableTwo t2
        WHERE
          t1.ForeignKeyId = t2.PrimaryKeyId)

Index for tableOne.. one of

  • (Status, ForeignKeyId) INCLUDE (AdditionalColumnId)
  • (ForeignKeyId, Status) INCLUDE (AdditionalColumnId)

Index for tableFive... probably (typeID, PrimaryKeyId)

Edit: updated JOINS and EXISTS to match question fixes


SQL Server is pretty good at optimizing queries, but it's also conservative: it optimizes queries for the worst case. A loop join typically results in an index lookup and a bookmark lookup for for every row. Because loop joins cause dramatic degradation for large sets, SQL Server is hesitant to use them unless it's sure about the number of rows.

You can use the forceseek query hint to force an index lookup:

inner join TableTwo t2 with (FORCESEEK) on t1.ForeignKeyId = t2.PrimaryKeyId

Alternatively, you can force a loop join with the loop keyword:

inner LOOP join TableTwo t2 on t1.ForeignKeyId = t2.PrimaryKeyId

Query hints limit SQL Server's freedom, so it can no longer adapt to changed circumstances. It's best practice to avoid query hints unless there is a business need that cannot be met without them.

0

精彩评论

暂无评论...
验证码 换一张
取 消