SQL Query execution shortcut OR logic?

I have three tables:

SmallTable
   (id int, flag1 bit, flag2 bit)
JoinTable
   (SmallTableID int, BigTableID int)
BigTable
   (id int, text1 nvarchar(100), otherstuff...)

SmallTable has, at most, a few dozen records. BigTable has a few million, and is actually a view that UNIONS a table in this database with a table in another database on the same server.

Here's the join logic:

SELECT * FROM
    SmallTable s
    INNER JOIN JoinTable j ON j.SmallTableID = s.ID
    INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
    (s.flag1=1 OR b.text1 NOT LIKE 'pattern1%')
    AND (s.flag2=1 OR b.text1 <> 'value1')

Average joined size is a few thousand results. Everything shown is indexed.

For most SmallTable records, flag1 and flag2 are set to 1, so there's really no need to even access the index on BigTable.text1, but SQL Server does anyway, leading to a costly Index Scan and Nested Loop.

Is there a better way to hint to SQL Server that, if flag1 and flag2 are both set to 1, it shouldn't even bother looking at text1?

Actually, if I can avoid the join to BigTable completely in these cases (JoinTable is managed, so this wouldn't create an issue), that would make this key query even faster.

SQL Boolean evaluation does NOT guarantee operator short-circuit. See "On SQL Server boolean operator short-circuit" for a clear example showing how assuming operator short-circuit can lead to correctness issues and run-time errors.
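To make the risk concrete, here is a minimal T-SQL sketch. The temp table #Readings and its raw_value column are hypothetical, invented purely for illustration, and TRY_CAST assumes SQL Server 2012 or later. The first query relies on the left side of the AND being evaluated first, which SQL Server does not promise; the second does not depend on evaluation order.

```sql
-- Hypothetical sample data: raw_value mixes numeric and non-numeric strings.
CREATE TABLE #Readings (id int, raw_value varchar(20));
INSERT INTO #Readings VALUES (1, '42'), (2, 'n/a'), (3, '7');

-- Risky: nothing guarantees ISNUMERIC is checked before the CAST,
-- so depending on the plan this may fail with a conversion error.
SELECT id
FROM #Readings
WHERE ISNUMERIC(raw_value) = 1
  AND CAST(raw_value AS int) > 10;

-- Order-independent: TRY_CAST yields NULL instead of raising an error,
-- and NULL > 10 is not true, so the non-numeric row is simply filtered out.
-- Returns only id 1 for the sample rows above.
SELECT id
FROM #Readings
WHERE TRY_CAST(raw_value AS int) > 10;

DROP TABLE #Readings;
```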

On the other hand, the very example in my link shows what does work for SQL Server: providing an access path that SQL can use. So, as with all SQL performance problems and questions, the real problem is not in the way the SQL text is expressed, but in the design of your storage. I.e., what indexes does the query optimizer have at its disposal to satisfy your query?

I don't believe SQL Server will short-circuit conditions like that unfortunately.

So I'd suggest doing 2 queries and UNIONing them together: the first query with s.flag1=1 AND s.flag2=1 as its WHERE conditions, and the second query doing the join on to BigTable, restricted to s.flag1<>1 OR s.flag2<>1 (plus the original text1 conditions).
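A rough sketch of that split, written against the tables from the question. It rests on several assumptions that are not confirmed by the question: that the caller only needs the two ids from this query, that JoinTable only holds valid BigTable ids (so the first branch can skip BigTable entirely), and that flag1 and flag2 are never NULL.

```sql
-- Branch 1: both flags are set, so text1 never matters and BigTable is not touched.
-- Assumes every j.BigTableID really exists in BigTable.
SELECT s.id AS SmallTableID, j.BigTableID
FROM SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.id
WHERE s.flag1 = 1 AND s.flag2 = 1

UNION ALL

-- Branch 2: at least one flag is not set, so the text1 tests must run.
-- Assumes the flags are NOT NULL; with NULL flags this split is no longer
-- equivalent to the original single query.
SELECT s.id, j.BigTableID
FROM SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.id
INNER JOIN BigTable b ON b.id = j.BigTableID
WHERE (s.flag1 <> 1 OR s.flag2 <> 1)
  AND (s.flag1 = 1 OR b.text1 NOT LIKE 'pattern1%')
  AND (s.flag2 = 1 OR b.text1 <> 'value1');
```

The two branches are mutually exclusive, which is why UNION ALL is safe here and the duplicate-removal cost of a plain UNION can be avoided.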

This article on the matter is worth a read, and includes the bottom line:

...SQL Server does not do short-circuiting like it is done in other programming languages and there's nothing you can do to force it to.

Update:
This article is also an interesting read and contains some good links on this topic, including a TechNet chat with the development manager for the SQL Server Query Processor team, which briefly mentions that the optimizer does allow short-circuit evaluation. The overall impression I get from various articles is "yes, the optimizer can spot the opportunity to short-circuit, but you shouldn't rely on it and you can't force it". Hence, I think the UNION approach may be your best bet. If the optimizer is not coming up with a plan that takes advantage of an opportunity to short-circuit, that is because the cost-based optimizer thinks it has found a reasonable plan that does not do so (which comes down to indexes, statistics, etc.).
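Since the plan is the optimizer's choice, the practical way to judge any rewrite is to measure it. A minimal sketch using the standard SET STATISTICS session options follows; the query shown is the one from the question, and each candidate rewrite would be run the same way.

```sql
-- Report per-table logical reads and CPU/elapsed time alongside the results.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- The original query from the question. Run each rewrite the same way and
-- compare the logical reads reported for the tables underneath BigTable.
SELECT *
FROM SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.id
INNER JOIN BigTable b ON b.id = j.BigTableID
WHERE (s.flag1 = 1 OR b.text1 NOT LIKE 'pattern1%')
  AND (s.flag2 = 1 OR b.text1 <> 'value1');

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```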

It's not elegant, but it should work...

SELECT * FROM
    SmallTable s
    INNER JOIN JoinTable j ON j.SmallTableID = s.ID
    INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
    (s.flag1 = 1 and s.flag2 = 1) OR 
    (
       (s.flag1=1 OR b.text1 NOT LIKE 'pattern1%')
       AND (s.flag2=1 OR b.text1 <> 'value1')
    )

SQL Server usually grabs the subquery hint (though it's free to discard it):

SELECT      * 
FROM        (
            SELECT * FROM SmallTable where flag1 <> 1 or flag2 <> 1
            ) s
INNER JOIN  JoinTable j ON j.SmallTableID = s.ID
...

No idea if this will be faster without test data... but it sounds like it might

SELECT * FROM
    SmallTable s
    INNER JOIN JoinTable j ON j.SmallTableID = s.ID
    INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
    (s.flag1=1) AND (s.flag2=1)
 UNION ALL
 SELECT * FROM
    SmallTable s
    INNER JOIN JoinTable j ON j.SmallTableID = s.ID
    INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
    -- at least one flag is off; the original text1 tests still apply
    (s.flag1=0 OR s.flag2=0)
    AND (s.flag1=1 OR b.text1 NOT LIKE 'pattern1%')
    AND (s.flag2=1 OR b.text1 <> 'value1')

Please let me know what happens

Also, you might be able to speed this up by returning just a unique id from this query and then using that result to get the rest of the data.
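One way that idea could look, as a sketch only: the temp table name #MatchedKeys is hypothetical, and whether this beats a single query depends on how wide the BigTable rows are and on the plans SQL Server picks.

```sql
-- Step 1: collect only the matching keys (narrow rows) into a temp table.
SELECT s.id AS SmallTableID, b.id AS BigTableID
INTO #MatchedKeys
FROM SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.id
INNER JOIN BigTable b ON b.id = j.BigTableID
WHERE (s.flag1 = 1 OR b.text1 NOT LIKE 'pattern1%')
  AND (s.flag2 = 1 OR b.text1 <> 'value1');

-- Step 2: fetch the wide rows only for the keys that survived the filter.
SELECT s.*, b.*
FROM #MatchedKeys k
INNER JOIN SmallTable s ON s.id = k.SmallTableID
INNER JOIN BigTable b ON b.id = k.BigTableID;

DROP TABLE #MatchedKeys;
```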

edit

something like this?

SELECT * FROM
    SmallTable s
    INNER JOIN JoinTable j ON j.SmallTableID = s.ID
    INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
    (s.flag1=1) AND (s.flag2=1)
 UNION ALL
 SELECT * FROM
    SmallTable s
    INNER JOIN JoinTable j ON j.SmallTableID = s.ID
    INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE EXISTS
    (SELECT 1 FROM BigTable b2
     WHERE b2.ID = j.BigTableID   -- correlate the subquery to the joined row
       AND (s.flag1=0 OR s.flag2=0)
       AND (s.flag1=1 OR b2.text1 NOT LIKE 'pattern1%')
       AND (s.flag2=1 OR b2.text1 <> 'value1')
)

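Hope this works. A CASE expression generally evaluates its WHEN conditions in order, so it can stand in for a short circuit, but be careful: that ordering is not dependable in CASE expressions around aggregates.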

SELECT * FROM
    SmallTable s
    INNER JOIN JoinTable j ON j.SmallTableID = s.ID
    INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE 1=case when (s.flag1 = 1 and s.flag2 = 1) then 1
when (
       (s.flag1=1 OR b.text1 NOT LIKE 'pattern1%')
       AND (s.flag2=1 OR b.text1 <> 'value1')
    ) then 1
else 0 end