I have three tables:
SmallTable
(id int, flag1 bit, flag2 bit)
JoinTable
(SmallTableID int, BigTableID int)
BigTable
(id int, text1 nvarchar(100), otherstuff...)
SmallTable
has, at most, a few dozen records. BigTable
has a few million, and is actually a view that UNIONS a table in this database with a table in another database on the same server.
Here's the join logic:
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
(s.flag1=1 OR b.text1 NOT LIKE 'pattern1%')
AND (s.flag2=1 OR b.text1 <> 'value1')
Average joined size is a few thousand results. Everything shown is indexed.
For most SmallTable
records, flag1
and flag2
are set to 1
, so there's really no need to even access the index on BigTable.text1, but SQL Server does anyway, leading to a costly Indexed Scan and Nested Loop.
Is there a better way to hint to SQL Server that, if flag1
and flag2
are both set to 1
, it shouldn't even bother looking at text1
?
Actually, if I can avoid the join to BigTable completely in these cases (JoinTable is managed, so this wouldn't create an issue), that would make this key quer开发者_运维问答y even faster.
SQL Boolean evaluation does NOT guarantee operator short-circuit. See On SQL Server boolean operator short-circuit for a clear example showing how assuming operator short circuit can lead to correctness issues and run-time errors.
On the other hand the very example in my link shows what does work for SQL Server: providing an access path that SQL can use. So, as with all SQL performance problems and questions, the real problem is not in the way the SQL text is expressed, but in the design of your storage. Ie. what indexes has the query optimizer at its disposal to satisfy your query?
I don't believe SQL Server will short-circuit conditions like that unfortunately.
SO I'd suggest doing 2 queries and UNION them together. First query with s.flag1=1 and s.flag2=1 WHERE conditions, and the second query doing the join on to BigTable with the s.flag1<>1 a s.flag2<>1 conditions.
This article on the matter is worth a read, and includes the bottom line:
...SQL Server does not do short-circuiting like it is done in other programming languages and there's nothing you can do to force it to.
Update:
This article is also an interesting read and contains some good links on this topic, including a technet chat with the development manager for the SQL Server Query Processor team which briefly mentions that the optimizer does allow short-circuit evaluation. The overall impression I get from various articles is "yes, the optimizer can spot the opportunity to short circuit but you shouldn't rely on it and you can't force it". Hence, I think the UNION approach may be your best bet. If it's not coming up with a plan that takes advantage of an opportunity to short cut, that would be down to the cost-based optimizer thinking it's found a reasonable plan that does not do it (this would be down to indexes, statistics etc).
It's not elegant, but it should work...
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
(s.flag1 = 1 and s.flag2 = 1) OR
(
(s.flag1=1 OR b.text1 NOT LIKE 'pattern1%')
AND (s.flag2=1 OR b.text1 <> 'value1')
)
SQL Server usually grabs the subquery hint (though it's free to discard it):
SELECT *
FROM (
SELECT * FROM SmallTable where flag1 <> 1 or flag2 <> 1
) s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
...
No idea if this will be faster without test data... but it sounds like it might
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
(s.flag1=1) AND (s.flag2=1)
UNION ALL
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
(s.flag1=0 AND b.text1 NOT LIKE 'pattern1%')
AND (s.flag2=0 AND b.text1 <> 'value1')
Please let me know what happens
Also, you might be able to speed this up by just returning just a unique id for this query and then using the result of that to get all the rest of the data.
edit
something like this?
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
(s.flag1=1) AND (s.flag2=1)
UNION ALL
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE EXISTS
(SELECT 1 from BigTable b
WHERE
(s.flag1=0 AND b.text1 NOT LIKE 'pattern1%')
AND (s.flag2=0 AND b.text1 <> 'value1')
)
Hope this works - careful of shortcut logic in case
statements around aggregates
but...
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE 1=case when (s.flag1 = 1 and s.flag2 = 1) then 1
when (
(s.flag1=1 OR b.text1 NOT LIKE 'pattern1%')
AND (s.flag2=1 OR b.text1 <> 'value1')
) then 1
else 0 end
精彩评论