We met a strange situation on SQL Server 2008 (SP1) - 10.0.2531.0 (X64) - Win2008 SP2 (X64).
Here is a one heavy query:
select t1.id, t2.id
from t1, t2
where
t1.id = t2.ext_id
and isnull(t1.vchCol1, 'Null') = isnull(t2.vchCol1, 'Null')
and isnull(t1.vchCol2, 'Null') = isnull(t2.vchCol2, 'Null')
.... and about 10 mor开发者_如何学编程e comparisons with Isnull
UPD: All columns in comparison (except IDs) are varchar
(~30...200)
These query on rather big Dev server run ~5 hours - this is slow, but what we can do?
And while we investigated possible ways of optimisation - we found, that changing "isnull" to "coalesce" in query above gives double performance gain - and query now run for ~2 hours
UPD: When we remove all ISNULL
checks and use just t1.vchCol1 = t2.vchCol1
the query finishes after 40mins.
Question is: Is this known behavior and we should avoid using IsNull everywhere?
I wonder if you'd see an improvement by splitting the cases out explicitly:
...
AND ((t1.vchCol1 = t2.vchCol1) OR (t1.vchCol1 IS NULL AND t2.vchCol1 IS NULL))
AND ((t1.vchCol2 = t2.vchCol2) OR (t1.vchCol2 IS NULL AND t2.vchCol2 IS NULL))
...
Most of the articles you'll find on this subject seem to contradict this. ISNULL
is (marginally) faster than COALESCE
.
Differences between ISNULL
and COALESCE
COALESCE
basically translates toCASE
expression andISNULL
is a built-in implemented in the database engine.
...
This will make a performance difference and queries withCOALESCE
often fare worse here.
ISNULL
vs. COALESCE
I ran these tests several times on a few different servers, and
ISNULL
appears to pretty consistently out-performCOALESCE
by an average of 10 or 12 percent. But that's the difference between 6 seconds and 5.3 seconds (the approximate average runtimes per test on my servers), over the course of a million exections. Hardly worth the functionality and standards compliance sacrifice, at least in the scenarios I use these functions for.
COALESCE
vs ISNULL
vs IS NULL OR
the best performer is
IS NULL OR
case, while the difference between all 3 of them is minor.
You may want to consider adding a computed column to each table that holds a checksum value. Then, create an index on the ID column and the checksum value, and finally use the checksum value in the join. Something like this:
Alter Table T1 Add CheckSumId As CHECKSUM(vchCol1, vchCol2, vchCol3)
Alter Table T2 Add CheckSumId As CHECKSUM(vchCol1, vchCol2, vchCol3)
Create NonClustered index idx_T1_Checksum On T1(id, CheckSumId)
Create NonClustered index idx_T2_Checksum On T2(ext_id, CheckSumId)
Then your query would become...
select t1.id, t2.id
from t1 Inner Join t2
On t1.id = t2.ext_id
And T1.CheckSumId = T2.CheckSumId
where isnull(t1.vchCol1, 'Null') = isnull(t2.vchCol1, 'Null')
and isnull(t1.vchCol2, 'Null') = isnull(t2.vchCol2, 'Null')
This will, of course, use extra index space, but it's simply 2 integers which should be very efficient. There will also be a performance penalty for each insert, update and delete because another index needs to be maintained. However, I suspect this will have a large impact on performance.
I realize this is a year later, but...
For this kind of column-by-column comparison, you might consider using EXCEPT. Also, EXCEPT treats NULL like another value instead of "It could be anything!", as I like to call it.
"When you compare rows for determining distinct values, two NULL values are considered equal." --from http://msdn.microsoft.com/en-us/library/ms188055.aspx
精彩评论