Compare these 2 queries. Is it faster to put the filter on the join criteria or in the WHERE
clause. I have always felt that it is faster on the join criteria because it reduces the result set at the soonest possible moment, but I don't know for sure.
I'm going to build some tests to see, but I also wanted to get opinions on which would is clearer to read as well.
Query 1
SELECT *
FROM TableA a
INNER JOIN TableXRef x
ON a.ID = x.TableAID
INNER JOIN TableB b
ON x.TableBID = b.ID
WHERE a.ID = 1 /* <-- Filter here? */
Query 2
SELECT *
FROM TableA a
INNER JOIN TableXRef x
ON a.ID = x.TableAID
AND a.ID = 1 /* <-- Or filter here? */
INNER JOIN TableB b
ON x.TableBID = b.ID
EDIT
I ran some tests and the results show that it is actually very close, but the WHERE
clause is actually slightly faster! =)
I absolutely agree that it makes more sense to apply the filter on the WHERE
clause, I was just curious as to the performance implications.
ELAPSED TIME WHERE CRITERIA: 143016 ms
ELAPSED TIME JOIN CRITERIA: 143256 msTEST
SET NOCOUNT ON;
DECLARE @num INT,
@iter INT
SELECT @num = 1000, -- Number of records in TableA and TableB, the cross table is populated with a CROSS JOIN from A to B
@iter = 1000 -- Number of select iterations to perform
DECLARE @a TABLE (
id INT
)
DECLARE @b TABLE (
id INT
)
DECLARE @x TABLE (
aid INT,
bid INT
)
DECLARE @num_curr INT
SELECT @num_curr = 1
WHILE (@num_curr <= @num)
BEGIN
INSERT @a (id) SELECT @num_curr
INSERT @b (id) SELECT @num_curr
SELECT @num_curr = @num_curr + 1
END
INSERT @x (aid, bid)
SELECT a.id,
b.id
FROM @a a
CROSS JOIN @b b
/*
TEST
*/
DECLARE @begin_where DATETIME,
@end_where DATETIME,
@count_where INT,
@begin_join DATETIME,
@end_join DATETIME,
@count_join INT,
@curr INT,
@aid INT
DECLARE @temp TABLE (
curr INT,
aid INT,
bid INT
)
DELETE FROM @temp
SELECT @curr = 0,
@aid = 50
SELECT @begin_where = CURRENT_TIMESTAMP
WHILE (@curr < @iter)
BEGIN
INSERT @temp (curr, aid, bid)
SELECT @curr,
aid,
bid
FROM @a a
INNER JOIN @x x
ON a.id = x.aid
INNER JOIN @b b
ON x.bid = b.id
WHERE a.id = @aid
SELECT @curr = @curr + 1
END
SELECT @end_where = CURRENT_TIMESTAMP
SELECT @count_where = COUNT(1) FROM @temp
DELETE FROM @temp
SELECT @curr = 0
SELECT @begin_join = CURRENT_TIMESTAMP
WHILE (@curr < @iter)
BEGIN
INSERT @temp (curr, aid, bid)
SELECT @curr,
aid,
bid
FROM @a a
INNER JOIN @x x
ON a.id = x.aid
AND a.id = @aid
INNER JOIN @b b
开发者_Go百科ON x.bid = b.id
SELECT @curr = @curr + 1
END
SELECT @end_join = CURRENT_TIMESTAMP
SELECT @count_join = COUNT(1) FROM @temp
DELETE FROM @temp
SELECT @count_where AS count_where,
@count_join AS count_join,
DATEDIFF(millisecond, @begin_where, @end_where) AS elapsed_where,
DATEDIFF(millisecond, @begin_join, @end_join) AS elapsed_join
Performance-wise, they are the same (and produce the same plans)
Logically, you should make the operation that still has sense if you replace INNER JOIN
with a LEFT JOIN
.
In your very case this will look like this:
SELECT *
FROM TableA a
LEFT JOIN
TableXRef x
ON x.TableAID = a.ID
AND a.ID = 1
LEFT JOIN
TableB b
ON x.TableBID = b.ID
or this:
SELECT *
FROM TableA a
LEFT JOIN
TableXRef x
ON x.TableAID = a.ID
LEFT JOIN
TableB b
ON b.id = x.TableBID
WHERE a.id = 1
The former query will not return any actual matches for a.id
other than 1
, so the latter syntax (with WHERE
) is logically more consistent.
For inner joins it doesn't matter where you put your criteria. The SQL compiler will transform both into an execution plan in which the filtering occurs below the join (ie. as if the filter expressions appears is in the join condition).
Outer joins are a different matter, since the place of the filter changes the semantics of the query.
As far as the two methods go.
- JOIN/ON is for joining tables
- WHERE is for filtering results
Whilst you can use them differently it always seems like a smell to me.
Deal with performance when it is a problem. Then you can look into such "optimisations".
In postgresql they are the same. We know this because if you do explain analyze
on each of the queries, the plan comes out to be the same. Take this example:
# explain analyze select e.* from event e join result r on e.id = r.event_id and r.team_2_score=24;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Hash Join (cost=27.09..38.22 rows=7 width=899) (actual time=0.045..0.047 rows=1 loops=1)
Hash Cond: (e.id = r.event_id)
-> Seq Scan on event e (cost=0.00..10.80 rows=80 width=899) (actual time=0.009..0.010 rows=2 loops=1)
-> Hash (cost=27.00..27.00 rows=7 width=8) (actual time=0.017..0.017 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on result r (cost=0.00..27.00 rows=7 width=8) (actual time=0.006..0.008 rows=1 loops=1)
Filter: (team_2_score = 24)
Rows Removed by Filter: 1
Planning time: 0.182 ms
Execution time: 0.101 ms
(10 rows)
# explain analyze select e.* from event e join result r on e.id = r.event_id where r.team_2_score=24;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Hash Join (cost=27.09..38.22 rows=7 width=899) (actual time=0.027..0.029 rows=1 loops=1)
Hash Cond: (e.id = r.event_id)
-> Seq Scan on event e (cost=0.00..10.80 rows=80 width=899) (actual time=0.010..0.011 rows=2 loops=1)
-> Hash (cost=27.00..27.00 rows=7 width=8) (actual time=0.010..0.010 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on result r (cost=0.00..27.00 rows=7 width=8) (actual time=0.006..0.007 rows=1 loops=1)
Filter: (team_2_score = 24)
Rows Removed by Filter: 1
Planning time: 0.140 ms
Execution time: 0.058 ms
(10 rows)
They both have the same min and max cost as well as the same query plan. Also, notice that even in the top query the team_score_2 gets applied as a 'Filter'.
With any query optimizer worh a cent.... they are identical.
Rule #0: Run some benchmarks and see! The only way to really tell which will be faster is to try it. These types of benchmarks are very easy to perform using the SQL profiler.
Also, examine the execution plan for the query written with a JOIN and with a WHERE clause to see what differences stand out.
Finally, as others have said, these two should be treated identically by any decent optimizer, including the one built into SQL Server.
It is really unlikely that the placement of this join will be the deciding factor for performance. I am not intimately familiar with the execution planning for tsql, but it's likely that they will be optimized automatically to similar plans.
Is it faster? Try it and see.
Which is easier to read? The first to me looks more "correct", as the moved condition is nothing really to do with the join.
I guess that the first, because it makes a more specific filter over the data. But you should see the execution plan, as with any optimization, because it can be very different deppending on size of data, server hardware, etc.
精彩评论