I posted a previous question here asking about what was better, JOIN
queries or queries using subselects. Link: Queries within queries: Is there a better way?
This is an extension to that question. Can somebody explain to me why I'm seeing what I'm seeing here?
Query (Subselects):
SELECT article_seq, title, synopsis, body, lastmodified_date, (SELECT type_id FROM types WHERE kbarticles.type = type_seq), status, scope, images, archived, author, owner, (SELECT owner_description FROM owners WHERE kbarticles.owner = owner_seq), (SELECT review_date FROM kbreview WHERE kbarticles.article_seq = article_seq) FROM kbarticles WHERE article_seq = $1
Explain Analyze (Subselects)
QUERY PLAN
Index Scan using article_seq_pkey on kbarticles (cost=0.00..32.24 rows=1 width=1241) (actual time=1.421..1.426 rows=1 loops=1)
Index Cond: (article_seq = 1511)
SubPlan
-> Seq Scan on kbreview (cost=0.00..14.54 rows=1 width=8) (actual time=0.243..1.158 rows=1 loops=1)
Filter: ($2 = article_seq)
-> Seq Scan on owners (cost=0.00..1.16 rows=1 width=24) (actual time=0.073..0.078 rows=1 loops=1)
Filter: ($1 = owner_seq)
-> Index Scan using types_type_seq_key on types (cost=0.00..8.27 rows=1 width=24) (actual time=0.044..0.050 rows=1 loops=1)
Index Cond: ($0 = type_seq)
Total runtime: 2.051 ms
Query (JOIN
s)
SELECT k.article_seq, k.title, k.synopsis, k.body, k.lastmodified_date, t.type_id, k.status, k.scope, k.images, k.archived, k.author, k.owner, o.owner_description, r.review_date FROM kbarticles k JOIN types t ON k.type = t.type_seq JOIN owners o ON k.owner = o.owner_seq JOIN kbreview r ON k.article_seq = r.article_seq WHERE k.article_seq = $1
Explain Analyze (JOIN
s)
QUERY PLAN
Nested Loop (cost=0.00..32.39 rows=1 width=1293) (actual time=0.532..1.467 rows=1 loops=1)
Join Filter: (k.owner = o.owner_seq)
-> Nested Loop (cost=0.00..31.10 rows=1 width=1269) (actual time=0.419..1.345 rows=1 loops=1)
-> Nested Loop (cost=0.00..22.82 rows=1 width=1249) (actual time=0.361..1.277 rows=1 loops=1)
-> Index Scan using article_seq_pkey on kbarticles k (cost=0.00..8.27 rows=1 width=1241) (actual time=0.065..0.071 rows=1 loops=1)
Index Cond: (article_seq = 1511)
-> Seq Scan on kbreview r (cost=0.00..14.54 rows=1 width=12) (actual time=0.267..1.175 rows=1 loops=1)
Filter: (r.article_seq = 1511)
开发者_如何学JAVA -> Index Scan using types_type_seq_key on types t (cost=0.00..8.27 rows=1 width=28) (actual time=0.048..0.055 rows=1 loops=1)
Index Cond: (t.type_seq = k.type)
-> Seq Scan on owners o (cost=0.00..1.13 rows=13 width=28) (actual time=0.022..0.038 rows=13 loops=1)
Total runtime: 2.256 ms
Based on the answers given (and accepted) in my previous question, JOIN
s should prove to have better results. However, in all my tests, I'm seeing JOIN
s to have worse results by a few milliseconds. It also seems like the JOIN
s are riddled with nested loops. All the tables I'm JOIN
ing are indexed.
Am I doing something that I should be doing differently? Is there something I'm missing?
These queries are logically different.
The first one:
SELECT article_seq, title, synopsis, body, lastmodified_date,
(
SELECT type_id
FROM types
WHERE kbarticles.type = type_seq
),
status, scope, images, archived, author, owner,
(
SELECT owner_description
FROM owners
WHERE kbarticles.owner = owner_seq
),
(
SELECT review_date
FROM kbreview
WHERE kbarticles.article_seq = article_seq
)
FROM kbarticles
WHERE article_seq = $1
The second one:
SELECT k.article_seq, k.title, k.synopsis, k.body, k.lastmodified_date, t.type_id, k.status,
k.scope, k.images, k.archived, k.author, k.owner, o.owner_description, r.review_date
FROM kbarticles k
JOIN types t
ON k.type = t.type_seq
JOIN owners o
ON k.owner = o.owner_seq
JOIN kbreview r
ON k.article_seq = r.article_seq
WHERE k.article_seq = $1
If there is more than one record in types
, owners
or kbreview
, the first query will fail while the second one will return duplicates from kbarticles
.
If there is no types
, owners
or kbreviews
for a kbarticle
, the first query will return a NULL
in appropriate field, while the second one will just omit that record.
If the *_seq
fields seem to be the PRIMARY KEY
fields, there will never be duplicates and the query will never fail; in the same way if kbarticles
is constrained with FOREIGN KEY
references to types
, owners
or kbreview
, there can be no missing rows.
However, JOIN
operators give the optimizer more place: it can make any table leading and use more advanced JOIN
techniques like HASH JOIN
or MERGE JOIN
which are not available if you are using subqueries.
Is this table column indexed? r.article_seq
-> Seq Scan on kbreview r (cost=0.00..14.54 rows=1 width=12) (actual time=0.267..1.175 rows=1 loops=1)
This is where most time is spend.
Given that both plans are doing the same table scans, just arranged in a different way, I'd say there's no significant difference between the two. A "nested loop" where the lower arm produces a single row is pretty much the same as a single-row subselect.
Joins are more general, since using scalar subselects won't extend to getting two columns from any of those auxiliary tables, for example.
精彩评论