开发者

Need help understanding the SQL explanation of a JOIN query versus a query with subselects

开发者 https://www.devze.com 2023-02-04 01:26 出处:网络
I posted a previous question here asking about what was better, JOIN queries or queries using subselects. Link: Queries within queries: Is there a better way?

I posted a previous question here asking about what was better, JOIN queries or queries using subselects. Link: Queries within queries: Is there a better way?

This is an extension to that question. Can somebody explain to me why I'm seeing what I'm seeing here?

Query (Subselects):

SELECT article_seq, title, synopsis, body, lastmodified_date, (SELECT type_id FROM types WHERE kbarticles.type = type_seq), status, scope, images, archived, author, owner, (SELECT owner_description FROM owners WHERE kbarticles.owner = owner_seq),  (SELECT review_date FROM kbreview WHERE kbarticles.article_seq = article_seq) FROM kbarticles WHERE article_seq = $1

Explain Analyze (Subselects)

QUERY PLAN

Index Scan using article_seq_pkey on kbarticles  (cost=0.00..32.24 rows=1 width=1241) (actual time=1.421..1.426 rows=1 loops=1)

  Index Cond: (article_seq = 1511)

  SubPlan

    ->  Seq Scan on kbreview  (cost=0.00..14.54 rows=1 width=8) (actual time=0.243..1.158 rows=1 loops=1)

          Filter: ($2 = article_seq)

    ->  Seq Scan on owners  (cost=0.00..1.16 rows=1 width=24) (actual time=0.073..0.078 rows=1 loops=1)

          Filter: ($1 = owner_seq)

    ->  Index Scan using types_type_seq_key on types  (cost=0.00..8.27 rows=1 width=24) (actual time=0.044..0.050 rows=1 loops=1)

          Index Cond: ($0 = type_seq)

Total runtime: 2.051 ms

Query (JOINs)

SELECT k.article_seq, k.title, k.synopsis, k.body, k.lastmodified_date, t.type_id, k.status, k.scope, k.images, k.archived, k.author, k.owner, o.owner_description, r.review_date FROM kbarticles k JOIN types t ON k.type = t.type_seq JOIN owners o ON k.owner = o.owner_seq JOIN kbreview r ON k.article_seq = r.article_seq WHERE k.article_seq = $1

Explain Analyze (JOINs)

QUERY PLAN

Nested Loop  (cost=0.00..32.39 rows=1 width=1293) (actual time=0.532..1.467 rows=1 loops=1)

  Join Filter: (k.owner = o.owner_seq)

  ->  Nested Loop  (cost=0.00..31.10 rows=1 width=1269) (actual time=0.419..1.345 rows=1 loops=1)

        ->  Nested Loop  (cost=0.00..22.82 rows=1 width=1249) (actual time=0.361..1.277 rows=1 loops=1)

              ->  Index Scan using article_seq_pkey on kbarticles k  (cost=0.00..8.27 rows=1 width=1241) (actual time=0.065..0.071 rows=1 loops=1)

                    Index Cond: (article_seq = 1511)

              ->  Seq Scan on kbreview r  (cost=0.00..14.54 rows=1 width=12) (actual time=0.267..1.175 rows=1 loops=1)

                    Filter: (r.article_seq = 1511)

       开发者_如何学JAVA ->  Index Scan using types_type_seq_key on types t  (cost=0.00..8.27 rows=1 width=28) (actual time=0.048..0.055 rows=1 loops=1)

              Index Cond: (t.type_seq = k.type)

  ->  Seq Scan on owners o  (cost=0.00..1.13 rows=13 width=28) (actual time=0.022..0.038 rows=13 loops=1)

Total runtime: 2.256 ms

Based on the answers given (and accepted) in my previous question, JOINs should prove to have better results. However, in all my tests, I'm seeing JOINs to have worse results by a few milliseconds. It also seems like the JOINs are riddled with nested loops. All the tables I'm JOINing are indexed.

Am I doing something that I should be doing differently? Is there something I'm missing?


These queries are logically different.

The first one:

SELECT  article_seq, title, synopsis, body, lastmodified_date,
        (
        SELECT  type_id
        FROM    types
        WHERE   kbarticles.type = type_seq
        ),
        status, scope, images, archived, author, owner,
        (
        SELECT  owner_description
        FROM    owners
        WHERE   kbarticles.owner = owner_seq
        ),
        (
        SELECT  review_date
        FROM    kbreview
        WHERE   kbarticles.article_seq = article_seq
        )
FROM    kbarticles
WHERE   article_seq = $1

The second one:

SELECT  k.article_seq, k.title, k.synopsis, k.body, k.lastmodified_date, t.type_id, k.status,
        k.scope, k.images, k.archived, k.author, k.owner, o.owner_description, r.review_date
FROM    kbarticles k
JOIN    types t
ON      k.type = t.type_seq
JOIN    owners o
ON      k.owner = o.owner_seq
JOIN    kbreview r
ON      k.article_seq = r.article_seq
WHERE   k.article_seq = $1

If there is more than one record in types, owners or kbreview, the first query will fail while the second one will return duplicates from kbarticles.

If there is no types, owners or kbreviews for a kbarticle, the first query will return a NULL in appropriate field, while the second one will just omit that record.

If the *_seq fields seem to be the PRIMARY KEY fields, there will never be duplicates and the query will never fail; in the same way if kbarticles is constrained with FOREIGN KEY references to types, owners or kbreview, there can be no missing rows.

However, JOIN operators give the optimizer more place: it can make any table leading and use more advanced JOIN techniques like HASH JOIN or MERGE JOIN which are not available if you are using subqueries.


Is this table column indexed? r.article_seq

-> Seq Scan on kbreview r (cost=0.00..14.54 rows=1 width=12) (actual time=0.267..1.175 rows=1 loops=1)

This is where most time is spend.


Given that both plans are doing the same table scans, just arranged in a different way, I'd say there's no significant difference between the two. A "nested loop" where the lower arm produces a single row is pretty much the same as a single-row subselect.

Joins are more general, since using scalar subselects won't extend to getting two columns from any of those auxiliary tables, for example.

0

精彩评论

暂无评论...
验证码 换一张
取 消