开发者

optimize Query in PostgreSQL

开发者 https://www.devze.com 2023-02-25 08:00 出处:网络
SELECT count(*) FROM contacts_lists JOIN plain_cont开发者_Go百科acts ON contacts_lists.contact_id = plain_contacts.contact_id
SELECT count(*) 
FROM contacts_lists 
     JOIN plain_cont开发者_Go百科acts 
          ON contacts_lists.contact_id = plain_contacts.contact_id 
     JOIN contacts 
          ON contacts.id = plain_contacts.contact_id 
WHERE plain_contacts.has_email 
      AND NOT contacts.email_bad 
      AND NOT contacts.email_unsub 
      AND contacts_lists.list_id =67339

how can i optimize this query.. could you please explain...


Reformatting your query plan for clarity:

QUERY PLAN Aggregate (cost=126377.96..126377.97 rows=1 width=0)
  -> Hash Join (cost=6014.51..126225.38 rows=61033 width=0)
     Hash Cond: (contacts_lists.contact_id = plain_contacts.contact_id)
    -> Hash Join (cost=3067.30..121828.63 rows=61033 width=8)
       Hash Cond: (contacts_lists.contact_id = contacts.id)
      -> Index Scan using index_contacts_lists_on_list_id_and_contact_id
         on contacts_lists (cost=0.00..116909.97 rows=61033 width=4)
         Index Cond: (list_id = 66996)
         -> Hash (cost=1721.41..1721.41 rows=84551 width=4)
         -> Seq Scan on contacts (cost=0.00..1721.41 rows=84551 width=4)
            Filter: ((NOT email_bad) AND (NOT email_unsub))
            -> Hash (cost=2474.97..2474.97 rows=37779 width=4)
            -> Seq Scan on plain_contacts (cost=0.00..2474.97 rows=37779 width=4)
               Filter: has_email

Two partial indexes might eliminate seq scans depending on your data distribution:

-- if many contacts have bad emails or are unsubscribed:
CREATE INDEX contacts_valid_email_idx ON contacts (id)
WHERE (NOT email_bad AND NOT email_unsub);

-- if many contacts have no email:
CREATE INDEX plain_contacts_valid_email_idx ON plain_contacts (id)
WHERE (has_email);

You might be missing an index on a foreign key:

CREATE INDEX plain_contacts_contact_id_idx ON plain_contacts (contact_id);

Last but not least if you've never analyzed your data, you need to run:

VACUUM ANALYZE;

If it's still slow once all that is done, there isn't much you can do short of merging your plain_contacts and your contacts tables: getting the above query plan in spite of the above indexes means most/all of your subscribers are subscribed to that particular list -- in which case the above query plan is the fastest you'll get.


This is already a very simple query that the database will run in the most efficient way providing that statistics are up to date

So in terms of the query itself there's not much to do.

In terms of database administration you can add indexes - there should be indexes in the database for all the join conditions and also for the most selective part of the where clause (list_id, contact_id as FK in plain_contacts and contacts_lists). This is the most significant opportunity to improve performance of this query (orders of magnitude). Still as SpliFF notes, you probably already have those indexes, so check.

Also, postgres has good explain command that you should learn and use. It will help with optimizing queries.


Since you only want to inlude rows that has some flags set in the joined tables, I would move that statements into the join clause:

SELECT count(*) 
FROM contacts_lists 
     JOIN plain_contacts 
          ON contacts_lists.contact_id = plain_contacts.contact_id 
          AND NOT plain_contacts.has_email
     JOIN contacts 
          ON contacts.id = plain_contacts.contact_id 
          AND NOT contacts.email_unsub 
          AND NOT contacts.email_bad 
WHERE contacts_lists.list_id =67339

I'm not sure if this would make a great impact on performance, but worth a try. You should probably have indexes on the joined tables as well for optimal performance, like this:

plain_contacts: contact_id, has_email
contacts: id, email_unsub, email_bad


Have you run ANALYZE on the database recently? Do the row counts in the EXPLAIN plan look like they make sense? (Looks like you ran only EXPLAIN. EXPLAIN ANALYZE gives both estimated and actual timings.)


You can use SELECT count(1) ... but other than that I'd say it looks fine. You could always cache some parts of the query using views or put indexes on contact_id and list_id if you're really struggling (I assume you have one on id already).

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号