I'm using Django ORM and postgresql.
ORM creates a query:
SELECT
(date_part('month', stat_date)) AS "stat_date",
"direct_keywordstat"."banner_id",
SUM("direct_keywordstat"."total") AS "total",
SUM("direct_keywordstat"."clicks") AS "clicks",
SUM("direct_keywordstat"."shows") AS "shows"
FROM "direct_keywordstat"
LEFT OUTER JOIN "direct_banner" ON ("direct_keywordstat"."banner_id" = "direct_banner"."banner_ptr_id")
LEFT OUTER JOIN "platforms_banner" ON ("direct_banner"."banner_ptr_id" = "platforms_banner"."id")
WHERE (
"direct_keywordstat".stat_date BETWEEN E'2009-08-25' AND E'2010-08-25' AND
"direct_keywordstat"."keyword_id" IN (
SELECT U0."id"
FROM "direct_keyword" U0
INNER JOIN "direct_banner" U1 ON (U0."banner_id" = U1."banner_ptr_id")
INNER JOIN "platforms_banner" U2 ON (U1."banner_ptr_id" = U2."id")
INNER JOIN "platforms_campaign" U3 ON (U2."campaign_id" = U3."id")
INNER JOIN "direct_campaign" U4 ON (U3."id" = U4."campaign_ptr_id")
WHERE (
U0."deleted" = E'False' AND
U0."low_ctr" = E'False' AND
U4."status_active" = E'True' AND
U0."banner_id" IN (
SELECT U0."banner_ptr_id"
FROM "direct_banner" U0
INNER JOIN "platforms_banner" U1
ON (U0."banner_ptr_id" = U1."id")
WHERE (
U0."status_show" = E'True' AND
U1."campaign_id" = E'174' )
)
)
)
)
GROUP BY
"direct_keywordstat"."banner_id",
(date_part('month', stat_date)),
"platforms_banner"."title", date_trunc('month', stat_date)
ORDER BY "platforms_banner"."title" ASC, "stat_date" ASC
Problem is, direct_keywordstat contains 3mln+ records, so the query executes in ~15 seconds.
I've tried creating indexes like
CREATE INDEX direct_keywordstat_stat_date on direct_keywordstat using btree(stat_date);
But EXPLAIN ANALYZE show that index is not used.
Table schema:
\d direct_keywordstat
Table "public.direct_keywordstat"
Column | Type | Modifiers
-------------+------------------------+-----------------------------------------------------------------
id | integer | not null default nextval('direct_keywordstat_id_seq'::regclass)
keyword_id | integer | not null
banner_id | integer | not null
campaign_id | integer | not null
stat_date | date | not null
region_id | integer | not null
place_type | character varying(30) |
place_name | character varying(100) |
clicks | integer | not nul开发者_如何学Pythonl default 0
shows | integer | not null default 0
total | numeric(19,6) | not null
How can i create useful index?
Or, maybe, there's a chance to optimize this query other way?
Thing is, if WHERE looks like
"direct_keywordstat".clicks BETWEEN 10 AND 3000000
query executes in 0.8 seconds.
Do you have indexes on these columns:
direct_banner.banner_ptr_id
direct_keywordstat.banner_id
direct_keywordstat.stat_date
Both columns in direct_keywordstat could be combined in a single index, just check
This is also a problem:
Sort Method: external merge Disk: 20600kB
Check your settings for work_mem, you need at least 20MB for this query.
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=727967.61..847401.71 rows=2514402 width=67) (actual time=22010.522..23408.262 rows=5 loops=1)
-> Sort (cost=727967.61..734253.62 rows=2514402 width=67) (actual time=21742.365..23134.748 rows=198978 loops=1)
Sort Key: platforms_banner.title, (date_part('month'::text, (direct_keywordstat.stat_date)::timestamp without time zone)), direct_keywordstat.banner_id, (date_trunc('month'::text, (direct_keywordstat.stat_date)::timestamp with time zone))
Sort Method: external merge Disk: 20600kB
-> Hash Join (cost=1034.02..164165.25 rows=2514402 width=67) (actual time=5159.538..14942.441 rows=198978 loops=1)
Hash Cond: (direct_keywordstat.keyword_id = u0.id)
-> Hash Left Join (cost=365.78..117471.99 rows=2514402 width=71) (actual time=26.672..13101.294 rows=2523151 loops=1)
Hash Cond: (direct_keywordstat.banner_id = direct_banner.banner_ptr_id)
-> Seq Scan on direct_keywordstat (cost=0.00..76247.17 rows=2514402 width=25) (actual time=8.892..9386.010 rows=2523151 loops=1)
Filter: ((stat_date >= '2009-08-25'::date) AND (stat_date <= '2010-08-25'::date))
-> Hash (cost=324.86..324.86 rows=3274 width=50) (actual time=17.754..17.754 rows=2851 loops=1)
-> Hash Left Join (cost=209.15..324.86 rows=3274 width=50) (actual time=10.845..15.385 rows=2851 loops=1)
Hash Cond: (direct_banner.banner_ptr_id = platforms_banner.id)
-> Seq Scan on direct_banner (cost=0.00..66.74 rows=3274 width=4) (actual time=0.004..1.196 rows=2851 loops=1)
-> Hash (cost=173.51..173.51 rows=2851 width=50) (actual time=10.683..10.683 rows=2851 loops=1)
-> Seq Scan on platforms_banner (cost=0.00..173.51 rows=2851 width=50) (actual time=0.004..3.576 rows=2851 loops=1)
-> Hash (cost=641.44..641.44 rows=2144 width=4) (actual time=30.420..30.420 rows=106 loops=1)
-> HashAggregate (cost=620.00..641.44 rows=2144 width=4) (actual time=30.162..30.288 rows=106 loops=1)
-> Hash Join (cost=407.17..614.64 rows=2144 width=4) (actual time=16.152..30.031 rows=106 loops=1)
Hash Cond: (u0.banner_id = u1.banner_ptr_id)
-> Nested Loop (cost=76.80..238.50 rows=6488 width=16) (actual time=8.670..22.343 rows=106 loops=1)
-> HashAggregate (cost=76.80..76.87 rows=7 width=8) (actual time=0.045..0.047 rows=1 loops=1)
-> Nested Loop (cost=0.00..76.79 rows=7 width=8) (actual time=0.033..0.036 rows=1 loops=1)
-> Index Scan using platforms_banner_campaign_id on platforms_banner u1 (cost=0.00..22.82 rows=7 width=4) (actual time=0.019..0.020 rows=1 loops=1)
Index Cond: (campaign_id = 174)
-> Index Scan using direct_banner_pkey on direct_banner u0 (cost=0.00..7.70 rows=1 width=4) (actual time=0.009..0.011 rows=1 loops=1)
Index Cond: (u0.banner_ptr_id = u1.id)
Filter: u0.status_show
-> Index Scan using direct_keyword_banner_id on direct_keyword u0 (cost=0.00..23.03 rows=5 width=8) (actual time=8.620..22.127 rows=106 loops=1)
Index Cond: (u0.banner_id = u0.banner_ptr_id)
Filter: ((NOT u0.deleted) AND (NOT u0.low_ctr))
-> Hash (cost=316.84..316.84 rows=1082 width=8) (actual time=7.458..7.458 rows=403 loops=1)
-> Hash Join (cost=227.00..316.84 rows=1082 width=8) (actual time=3.584..7.149 rows=403 loops=1)
Hash Cond: (u1.banner_ptr_id = u2.id)
-> Seq Scan on direct_banner u1 (cost=0.00..66.74 rows=3274 width=4) (actual time=0.002..1.570 rows=2851 loops=1)
-> Hash (cost=213.48..213.48 rows=1082 width=4) (actual time=3.521..3.521 rows=403 loops=1)
-> Hash Join (cost=23.88..213.48 rows=1082 width=4) (actual time=0.715..3.268 rows=403 loops=1)
Hash Cond: (u2.campaign_id = u3.id)
-> Seq Scan on platforms_banner u2 (cost=0.00..173.51 rows=2851 width=8) (actual time=0.001..1.272 rows=2851 loops=1)
-> Hash (cost=22.95..22.95 rows=74 width=8) (actual time=0.345..0.345 rows=37 loops=1)
-> Hash Join (cost=11.84..22.95 rows=74 width=8) (actual time=0.133..0.320 rows=37 loops=1)
Hash Cond: (u3.id = u4.campaign_ptr_id)
-> Seq Scan on platforms_campaign u3 (cost=0.00..8.91 rows=391 width=4) (actual time=0.006..0.098 rows=196 loops=1)
-> Hash (cost=10.91..10.91 rows=74 width=4) (actual time=0.117..0.117 rows=37 loops=1)
-> Seq Scan on direct_campaign u4 (cost=0.00..10.91 rows=74 width=4) (actual time=0.004..0.097 rows=37 loops=1)
Filter: status_active
Total runtime: 23436.715 ms (47 rows)
Here it is
精彩评论