开发者

Encapsulating Postgres query in view makes it extremely slow

开发者 https://www.devze.com 2023-01-12 03:08 出处:网络
I have a query that runs in about 5 seconds on Postgres 8.4. It selects data from a view joined to some other tables, but also uses the lag() window function, ie.

I have a query that runs in about 5 seconds on Postgres 8.4. It selects data from a view joined to some other tables, but also uses the lag() window function, ie.

SELECT *, lag(column1) OVER (PARTITION BY key1 ORDER BY ...), lag(...)
FROM view1 v
JOIN othertables USING (...)
WHERE ...

For convenience I created a new view that simply has

SELECT *, lag(column1) OVER (PARTITION BY key1 ORDER BY ...), lag(...)
FROM view1 v

and then SELECT from that, using all the other JOINs and filters as before. To my surprise this query doesn't complete in 12 minutes (I stopped it at that point). Obviously Postgres has cho开发者_如何学Gosen a different execution plan. How do I get it to not do that, ie. use the same plan as in the original query? I would have thought that a view shouldn't change the execution plan, but apparently it does.

Edit: what's more, I found that even if I copy the contents of the first view into the second it still doesn't return.

Edit 2: OK, I've simplified the query sufficiently to post the plans.

Using the view (this doesn't return in any reasonable time):

Subquery Scan sp  (cost=5415201.23..5892463.97 rows=88382 width=370)
  Filter: (((sp.ticker)::text ~~ 'Some Ticker'::text) AND (sp.price_date >= '2010-06-01'::date))
  ->  WindowAgg  (cost=5415201.23..5680347.20 rows=53029193 width=129)
        ->  Sort  (cost=5415201.23..5441715.83 rows=53029193 width=129)
              Sort Key: sp.stock_id, sp.price_date
              ->  Hash Join  (cost=847.87..1465139.61 rows=53029193 width=129)
                    Hash Cond: (sp.stock_id = s.stock_id)
                    ->  Seq Scan on stock_prices sp  (cost=0.00..1079829.20 rows=53029401 width=115)
                    ->  Hash  (cost=744.56..744.56 rows=29519 width=18)
                          ->  Seq Scan on stocks s  (cost=0.00..744.56 rows=29519 width=18)

Taking the window function out of the view and putting into the query itself (this returns instantly):

WindowAgg  (cost=34.91..34.95 rows=7 width=129)
  ->  Sort  (cost=34.91..34.92 rows=7 width=129)
        Sort Key: sp.stock_id, sp.price_date
        ->  Nested Loop  (cost=0.00..34.89 rows=7 width=129)
              ->  Index Scan using stocks_ticker_unique on stocks s  (cost=0.00..4.06 rows=1 width=18)
                    Index Cond: ((ticker)::text = 'Some Ticker'::text)
                    Filter: ((ticker)::text ~~ 'Some Ticker'::text)
              ->  Index Scan using stock_prices_id_date_idx on stock_prices sp  (cost=0.00..30.79 rows=14 width=115)
                    Index Cond: ((sp.stock_id = s.stock_id) AND (sp.price_date >= '2010-06-01'::date))

So it seems that in the slow case it's trying to apply the window function to all the data first and then filter it, which is probably the issue. I don't know why it's doing that, though.


Your difference between the two plans comes from joining with an aggregate. This prevents the use a nested loop plan. When you use the aggregate in your view, you put yourself in that unfavorable scenario.

This, for instance, will almost always lead to a merge or hash join plan on the two tables followed by a top-n sort:

select foo.*
from foo
join (select bar.* from bar group by bar.field) as bar on foo.field = bar.field
where ...
order by bar.field
limit 10;


Perhaps you could consider using a Common Table Expression (CTE) instead of a view. I can help making the query clearer in a similar way to using a view, but doesn't seem to affect the execution plan in the same way.

I had a similar problem in this question and using a CTE instead of a view made the execution plan much more efficient.

0

精彩评论

暂无评论...
验证码 换一张
取 消