Thread: BUG #9898: WindowAgg's causing horrific plans
The following bug has been logged on the website: Bug reference: 9898 Logged by: Jeff Trout Email address: jeff@jefftrout.com PostgreSQL version: 9.3.4 Operating system: OSX Mavericks Description: So I've encountered an issue present in PG 9.1, 9.2 and 9.3. Before I get going, I'll note that in http://www.postgresql.org/message-id/32583.1384469029@sss.pgh.pa.us Tom saying there is no optimization around windowagg's, which is the heart of my issue. I have some hopes perhaps there's some low hanging fruit... A query is involved with a WindowAgg can cause a very bad plan to be used instead of an optimial one. The original table I encountered this behavior on has about 9M rows. There are indexes on both datefiled and recieved (timestamps). I tried an index on datefiled, recieved but it did not do anything. First query: explain analyze select xx from mytable e where e.datefiled > current_day() - '30 days'::interval order by received desc limit 50; I get a perfectly sensible plan: Limit (cost=0.00..358.46 rows=50 width=12) (actual time=1.461..1.582 rows=50 loops=1) -> Index Scan Backward using mytable_received_idx on mytable e (cost=0.00..438183.50 rows=61121 width=12) (actual time=1.459..1.572 rows=50 loops=1) Filter: (datefiled > '2014-03-08 00:00:00'::timestamp without time zone) Rows Removed by Filter: 27 Total runtime: 3.818 ms (5 rows) Now, lets throw a window agg in there - explain analyze select xx, count(*) over () from mytable e where e.datefiled > current_day() - '30 days'::interval order by received desc limit 50; (the count is to return total results, to display paging info) and we get this plan: Limit (cost=0.00..359.08 rows=50 width=12) (actual time=79088.099..79088.131 rows=50 loops=1) -> WindowAgg (cost=0.00..438947.51 rows=61121 width=12) (actual time=79088.095..79088.116 rows=50 loops=1) -> Index Scan Backward using mytable_received_idx on mytable e (cost=0.00..438183.50 rows=61121 width=12) (actual time=0.022..79060.177 rows=55949 loops=1) Filter: (datefiled > '2014-03-08 00:00:00'::timestamp without time zone) Rows Removed by Filter: 9221863 Total runtime: 79104.066 ms which is pretty insane, would have made more sense to do an index scan on datefiled. Here's where it gets a touch more interesting, lets get rid of that order by received in the query. We now get: Limit (cost=0.00..49.25 rows=50 width=4) (actual time=93.780..93.802 rows=50 loops=1) -> WindowAgg (cost=0.00..60209.86 rows=61121 width=4) (actual time=93.778..93.792 rows=50 loops=1) -> Index Scan using mytable_datefiled_idx on mytable e (cost=0.00..59445.85 rows=61121 width=4) (actual time=1.283..77.869 rows=55949 loops=1) Index Cond: (datefiled > '2014-03-08 00:00:00'::timestamp without time zone) Total runtime: 95.090 ms (5 rows) so in context of the orig query, that backwards scan on received made sense, but not in the window agg version. one last thing, lets drop that index on recieved (this isn't really doable in production as many other things query against that field): Limit (cost=60858.62..60858.74 rows=50 width=12) (actual time=75.584..75.594 rows=50 loops=1) -> Sort (cost=60858.62..61011.42 rows=61121 width=12) (actual time=75.581..75.585 rows=50 loops=1) Sort Key: received Sort Method: top-N heapsort Memory: 28kB -> WindowAgg (cost=714.10..58828.22 rows=61121 width=12) (actual time=46.410..60.409 rows=55949 loops=1) -> Bitmap Heap Scan on mytable e (cost=714.10..58064.21 rows=61121 width=12) (actual time=6.460..29.735 rows=55949 loops=1) Recheck Cond: (datefiled > '2014-03-08 00:00:00'::timestamp without time zone) -> Bitmap Index Scan on mytable_datefiled_idx (cost=0.00..698.82 rows=61121 width=0) (actual time=6.145..6.145 rows=55949 loops=1) Index Cond: (datefiled > '2014-03-08 00:00:00'::timestamp without time zone) Total runtime: 75.674 ms (10 rows) which is perfectly sane. Not sure if there is anything the planner can do (I'm now going to go play the lets trick the planner with subqueries game)
On Mon, Apr 7, 2014 at 11:33 AM, <jeff@jefftrout.com> wrote: > The following bug has been logged on the website: > > Bug reference: 9898 > Logged by: Jeff Trout > Email address: jeff@jefftrout.com > PostgreSQL version: 9.3.4 > Operating system: OSX Mavericks > Description: > > So I've encountered an issue present in PG 9.1, 9.2 and 9.3. > > Before I get going, I'll note that in > http://www.postgresql.org/message-id/32583.1384469029@sss.pgh.pa.us > Tom saying there is no optimization around windowagg's, which is the heart > of > my issue. I have some hopes perhaps there's some low hanging fruit... > > A query is involved with a WindowAgg can cause a > very bad plan to be used instead of an optimial one. > > The original table I encountered this behavior on has about 9M rows. > There are indexes on both datefiled and recieved (timestamps). I tried an > index > on datefiled, recieved but it did not do anything. > > First query: > explain analyze > select xx > from mytable e > where > e.datefiled > current_day() - '30 days'::interval > order by received desc > limit 50; > > I get a perfectly sensible plan: > > Limit (cost=0.00..358.46 rows=50 width=12) (actual time=1.461..1.582 > rows=50 loops=1) > -> Index Scan Backward using mytable_received_idx on mytable e > (cost=0.00..438183.50 rows=61121 width=12) (actual time=1.459..1.572 > rows=50 > loops=1) > Filter: (datefiled > '2014-03-08 00:00:00'::timestamp without time > zone) > Rows Removed by Filter: 27 > Total runtime: 3.818 ms > (5 rows) > > Now, lets throw a window agg in there - > explain analyze > select xx, count(*) over () > from mytable e > where > e.datefiled > current_day() - '30 days'::interval > order by received desc > limit 50; > > (the count is to return total results, to display paging info) > > and we get this plan: > > Limit (cost=0.00..359.08 rows=50 width=12) (actual > time=79088.099..79088.131 rows=50 loops=1) > -> WindowAgg (cost=0.00..438947.51 rows=61121 width=12) (actual > time=79088.095..79088.116 rows=50 loops=1) > -> Index Scan Backward using mytable_received_idx on mytable e > (cost=0.00..438183.50 rows=61121 width=12) (actual time=0.022..79060.177 > rows=55949 loops=1) > Filter: (datefiled > '2014-03-08 00:00:00'::timestamp > without > time zone) > Rows Removed by Filter: 9221863 > Total runtime: 79104.066 ms > Does the following query change your plan noticeably? explain analyze select xx, count(*) over () from ( select xx from mytable e where e.datefiled > current_day() - '30 days'::interval order by received desc limit 50) y;
On Apr 7, 2014, at 9:00 PM, bricklen <bricklen@gmail.com> wrote: >=20 > Does the following query change your plan noticeably? >=20 > explain analyze > select xx, count(*) over () > from ( > select xx > from mytable e > where > e.datefiled > current_day() - '30 days'::interval > order by received desc > limit 50) y;=20 >=20 yes, but it does not give the correct answer - since I want the total = results for paging purposes.=20 the current workaround (which is similar) I=92ve been toying with is: explain analyze select ss.*, count(*) over () from ( select iacc, received from mytable e where e.datefiled > current_day() - '30 = days'::interval offset 0 -- optimization barrier, ) ss order by received desc limit 50; which gives me the plan:=20 Limit (cost=3D59677.01..59677.13 rows=3D50 width=3D12) (actual = time=3D58.586..58.593 rows=3D50 loops=3D1) -> Sort (cost=3D59677.01..59824.72 rows=3D59084 width=3D12) (actual = time=3D58.585..58.587 rows=3D50 loops=3D1) Sort Key: e.received Sort Method: top-N heapsort Memory: 28kB -> WindowAgg (cost=3D689.51..57714.28 rows=3D59084 width=3D12) = (actual time=3D36.687..46.940 rows=3D55949 loops=3D1) -> Limit (cost=3D689.51..56384.89 rows=3D59084 = width=3D12) (actual time=3D6.571..26.143 rows=3D55949 loops=3D1) -> Bitmap Heap Scan on edgar e = (cost=3D689.51..56384.89 rows=3D59084 width=3D12) (actual = time=3D6.570..20.912 rows=3D55949 loops=3D1) Recheck Cond: (datefiled > '2014-03-09 = 00:00:00'::timestamp without time zone) -> Bitmap Index Scan on edgar_datefiled_idx = (cost=3D0.00..674.74 rows=3D59084 width=3D0) (actual time=3D6.217..6.217 = rows=3D55949 loops=3D1) Index Cond: (datefiled > '2014-03-09 = 00:00:00'::timestamp without time zone) Total runtime: 58.740 ms which is quite a bit better. trick now is plugging it into reality. -- Jeff Trout <jeff@jefftrout.com>