BUG #9898: WindowAgg's causing horrific plans - Mailing list pgsql-bugs

From jeff@jefftrout.com
Subject BUG #9898: WindowAgg's causing horrific plans
Date
Msg-id 20140407183331.367.55401@wrigleys.postgresql.org
Whole thread Raw
Responses Re: BUG #9898: WindowAgg's causing horrific plans  (bricklen <bricklen@gmail.com>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      9898
Logged by:          Jeff Trout
Email address:      jeff@jefftrout.com
PostgreSQL version: 9.3.4
Operating system:   OSX Mavericks
Description:

So I've encountered an issue present in PG 9.1, 9.2 and 9.3.

Before I get going, I'll note that in
http://www.postgresql.org/message-id/32583.1384469029@sss.pgh.pa.us
Tom saying there is no optimization around windowagg's, which is the heart
of
my issue. I have some hopes perhaps there's some low hanging fruit...

A query is involved with a WindowAgg can cause a
very bad plan to be used instead of an optimial one.

The original table I encountered this behavior on has about 9M rows.
There are indexes on both datefiled and recieved (timestamps). I tried an
index
on datefiled, recieved but it did not do anything.

First query:
explain analyze
select xx
from mytable e
where
         e.datefiled > current_day() - '30 days'::interval
        order by received desc
        limit 50;

I get a perfectly sensible plan:

 Limit  (cost=0.00..358.46 rows=50 width=12) (actual time=1.461..1.582
rows=50 loops=1)
   ->  Index Scan Backward using mytable_received_idx on mytable e
(cost=0.00..438183.50 rows=61121 width=12) (actual time=1.459..1.572 rows=50
loops=1)
         Filter: (datefiled > '2014-03-08 00:00:00'::timestamp without time
zone)
         Rows Removed by Filter: 27
 Total runtime: 3.818 ms
(5 rows)

Now, lets throw a window agg in there -
explain analyze
select xx, count(*) over ()
from mytable e
where
         e.datefiled > current_day() - '30 days'::interval
        order by received desc
        limit 50;

(the count is to return total results, to display paging info)

and we get this plan:

 Limit  (cost=0.00..359.08 rows=50 width=12) (actual
time=79088.099..79088.131 rows=50 loops=1)
   ->  WindowAgg  (cost=0.00..438947.51 rows=61121 width=12) (actual
time=79088.095..79088.116 rows=50 loops=1)
         ->  Index Scan Backward using mytable_received_idx on mytable e
(cost=0.00..438183.50 rows=61121 width=12) (actual time=0.022..79060.177
rows=55949 loops=1)
               Filter: (datefiled > '2014-03-08 00:00:00'::timestamp without
time zone)
               Rows Removed by Filter: 9221863
 Total runtime: 79104.066 ms

which is pretty insane, would have made more sense to do an index scan on
datefiled.
Here's where it gets a touch more interesting, lets get rid of that order by
received
in the query.  We now get:

 Limit  (cost=0.00..49.25 rows=50 width=4) (actual time=93.780..93.802
rows=50 loops=1)
   ->  WindowAgg  (cost=0.00..60209.86 rows=61121 width=4) (actual
time=93.778..93.792 rows=50 loops=1)
         ->  Index Scan using mytable_datefiled_idx on mytable e
(cost=0.00..59445.85 rows=61121 width=4) (actual time=1.283..77.869
rows=55949 loops=1)
               Index Cond: (datefiled > '2014-03-08 00:00:00'::timestamp
without time zone)
 Total runtime: 95.090 ms
(5 rows)

so in context of the orig query, that backwards scan on received made sense,
but
not in the window agg version.

one last thing, lets drop that index on recieved (this isn't really doable
in
production as many other things query against that field):

 Limit  (cost=60858.62..60858.74 rows=50 width=12) (actual
time=75.584..75.594 rows=50 loops=1)
   ->  Sort  (cost=60858.62..61011.42 rows=61121 width=12) (actual
time=75.581..75.585 rows=50 loops=1)
         Sort Key: received
         Sort Method: top-N heapsort  Memory: 28kB
         ->  WindowAgg  (cost=714.10..58828.22 rows=61121 width=12) (actual
time=46.410..60.409 rows=55949 loops=1)
               ->  Bitmap Heap Scan on mytable e  (cost=714.10..58064.21
rows=61121 width=12) (actual time=6.460..29.735 rows=55949 loops=1)
                     Recheck Cond: (datefiled > '2014-03-08
00:00:00'::timestamp without time zone)
                     ->  Bitmap Index Scan on mytable_datefiled_idx
(cost=0.00..698.82 rows=61121 width=0) (actual time=6.145..6.145 rows=55949
loops=1)
                           Index Cond: (datefiled > '2014-03-08
00:00:00'::timestamp without time zone)
 Total runtime: 75.674 ms
(10 rows)

which is perfectly sane.

Not sure if there is anything the planner can do (I'm now going to go play
the lets trick the planner with subqueries game)

pgsql-bugs by date:

Previous
From: David Johnston
Date:
Subject: Re: BUG #9896: Bug in FULL OUTER JOIN
Next
From: klimm@lohn24.de
Date:
Subject: BUG #9897: Wrong Julian Date through ODBC