Thread: BUG #9898: WindowAgg's causing horrific plans

BUG #9898: WindowAgg's causing horrific plans

From
jeff@jefftrout.com
Date:
The following bug has been logged on the website:

Bug reference:      9898
Logged by:          Jeff Trout
Email address:      jeff@jefftrout.com
PostgreSQL version: 9.3.4
Operating system:   OSX Mavericks
Description:

So I've encountered an issue present in PG 9.1, 9.2 and 9.3.

Before I get going, I'll note that in
http://www.postgresql.org/message-id/32583.1384469029@sss.pgh.pa.us
Tom saying there is no optimization around windowagg's, which is the heart
of
my issue. I have some hopes perhaps there's some low hanging fruit...

A query is involved with a WindowAgg can cause a
very bad plan to be used instead of an optimial one.

The original table I encountered this behavior on has about 9M rows.
There are indexes on both datefiled and recieved (timestamps). I tried an
index
on datefiled, recieved but it did not do anything.

First query:
explain analyze
select xx
from mytable e
where
         e.datefiled > current_day() - '30 days'::interval
        order by received desc
        limit 50;

I get a perfectly sensible plan:

 Limit  (cost=0.00..358.46 rows=50 width=12) (actual time=1.461..1.582
rows=50 loops=1)
   ->  Index Scan Backward using mytable_received_idx on mytable e
(cost=0.00..438183.50 rows=61121 width=12) (actual time=1.459..1.572 rows=50
loops=1)
         Filter: (datefiled > '2014-03-08 00:00:00'::timestamp without time
zone)
         Rows Removed by Filter: 27
 Total runtime: 3.818 ms
(5 rows)

Now, lets throw a window agg in there -
explain analyze
select xx, count(*) over ()
from mytable e
where
         e.datefiled > current_day() - '30 days'::interval
        order by received desc
        limit 50;

(the count is to return total results, to display paging info)

and we get this plan:

 Limit  (cost=0.00..359.08 rows=50 width=12) (actual
time=79088.099..79088.131 rows=50 loops=1)
   ->  WindowAgg  (cost=0.00..438947.51 rows=61121 width=12) (actual
time=79088.095..79088.116 rows=50 loops=1)
         ->  Index Scan Backward using mytable_received_idx on mytable e
(cost=0.00..438183.50 rows=61121 width=12) (actual time=0.022..79060.177
rows=55949 loops=1)
               Filter: (datefiled > '2014-03-08 00:00:00'::timestamp without
time zone)
               Rows Removed by Filter: 9221863
 Total runtime: 79104.066 ms

which is pretty insane, would have made more sense to do an index scan on
datefiled.
Here's where it gets a touch more interesting, lets get rid of that order by
received
in the query.  We now get:

 Limit  (cost=0.00..49.25 rows=50 width=4) (actual time=93.780..93.802
rows=50 loops=1)
   ->  WindowAgg  (cost=0.00..60209.86 rows=61121 width=4) (actual
time=93.778..93.792 rows=50 loops=1)
         ->  Index Scan using mytable_datefiled_idx on mytable e
(cost=0.00..59445.85 rows=61121 width=4) (actual time=1.283..77.869
rows=55949 loops=1)
               Index Cond: (datefiled > '2014-03-08 00:00:00'::timestamp
without time zone)
 Total runtime: 95.090 ms
(5 rows)

so in context of the orig query, that backwards scan on received made sense,
but
not in the window agg version.

one last thing, lets drop that index on recieved (this isn't really doable
in
production as many other things query against that field):

 Limit  (cost=60858.62..60858.74 rows=50 width=12) (actual
time=75.584..75.594 rows=50 loops=1)
   ->  Sort  (cost=60858.62..61011.42 rows=61121 width=12) (actual
time=75.581..75.585 rows=50 loops=1)
         Sort Key: received
         Sort Method: top-N heapsort  Memory: 28kB
         ->  WindowAgg  (cost=714.10..58828.22 rows=61121 width=12) (actual
time=46.410..60.409 rows=55949 loops=1)
               ->  Bitmap Heap Scan on mytable e  (cost=714.10..58064.21
rows=61121 width=12) (actual time=6.460..29.735 rows=55949 loops=1)
                     Recheck Cond: (datefiled > '2014-03-08
00:00:00'::timestamp without time zone)
                     ->  Bitmap Index Scan on mytable_datefiled_idx
(cost=0.00..698.82 rows=61121 width=0) (actual time=6.145..6.145 rows=55949
loops=1)
                           Index Cond: (datefiled > '2014-03-08
00:00:00'::timestamp without time zone)
 Total runtime: 75.674 ms
(10 rows)

which is perfectly sane.

Not sure if there is anything the planner can do (I'm now going to go play
the lets trick the planner with subqueries game)

Re: BUG #9898: WindowAgg's causing horrific plans

From
bricklen
Date:
On Mon, Apr 7, 2014 at 11:33 AM, <jeff@jefftrout.com> wrote:

> The following bug has been logged on the website:
>
> Bug reference:      9898
> Logged by:          Jeff Trout
> Email address:      jeff@jefftrout.com
> PostgreSQL version: 9.3.4
> Operating system:   OSX Mavericks
> Description:
>
> So I've encountered an issue present in PG 9.1, 9.2 and 9.3.
>
> Before I get going, I'll note that in
> http://www.postgresql.org/message-id/32583.1384469029@sss.pgh.pa.us
> Tom saying there is no optimization around windowagg's, which is the heart
> of
> my issue. I have some hopes perhaps there's some low hanging fruit...
>
> A query is involved with a WindowAgg can cause a
> very bad plan to be used instead of an optimial one.
>
> The original table I encountered this behavior on has about 9M rows.
> There are indexes on both datefiled and recieved (timestamps). I tried an
> index
> on datefiled, recieved but it did not do anything.
>
> First query:
> explain analyze
> select xx
> from mytable e
> where
>                  e.datefiled > current_day() - '30 days'::interval
>                 order by received desc
>                 limit 50;
>
> I get a perfectly sensible plan:
>
>  Limit  (cost=0.00..358.46 rows=50 width=12) (actual time=1.461..1.582
> rows=50 loops=1)
>    ->  Index Scan Backward using mytable_received_idx on mytable e
> (cost=0.00..438183.50 rows=61121 width=12) (actual time=1.459..1.572
> rows=50
> loops=1)
>          Filter: (datefiled > '2014-03-08 00:00:00'::timestamp without time
> zone)
>          Rows Removed by Filter: 27
>  Total runtime: 3.818 ms
> (5 rows)
>
> Now, lets throw a window agg in there -
> explain analyze
> select xx, count(*) over ()
> from mytable e
> where
>                  e.datefiled > current_day() - '30 days'::interval
>                 order by received desc
>                 limit 50;
>
> (the count is to return total results, to display paging info)
>
> and we get this plan:
>
>  Limit  (cost=0.00..359.08 rows=50 width=12) (actual
> time=79088.099..79088.131 rows=50 loops=1)
>    ->  WindowAgg  (cost=0.00..438947.51 rows=61121 width=12) (actual
> time=79088.095..79088.116 rows=50 loops=1)
>          ->  Index Scan Backward using mytable_received_idx on mytable e
> (cost=0.00..438183.50 rows=61121 width=12) (actual time=0.022..79060.177
> rows=55949 loops=1)
>                Filter: (datefiled > '2014-03-08 00:00:00'::timestamp
> without
> time zone)
>                Rows Removed by Filter: 9221863
>  Total runtime: 79104.066 ms
>

Does the following query change your plan noticeably?

explain analyze
select xx, count(*) over ()
from (
    select xx
    from mytable e
    where
    e.datefiled > current_day() - '30 days'::interval
    order by received desc
    limit 50) y;

Re: BUG #9898: WindowAgg's causing horrific plans

From
Jeff
Date:
On Apr 7, 2014, at 9:00 PM, bricklen <bricklen@gmail.com> wrote:

>=20
> Does the following query change your plan noticeably?
>=20
> explain analyze
> select xx, count(*) over ()
> from (
>     select xx
>     from mytable e
>     where
>     e.datefiled > current_day() - '30 days'::interval
>     order by received desc
>     limit 50) y;=20
>=20

yes, but it does not give the correct answer - since I want the total =
results for paging purposes.=20
the current workaround (which is similar) I=92ve been toying with is:

explain analyze
select ss.*, count(*) over ()
from (
        select iacc, received
        from mytable e
        where
             e.datefiled > current_day() - '30 =
days'::interval
             offset 0 -- optimization barrier,
) ss
order by received desc
limit 50;

which gives me the plan:=20
 Limit  (cost=3D59677.01..59677.13 rows=3D50 width=3D12) (actual =
time=3D58.586..58.593 rows=3D50 loops=3D1)
   ->  Sort  (cost=3D59677.01..59824.72 rows=3D59084 width=3D12) (actual =
time=3D58.585..58.587 rows=3D50 loops=3D1)
         Sort Key: e.received
         Sort Method: top-N heapsort  Memory: 28kB
         ->  WindowAgg  (cost=3D689.51..57714.28 rows=3D59084 width=3D12) =
(actual time=3D36.687..46.940 rows=3D55949 loops=3D1)
               ->  Limit  (cost=3D689.51..56384.89 rows=3D59084 =
width=3D12) (actual time=3D6.571..26.143 rows=3D55949 loops=3D1)
                     ->  Bitmap Heap Scan on edgar e  =
(cost=3D689.51..56384.89 rows=3D59084 width=3D12) (actual =
time=3D6.570..20.912 rows=3D55949 loops=3D1)
                           Recheck Cond: (datefiled > '2014-03-09 =
00:00:00'::timestamp without time zone)
                           ->  Bitmap Index Scan on edgar_datefiled_idx  =
(cost=3D0.00..674.74 rows=3D59084 width=3D0) (actual time=3D6.217..6.217 =
rows=3D55949 loops=3D1)
                                 Index Cond: (datefiled > '2014-03-09 =
00:00:00'::timestamp without time zone)
 Total runtime: 58.740 ms

which is quite a bit better.  trick now is plugging it into reality.

--
Jeff Trout <jeff@jefftrout.com>