Re: BUG #9898: WindowAgg's causing horrific plans - Mailing list pgsql-bugs

From bricklen
Subject Re: BUG #9898: WindowAgg's causing horrific plans
Date
Msg-id CAGrpgQ8UXzZxxToEbSxk45WTHa6C5GLyXE5rywsGVBzrj-CX9w@mail.gmail.com
Whole thread Raw
In response to BUG #9898: WindowAgg's causing horrific plans  (jeff@jefftrout.com)
Responses Re: BUG #9898: WindowAgg's causing horrific plans  (Jeff <jeff@jefftrout.com>)
List pgsql-bugs
On Mon, Apr 7, 2014 at 11:33 AM, <jeff@jefftrout.com> wrote:

> The following bug has been logged on the website:
>
> Bug reference:      9898
> Logged by:          Jeff Trout
> Email address:      jeff@jefftrout.com
> PostgreSQL version: 9.3.4
> Operating system:   OSX Mavericks
> Description:
>
> So I've encountered an issue present in PG 9.1, 9.2 and 9.3.
>
> Before I get going, I'll note that in
> http://www.postgresql.org/message-id/32583.1384469029@sss.pgh.pa.us
> Tom saying there is no optimization around windowagg's, which is the heart
> of
> my issue. I have some hopes perhaps there's some low hanging fruit...
>
> A query is involved with a WindowAgg can cause a
> very bad plan to be used instead of an optimial one.
>
> The original table I encountered this behavior on has about 9M rows.
> There are indexes on both datefiled and recieved (timestamps). I tried an
> index
> on datefiled, recieved but it did not do anything.
>
> First query:
> explain analyze
> select xx
> from mytable e
> where
>                  e.datefiled > current_day() - '30 days'::interval
>                 order by received desc
>                 limit 50;
>
> I get a perfectly sensible plan:
>
>  Limit  (cost=0.00..358.46 rows=50 width=12) (actual time=1.461..1.582
> rows=50 loops=1)
>    ->  Index Scan Backward using mytable_received_idx on mytable e
> (cost=0.00..438183.50 rows=61121 width=12) (actual time=1.459..1.572
> rows=50
> loops=1)
>          Filter: (datefiled > '2014-03-08 00:00:00'::timestamp without time
> zone)
>          Rows Removed by Filter: 27
>  Total runtime: 3.818 ms
> (5 rows)
>
> Now, lets throw a window agg in there -
> explain analyze
> select xx, count(*) over ()
> from mytable e
> where
>                  e.datefiled > current_day() - '30 days'::interval
>                 order by received desc
>                 limit 50;
>
> (the count is to return total results, to display paging info)
>
> and we get this plan:
>
>  Limit  (cost=0.00..359.08 rows=50 width=12) (actual
> time=79088.099..79088.131 rows=50 loops=1)
>    ->  WindowAgg  (cost=0.00..438947.51 rows=61121 width=12) (actual
> time=79088.095..79088.116 rows=50 loops=1)
>          ->  Index Scan Backward using mytable_received_idx on mytable e
> (cost=0.00..438183.50 rows=61121 width=12) (actual time=0.022..79060.177
> rows=55949 loops=1)
>                Filter: (datefiled > '2014-03-08 00:00:00'::timestamp
> without
> time zone)
>                Rows Removed by Filter: 9221863
>  Total runtime: 79104.066 ms
>

Does the following query change your plan noticeably?

explain analyze
select xx, count(*) over ()
from (
    select xx
    from mytable e
    where
    e.datefiled > current_day() - '30 days'::interval
    order by received desc
    limit 50) y;

pgsql-bugs by date:

Previous
From: Michael Paquier
Date:
Subject: Re: BUG #9895: Duplicate pkey
Next
From: Michael Paquier
Date:
Subject: Re: BUG #9894: SQLError() is not returns SQL_NO_DATA_FOUND