Re: Add proper planner support for ORDER BY / DISTINCT aggregates - Mailing list pgsql-hackers

From David Rowley
Subject Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Date
Msg-id CAApHDvogabtJzP8K6s75ruSXJAOfCvUJtUVWBS=2C-49kRqvCQ@mail.gmail.com
Whole thread Raw
In response to Re: Add proper planner support for ORDER BY / DISTINCT aggregates  (Ronan Dunklau <ronan.dunklau@aiven.io>)
Responses Re: Add proper planner support for ORDER BY / DISTINCT aggregates  (Ronan Dunklau <ronan.dunklau@aiven.io>)
List pgsql-hackers
On Mon, 19 Jul 2021 at 18:32, Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
> regression=# explain select sum(unique1 order by ten, two), sum(unique1 order
> by four), sum(unique1 order by two, four) from tenk2 group by ten;
>                                QUERY PLAN
> ------------------------------------------------------------------------
>  GroupAggregate  (cost=1109.39..1234.49 rows=10 width=28)
>    Group Key: ten
>    ->  Sort  (cost=1109.39..1134.39 rows=10000 width=16)
>          Sort Key: ten, ten, two
>          ->  Seq Scan on tenk2  (cost=0.00..445.00 rows=10000 width=16)
>
>
> We would ideally like to sort on ten, two, four to satisfy the first and last
> aggref at once. Stripping the common prefix (ten) would eliminate this problem.

Thanks for finding this.  I've made a few changes to make this case
work as you describe. Please see attached v6 patches.

Because I had to add additional code to take the GROUP BY pathkeys
into account when choosing the best ORDER BY agg pathkeys, the
function that does that became a little bigger.  To try to reduce the
complexity of it, I got rid of all the processing from the initial
loop and instead it now uses the logic from the 2nd loop to find the
best pathkeys.  The reason I'd not done that in the first place was
because I'd thought I could get away without building an additional
Bitmapset for simple cases, but that's probably fairly cheap compared
to building Pathkeys.   With the additional complexity for the GROUP
BY pathkeys, the extra code seemed not worth it.

The 0001 patch is the ORDER BY aggregate code.  0002 is to fix up some
broken regression tests in postgres_fdw that 0001 caused.  It appears
that 0001 uncovered a bug in the postgres_fdw code.  I've reported
that in [1]. If that turns out to be a bug then it'll need to be fixed
before this can progress.

David

[1] https://www.postgresql.org/message-id/CAApHDvr4OeC2DBVY--zVP83-K=bYrTD7F8SZDhN4g+pj2f2S-A@mail.gmail.com

Attachment

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Have I found an interval arithmetic bug?
Next
From: Michael Paquier
Date:
Subject: Re: Addition of authenticated ID to pg_stat_activity