Re: Add proper planner support for ORDER BY / DISTINCT aggregates - Mailing list pgsql-hackers

From David Rowley
Subject Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Date
Msg-id CAApHDvrQf_YhWvMjij2q-PCjN0PMLudHmm1aObwd8BEjzwu3nA@mail.gmail.com
Whole thread Raw
In response to Re: Add proper planner support for ORDER BY / DISTINCT aggregates  (Ronan Dunklau <ronan.dunklau@aiven.io>)
List pgsql-hackers
On Mon, 19 Jul 2021 at 18:32, Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
> It means the logic for appending the order by pathkeys to the existing group
> by pathkeys would ideally need to remove the redundant group by keys from the
> order by keys, considering this example:
>
> regression=# explain select sum(unique1 order by ten, two), sum(unique1 order
> by four), sum(unique1 order by two, four) from tenk2 group by ten;
>                                QUERY PLAN
> ------------------------------------------------------------------------
>  GroupAggregate  (cost=1109.39..1234.49 rows=10 width=28)
>    Group Key: ten
>    ->  Sort  (cost=1109.39..1134.39 rows=10000 width=16)
>          Sort Key: ten, ten, two
>          ->  Seq Scan on tenk2  (cost=0.00..445.00 rows=10000 width=16)
>
>
> We would ideally like to sort on ten, two, four to satisfy the first and last
> aggref at once. Stripping the common prefix (ten) would eliminate this problem.

hmm, yeah. That's not great.  This comes from the way I'm doing
list_concat on the pathkeys from the GROUP BY with the ones from the
ordered aggregates. If it were possible to use
make_pathkeys_for_sortclauses() to make these in one go, that would
fix the problem since pathkey_is_redundant() would skip the 2nd "ten".
Unfortunately, it's not possible to pass the combined list of
SortGroupClauses to make_pathkeys_for_sortclauses since they're not
from the same targetlist.  Aggrefs have their own targetlist and the
SortGroupClauses for the Aggref reference that tlist.

I think to do this we'd need something like pathkeys_append() in
pathkeys.c which had a loop and appended the pathkey only if
pathkey_is_redundant returns false.

> Also, existing regression tests cover the first problem (order by a grouping
> key) but I feel like they should be extended with a case similar as the above
> to check which pathkeys are used in the "multiple ordered aggregates + group
> by" cases.

It does seem like a bit of a weird case to go to a lot of effort to
make work, but it would be nice if it did work without having to
contort the code too much.

David



pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: row filtering for logical replication
Next
From: Denis Hirn
Date:
Subject: Re: [PATCH] Allow multiple recursive self-references