Re: Add proper planner support for ORDER BY / DISTINCT aggregates - Mailing list pgsql-hackers

From David Rowley
Subject Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Date
Msg-id CAApHDvqCc=02UGp4oC5_y7qZ6uPZkMuhAhwfjNKD2hgnKRNUUA@mail.gmail.com
Whole thread Raw
In response to Re: Add proper planner support for ORDER BY / DISTINCT aggregates  (Ronan Dunklau <ronan.dunklau@aiven.io>)
Responses Re: Add proper planner support for ORDER BY / DISTINCT aggregates  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On Fri, 16 Jul 2021 at 01:02, Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
> The approach of building a pathkey for the first order by we find, then
> appending to it as needed seems sensible but I'm a bit worried about users
> starting to rely on this as an optimization. Even if we don't document it,
> people may start to change the order of their target lists to "force" a
> specific sort on the lower nodes. How confident are we that we won't change this
> or that we will be willing to break it ?

That's a good question.  I mainly did it that way because Windowing
functions work similarly based on the position of items in the
targetlist.  The situation there is slightly more complex as it
depends on the SortGroupClause->tleSortGroupRef.

> Generating all possible pathkeys and costing the resulting plans would be too
> expensive, but maybe a more "stable" (and limited) approach would be fine, like
> generating the pathkeys only if every ordered aggref shares the same prefix. I
> don't think there would be any ambiguity here.

I think that's a bad idea as it would leave a lot on the table. I
don't see any reason to make it that restrictive. Remember that before
this that every Aggref with a sort clause must perform their own sort.
So it's not like we'll ever increase the number of sorts here as a
result.

What we maybe could consider instead would be to pick the first Aggref
then look for the most sorted derivative of that then tally up the
number of Aggrefs that can be sorted using those pathkeys, then repeat
that process for the remaining Aggrefs that didn't have the same
prefix then use the pathkeys for the set with the most Aggrefs.  We
could still tiebreak on the targetlist position so at least it's not
random which ones we pick. Now that we have a list of Aggrefs that are
deduplicated in the planner thanks to 0a2bc5d61e it should be fairly
easy to do that.

David



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: row filtering for logical replication
Next
From: Amit Kapila
Date:
Subject: Re: Why ALTER SUBSCRIPTION ... SET (slot_name='none') requires subscription disabled?