Re: Add proper planner support for ORDER BY / DISTINCT aggregates - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Date
Msg-id CAMbWs49xNn5uLmQGryTGF-LFPpFvouNSa1Y2BqgU_qq_Om=hSQ@mail.gmail.com
Whole thread Raw
In response to Re: Add proper planner support for ORDER BY / DISTINCT aggregates  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Add proper planner support for ORDER BY / DISTINCT aggregates  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

On Tue, Jan 17, 2023 at 11:39 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, 17 Jan 2023 at 13:16, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> I took a look at this, and I agree that the best solution is probably
> to have make_pathkeys_for_groupagg() ignore Aggrefs that contain
> volatile functions.

Thanks for giving that some additional thought.  I've just pushed a
fix which adjusts things that way.
 
This makes a lot of sense.  I agree that we shouldn't do pre-sorting for
volatile sort expressions, especially when there are multiple aggregates
with the same volatile sort expression.

Not related to this specific issue, but I find sorting by volatile
expression is confusing in different scenarios.  Consider the two
queries given by David

Query 1:
select string_agg(random()::text, ',' order by random()) from generate_series(1,3);

Query 2:
select random()::text from generate_series(1,3) order by random();

Considering the targetlist as Aggref->args or Query->targetList, in both
queries we would add an additional TargetEntry (as resjunk column) for
the ORDER BY item 'random()', because it's not present in the existing
targetlist.  Note that the existing TargetEntry for 'random()::text' is
a CoerceViaIO expression which is an explicit cast, so we cannot strip
it and match it to the ORDER BY item.  Thus we would have two random()
FuncExprs in the final targetlist, for both queries.

In query 1 we call random() twice for each tuple, one for the original
TargetEntry 'random()::text', and one for the TargetEntry of the ORDER
BY item 'random()', and do the sorting according to the second call
results.  Thus we would notice the final output is unsorted because it's
from the first random() call.

However, in query 2 we have the ORDER BY item 'random()' in the
scan/join node's targetlist.  And then for the two random() FuncExprs in
the final targetlist, set_plan_references would adjust both of them to
refer to the outputs of the scan/join node.  Thus random() is actually
called only once for each tuple and we would find the final output is
sorted.

It seems we fail to keep consistent about the behavior of sorting by
volatile expression in the two scenarios.

BTW, I wonder if we should have checked CoercionForm before
fix_upper_expr_mutator steps into CoerceViaIO->arg to adjust the expr
there.  It seems parser checks it and only strips implicit coercions
when matching TargetEntry expr to ORDER BY item.

Thanks
Richard

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: constify arguments of copy_file() and copydir()
Next
From: Masahiko Sawada
Date:
Subject: Re: Perform streaming logical transactions by background workers and parallel apply