On Tue, 19 Sept 2023 at 23:45, Richard Guo <guofenglinux@gmail.com> wrote:
> My first thought about the fix is that we artificially add resjunk
> target entries to parse->targetList for the ordered aggregates'
> arguments that are ORDER BY expressions, as attached. While this can
> fix the given query, it would cause Assert failure for the query in
> sql/triggers.sql.
> Any thoughts?
Unfortunately, we can't do that as it'll lead to target entries
existing in the GroupAggregate's target list that have been
aggregated.
postgres=# explain verbose SELECT a, COUNT(DISTINCT b) FROM rp GROUP BY a;
QUERY PLAN
--------------------------------------------------------------------------------------
Append (cost=88.17..201.39 rows=400 width=44)
-> GroupAggregate (cost=88.17..99.70 rows=200 width=44)
Output: rp.a, count(DISTINCT rp.b), rp.b
Your patch adds rp.b as an output column of the GroupAggregate.
Logically, that column cannot exist there as there is no correct
single value of rp.b after aggregation.
I think the fix needs to go into create_agg_path(). The problem is
that for AGG_SORTED we do:
if (aggstrategy == AGG_SORTED)
pathnode->path.pathkeys = subpath->pathkeys; /* preserves order */
which assumes that all of the columns before the aggregate will be
available after the aggregate. That likely used to work ok before
1349d2790 as the planner wouldn't have requested any Pathkeys for
columns that were not available below the Agg node.
We can no longer take the subpath pathkey's verbatim. We need to strip
off pathkeys for columns that are not in pathnode's targetlist.
I've attached a patch which adds a new function to pathkeys.c to strip
off any PathKeys in a list that don't have a corresponding item in the
given PathTarget and just return a prefix of the input pathkey list up
until the first expr that can't be found.
I'm concerned that this patch will be too much overhead when creating
paths when a PathKey's EquivalenceClass has a large number of members
from partitioned tables. I wondered if we should instead just check
if the subpath's pathkeys match root->group_pathkeys and if they do
set the AggPath's pathkeys to list_copy_head(subpath->pathkeys,
root->num_groupby_pathkeys), that'll be much cheaper, but it just
feels a bit too much like a special case.
David