Re: pg16: XX000: could not find pathkey item to sort - Mailing list pgsql-hackers

From David Rowley
Subject Re: pg16: XX000: could not find pathkey item to sort
Date
Msg-id CAApHDvpJJigQRW29TppTOPYp+Aui0mtd3MpfRxyKv=N-tB62jQ@mail.gmail.com
Whole thread Raw
In response to Re: pg16: XX000: could not find pathkey item to sort  (Richard Guo <guofenglinux@gmail.com>)
Responses Re: pg16: XX000: could not find pathkey item to sort
List pgsql-hackers
On Tue, 19 Sept 2023 at 23:45, Richard Guo <guofenglinux@gmail.com> wrote:
> My first thought about the fix is that we artificially add resjunk
> target entries to parse->targetList for the ordered aggregates'
> arguments that are ORDER BY expressions, as attached.  While this can
> fix the given query, it would cause Assert failure for the query in
> sql/triggers.sql.

> Any thoughts?

Unfortunately, we can't do that as it'll lead to target entries
existing in the GroupAggregate's target list that have been
aggregated.

postgres=# explain verbose SELECT a, COUNT(DISTINCT b) FROM rp GROUP BY a;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Append  (cost=88.17..201.39 rows=400 width=44)
   ->  GroupAggregate  (cost=88.17..99.70 rows=200 width=44)
         Output: rp.a, count(DISTINCT rp.b), rp.b

Your patch adds rp.b as an output column of the GroupAggregate.
Logically, that column cannot exist there as there is no correct
single value of rp.b after aggregation.

I think the fix needs to go into create_agg_path().  The problem is
that for AGG_SORTED we do:

if (aggstrategy == AGG_SORTED)
    pathnode->path.pathkeys = subpath->pathkeys; /* preserves order */

which assumes that all of the columns before the aggregate will be
available after the aggregate.  That likely used to work ok before
1349d2790 as the planner wouldn't have requested any Pathkeys for
columns that were not available below the Agg node.

We can no longer take the subpath pathkey's verbatim. We need to strip
off pathkeys for columns that are not in pathnode's targetlist.

I've attached a patch which adds a new function to pathkeys.c to strip
off any PathKeys in a list that don't have a corresponding item in the
given PathTarget and just return a prefix of the input pathkey list up
until the first expr that can't be found.

I'm concerned that this patch will be too much overhead when creating
paths when a PathKey's EquivalenceClass has a large number of members
from partitioned tables.  I wondered if we should instead just check
if the subpath's pathkeys match root->group_pathkeys and if they do
set the AggPath's pathkeys to list_copy_head(subpath->pathkeys,
root->num_groupby_pathkeys),  that'll be much cheaper, but it just
feels a bit too much like a special case.

David

Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Pre-proposal: unicode normalized text
Next
From: "David G. Johnston"
Date:
Subject: Re: CHECK Constraint Deferrable