Thread: Error when using array_agg with filter where clause in pg16 and pg17

Hello!

In postgresql 16 and 17 using array_agg with filter where gives an error, while in postgres 15 exact same query works.

This is minimal sample for reproducing:

create table test (id int, data jsonb);

insert into test (id, data) values
(1, '{"a": null}'),
(2, '{"a": "2"}'),
(3, '{"a": "2"}'),
(4, '{"a": ""}');

select array_agg(distinct (data->>'a')::int) filter (where data->>'a' is not null and data->>'a' != '')
from test;

Last query in pg16 or pg17 returns ERROR #22P02 invalid input syntax for type integer: ""
In pg15 it returns correct result {2}

wbr, Ferossa.
Kaimeh <kkaimeh@gmail.com> writes:
> In postgresql 16 and 17 using array_agg with filter where gives an error,
> while in postgres 15 exact same query works.

> This is minimal sample for reproducing:

> create table test (id int, data jsonb);
> insert into test (id, data) values
> (1, '{"a": null}'),
> (2, '{"a": "2"}'),
> (3, '{"a": "2"}'),
> (4, '{"a": ""}');
> select array_agg(distinct (data->>'a')::int) filter (where data->>'a' is
> not null and data->>'a' != '') from test;

Ugh.  EXPLAIN tells the tale:

 Aggregate  (cost=113.57..113.58 rows=1 width=32)
   Output: array_agg(DISTINCT (((data ->> 'a'::text))::integer)) FILTER (WHERE (((data ->> 'a'::text) IS NOT NULL) AND
((data->> 'a'::text) <> ''::text))) 
   ->  Sort  (cost=88.17..91.35 rows=1270 width=32)
         Output: data, (((data ->> 'a'::text))::integer)
         Sort Key: (((test.data ->> 'a'::text))::integer)
         ->  Seq Scan on public.test  (cost=0.00..22.70 rows=1270 width=32)
               Output: data, ((data ->> 'a'::text))::integer

We have pushed the array_agg argument down in order to sort by it,
neglecting the fact that there's a filter clause that should prevent
evaluation failures.

Bisecting fingers this commit:

1349d2790bf48a4de072931c722f39337e72055e is the first bad commit
commit 1349d2790bf48a4de072931c722f39337e72055e
Author: David Rowley <drowley@postgresql.org>
Date:   Tue Aug 2 23:11:45 2022 +1200

    Improve performance of ORDER BY / DISTINCT aggregates

Fortunately, that commit didn't actually rip out the old code path.
The simplest fix I can think of is to disable the presorted-agg
optimization if (1) there's a FILTER clause and (2) the proposed
sort key is anything more complex than a Var.  There might be
some wiggle room in (2) -- for instance, RelabelType(Var) should
be safe -- but we don't have a lot of intelligence about which
expression types are guaranteed error-free.

            regards, tom lane



On Wed, 9 Apr 2025 at 03:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bisecting fingers this commit:
>
> 1349d2790bf48a4de072931c722f39337e72055e is the first bad commit
> commit 1349d2790bf48a4de072931c722f39337e72055e
> Author: David Rowley <drowley@postgresql.org>
> Date:   Tue Aug 2 23:11:45 2022 +1200
>
>     Improve performance of ORDER BY / DISTINCT aggregates

:-(

> Fortunately, that commit didn't actually rip out the old code path.
> The simplest fix I can think of is to disable the presorted-agg
> optimization if (1) there's a FILTER clause and (2) the proposed
> sort key is anything more complex than a Var.  There might be
> some wiggle room in (2) -- for instance, RelabelType(Var) should
> be safe -- but we don't have a lot of intelligence about which
> expression types are guaranteed error-free.

Yeah, I can't think of any other fix.

Unfortunately, the situation is a little worse than what you
highlighted, as I think I didn't consider FILTER at all, and this
means I didn't consider the costing differences between filtering then
sorting vs sorting then filtering.  That commit does assume it's
always better to sort first when we can, which in many cases, that's
going to be true, e.g. when an index provides presorted input or when
the presorted sort order serves multiple aggregates. The non-presorted
path has to perform multiple identical sorts for the latter.

I've attached an experimental patch to fix the reported bug which
works by first checking for Vars and Consts, then falls back on
passing the Expr through strip_implicit_coercions() and trying again
to see if Vars and Consts were found. Simple tests show this seems to
work ok, but it does cause the expected results to change with the
sqljson tests.  The new results seem fine as the reason that query no
longer uses the presorted aggregate is because of the RowExpr in
"JSON_ARRAYAGG(foo ORDER BY bar) FILTER (WHERE bar > 2) as
row_filtered_agg", which causes no presortable Aggrefs to be found,
which results in the Aggref having to perform the sort.  That means
all the other non ORDER BY aggrefs give the results as per the order
of their inputs.

I wonder if we could work harder for RowExprs.
strip_implicit_coercions() seems to not want to handle them and even
mentions that fact in a comment. I don't have a grasp on why that is
or if it can be done by doing more work looking at each RowExpr.arg.

David

Attachment
David Rowley <dgrowleyml@gmail.com> writes:
> On Wed, 9 Apr 2025 at 03:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The simplest fix I can think of is to disable the presorted-agg
>> optimization if (1) there's a FILTER clause and (2) the proposed
>> sort key is anything more complex than a Var.

> Unfortunately, the situation is a little worse than what you
> highlighted, as I think I didn't consider FILTER at all, and this
> means I didn't consider the costing differences between filtering then
> sorting vs sorting then filtering.

Oooh.  If the FILTER clause is selective, that could easily mean that
the "optimization" loses big from having to sort many more tuples.
I wonder if we should just not apply it when there's a FILTER,
full stop.

            regards, tom lane



On Wed, 9 Apr 2025 at 12:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > Unfortunately, the situation is a little worse than what you
> > highlighted, as I think I didn't consider FILTER at all, and this
> > means I didn't consider the costing differences between filtering then
> > sorting vs sorting then filtering.
>
> Oooh.  If the FILTER clause is selective, that could easily mean that
> the "optimization" loses big from having to sort many more tuples.
> I wonder if we should just not apply it when there's a FILTER,
> full stop.

Well, technically, that might have been a safer option if we'd caught
this before I committed that patch or before we released that version,
but it might only be a problem if we have to perform a Sort. If the
presorted-ness comes from an Index Scan, then we've not spent any
extra effort sorting tuples that'll be filtered. If we were to switch
the optimisation off for FILTER now, we could cause performance
regressions in the back branches for people who are getting benefits
from Index Scans with a FILTER clause.

The only way I can see to fix that properly is to cost it in during
aggregate planning. IIRC, there's no costing for the implicit sorts in
Aggref. We could add some of those and put a flag in AggPath which
gets propagated to Agg to specify if aggpresorted should be ignored or
not for the given Agg node. We'd have to add_path() for both versions
of the AggPath and let the cheapest Path win.

I suspect we should just leave this for v18 and maybe come back and
improve for v19. There is still SET enable_presorted_aggregate = 0; if
someone stumbles upon this.

David



David Rowley <dgrowleyml@gmail.com> writes:
> On Wed, 9 Apr 2025 at 12:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Oooh.  If the FILTER clause is selective, that could easily mean that
>> the "optimization" loses big from having to sort many more tuples.
>> I wonder if we should just not apply it when there's a FILTER,
>> full stop.

> The only way I can see to fix that properly is to cost it in during
> aggregate planning. IIRC, there's no costing for the implicit sorts in
> Aggref. We could add some of those and put a flag in AggPath which
> gets propagated to Agg to specify if aggpresorted should be ignored or
> not for the given Agg node. We'd have to add_path() for both versions
> of the AggPath and let the cheapest Path win.

Yeah, AFAIR we never did any real costing of aggregate-internal
sorting.  However, adding that would pose the same risk you mentioned
that some queries might regress due to picking the worse plan.

> I suspect we should just leave this for v18 and maybe come back and
> improve for v19.

I think not doing anything is unacceptable: even though it took awhile
to notice, presorted_agg flat out breaks some queries that worked
before.  That trumps any worries about "maybe the plan will be worse",
and I don't even think it's a close decision.  So my inclination is
to do the simplest possible thing in v16-v18, and that seems to be
to disable presorted_agg if there's a FILTER.  Then we can look
into better ideas at leisure for v19.

            regards, tom lane



On Wed, 9 Apr 2025 at 12:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
> > I suspect we should just leave this for v18 and maybe come back and
> > improve for v19.
>
> I think not doing anything is unacceptable: even though it took awhile
> to notice, presorted_agg flat out breaks some queries that worked
> before.  That trumps any worries about "maybe the plan will be worse",
> and I don't even think it's a close decision.  So my inclination is
> to do the simplest possible thing in v16-v18, and that seems to be
> to disable presorted_agg if there's a FILTER.  Then we can look
> into better ideas at leisure for v19.

Misunderstanding. I meant do nothing about the costing issue. I still
think we should fix the bug, of course. The POC patch I posted is for
that part.

David



(I forgot to reply to this part)

On Wed, 9 Apr 2025 at 12:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > The only way I can see to fix that properly is to cost it in during
> > aggregate planning. IIRC, there's no costing for the implicit sorts in
> > Aggref. We could add some of those and put a flag in AggPath which
> > gets propagated to Agg to specify if aggpresorted should be ignored or
> > not for the given Agg node. We'd have to add_path() for both versions
> > of the AggPath and let the cheapest Path win.
>
> Yeah, AFAIR we never did any real costing of aggregate-internal
> sorting.  However, adding that would pose the same risk you mentioned
> that some queries might regress due to picking the worse plan.

The difference is that by the time we start generating AggPaths, we
have all the information we need to determine the selectivity of the
aggfilter and apply a sort cost to rows that survive that. So, with
the method I suggest, any poor plan choice is down to bad costing or
stats, whereas if we just disable the optimisation when the Aggref has
a FILTER, as you propose, then we'll always fallback on nodeAgg.c
doing the sorting, even for FILTERs that barely filter anything or
when there's a perfectly good index to give us presorted input.

Just to be clear, the idea I'm proposing for v19 is that we modify
cost_agg() adding a new bool parameter and have it add costs for the
implicit sorts for each Aggref that has an aggdistinct or aggorderby.
The bool parameter would control if aggpresorted Aggrefs were included
for those costs or ignored.  We'd then create two AggPaths, one which
would take advantage of presorting and uses a properly sorted input
path and another that ignores the aggpresorted flag and uses the
cheapest input path. add_path() then decides which of those is better.

I'm not following why my in method if the planner chooses a poor plan
is any different from the planner choosing a poor plan for anything
else because the stats or costs aren't a good reflection of reality.

David



David Rowley <dgrowleyml@gmail.com> writes:
> I'm not following why my in method if the planner chooses a poor plan
> is any different from the planner choosing a poor plan for anything
> else because the stats or costs aren't a good reflection of reality.

Well, we hope we'll usually pick the right thing.  But upthread
you were complaining that any change might be a regression, or
at least that's how I read it.

            regards, tom lane



On Wed, 9 Apr 2025 at 14:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
> > I'm not following why my in method if the planner chooses a poor plan
> > is any different from the planner choosing a poor plan for anything
> > else because the stats or costs aren't a good reflection of reality.
>
> Well, we hope we'll usually pick the right thing.  But upthread
> you were complaining that any change might be a regression, or
> at least that's how I read it.

I'm mostly concerned about just a blanket disabling of the presorted
optimisation when the Aggref has a FILTER. I think that's what you
proposed at one point. I'm more happy to go with your first proposal
to try and figure out if the args are safe to sort before filtering.

David