Thread: Error from array_agg when table has many rows

Error from array_agg when table has many rows

From
Kirill Zdornyy
Date:
Hello,

After upgrading from PostgreSQL 12.19 to PostgreSQL 16.3 the function "array_agg" gives me the following error under
certainconditions. 

ERROR: input of anonymous composite types is not implemented

I was also able reproduce the issue on PostgreSQL 17.4 via the latest currently available Docker image.

Please see the following SQL script for the minimum reproduction.

```
drop table if exists my_table;

create table my_table
(
    my_text text
);

-- Insert a minimum of 200,000 rows.
do
$$
    declare
        counter integer := 0;
    begin
        while counter < 200000
            loop
                INSERT INTO my_table (my_text) VALUES ('A simple sentence.');
                counter := counter + 1;
            end loop;
    end;
$$;

select array_agg(t)
from (select my_text from my_table WHERE my_text != '') t;
```

On my machine the issue appears if 200,000 or more rows exist. I used EXPLAIN and noticed that the query plan is
differentif this condition is met. 

This is the query plan if fewer than 200,000 rows exist.

```
Aggregate  (cost=1935.40..1935.41 rows=1 width=32)
->  Seq Scan on my_table  (cost=0.00..1719.90 rows=86199 width=32)
Filter: (my_text <> ''::text)
```

This is the query plan if more than 200,000 rows exist.

```
Finalize Aggregate  (cost=3801.65..3801.66 rows=1 width=32)
->  Gather  (cost=3801.53..3801.64 rows=1 width=32)
Workers Planned: 1
->  Partial Aggregate  (cost=2801.53..2801.54 rows=1 width=32)
->  Parallel Seq Scan on my_table  (cost=0.00..2548.00 rows=101411 width=32)
Filter: (my_text <> ''::text)
```

I am also able to reproduce the issue on AWS RDS PostgreSQL 16.3.

Thank you!



Re: Error from array_agg when table has many rows

From
Richard Guo
Date:
On Sat, Mar 8, 2025 at 8:10 AM Kirill Zdornyy <kirill@dineserve.com> wrote:
> After upgrading from PostgreSQL 12.19 to PostgreSQL 16.3 the function "array_agg" gives me the following error under
certainconditions. 
>
> ERROR: input of anonymous composite types is not implemented
>
> I was also able reproduce the issue on PostgreSQL 17.4 via the latest currently available Docker image.

Thanks for the report!  I can reproduce it on master with the query
below.

create table t (a int);
insert into t values (1);

set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;

# select array_agg(s) from (select * from t) s;
ERROR:  input of anonymous composite types is not implemented

And the plan for this query is:

explain (verbose, costs off)
select array_agg(s) from (select * from t) s;
                    QUERY PLAN
---------------------------------------------------
 Finalize Aggregate
   Output: array_agg(ROW(t.a))
   ->  Gather
         Output: (PARTIAL array_agg(ROW(t.a)))
         Workers Planned: 2
         ->  Partial Aggregate
               Output: PARTIAL array_agg(ROW(t.a))
               ->  Parallel Seq Scan on public.t
                     Output: t.a
(9 rows)

We are performing deserialization during the final phase of the
aggregation on data of type RECORD but we fail to provide a valid
typmod (array_agg_deserialize() uses -1 as the typmod when calling the
receiveproc).

I haven't verified it, but I suspect it's related to 16fd03e95.

Thanks
Richard



Re: Error from array_agg when table has many rows

From
Tom Lane
Date:
Richard Guo <guofenglinux@gmail.com> writes:
> We are performing deserialization during the final phase of the
> aggregation on data of type RECORD but we fail to provide a valid
> typmod (array_agg_deserialize() uses -1 as the typmod when calling the
> receiveproc).

> I haven't verified it, but I suspect it's related to 16fd03e95.

Yeah.  I don't think there is any way for array_agg_deserialize to
know the correct typmod, so what we have to do is disable using
partial aggregation in this case.  Fortunately there's a
policy-setting function that can be taught that, as attached.

            regards, tom lane

diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c
index 9e567f3cc45..0ac8966e30f 100644
--- a/src/backend/parser/parse_agg.c
+++ b/src/backend/parser/parse_agg.c
@@ -2052,7 +2052,7 @@ resolve_aggregate_transtype(Oid aggfuncid,
 
 /*
  * agg_args_support_sendreceive
- *        Returns true if all non-byval of aggref's arg types have send and
+ *        Returns true if all non-byval types of aggref's args have send and
  *        receive functions.
  */
 bool
@@ -2067,6 +2067,15 @@ agg_args_support_sendreceive(Aggref *aggref)
         TargetEntry *tle = (TargetEntry *) lfirst(lc);
         Oid            type = exprType((Node *) tle->expr);
 
+        /*
+         * RECORD is a special case: it has typsend/typreceive functions, but
+         * record_recv only works if passed the correct typmod to identify the
+         * specific anonymous record type.  array_agg_deserialize cannot do
+         * that, so we have to disclaim support for the case.
+         */
+        if (type == RECORDOID)
+            return false;
+
         typeTuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(type));
         if (!HeapTupleIsValid(typeTuple))
             elog(ERROR, "cache lookup failed for type %u", type);

Re: Error from array_agg when table has many rows

From
David Rowley
Date:
On Sun, 9 Mar 2025 at 04:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Richard Guo <guofenglinux@gmail.com> writes:
> > We are performing deserialization during the final phase of the
> > aggregation on data of type RECORD but we fail to provide a valid
> > typmod (array_agg_deserialize() uses -1 as the typmod when calling the
> > receiveproc).
>
> > I haven't verified it, but I suspect it's related to 16fd03e95.
>
> Yeah.  I don't think there is any way for array_agg_deserialize to
> know the correct typmod, so what we have to do is disable using
> partial aggregation in this case.  Fortunately there's a
> policy-setting function that can be taught that, as attached.

The only way I can think of to get that would be to special-case
array_agg_serialize() to have it serialize the typmod when the send
function is record_send(), then add a similar special-case to
array_agg_deserialize() to check for a record_recv() and deserialize
the typmod there. That doesn't seem very pretty, so I'm happy to go
with your fix to disable parallel aggregates for this case.

David



Re: Error from array_agg when table has many rows

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> On Sun, 9 Mar 2025 at 04:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah.  I don't think there is any way for array_agg_deserialize to
>> know the correct typmod, so what we have to do is disable using
>> partial aggregation in this case.  Fortunately there's a
>> policy-setting function that can be taught that, as attached.

> The only way I can think of to get that would be to special-case
> array_agg_serialize() to have it serialize the typmod when the send
> function is record_send(), then add a similar special-case to
> array_agg_deserialize() to check for a record_recv() and deserialize
> the typmod there. That doesn't seem very pretty, so I'm happy to go
> with your fix to disable parallel aggregates for this case.

Yeah ... we could probably make that work if we had to, but the
ugliness would likely metastasize outside array_agg_[de]serialize.
Since it took more than a year to get a field report, I'm content
to just disable the optimization instead.  Pushed that way.

            regards, tom lane