Thread: Error from array_agg when table has many rows
Hello, After upgrading from PostgreSQL 12.19 to PostgreSQL 16.3 the function "array_agg" gives me the following error under certainconditions. ERROR: input of anonymous composite types is not implemented I was also able reproduce the issue on PostgreSQL 17.4 via the latest currently available Docker image. Please see the following SQL script for the minimum reproduction. ``` drop table if exists my_table; create table my_table ( my_text text ); -- Insert a minimum of 200,000 rows. do $$ declare counter integer := 0; begin while counter < 200000 loop INSERT INTO my_table (my_text) VALUES ('A simple sentence.'); counter := counter + 1; end loop; end; $$; select array_agg(t) from (select my_text from my_table WHERE my_text != '') t; ``` On my machine the issue appears if 200,000 or more rows exist. I used EXPLAIN and noticed that the query plan is differentif this condition is met. This is the query plan if fewer than 200,000 rows exist. ``` Aggregate (cost=1935.40..1935.41 rows=1 width=32) -> Seq Scan on my_table (cost=0.00..1719.90 rows=86199 width=32) Filter: (my_text <> ''::text) ``` This is the query plan if more than 200,000 rows exist. ``` Finalize Aggregate (cost=3801.65..3801.66 rows=1 width=32) -> Gather (cost=3801.53..3801.64 rows=1 width=32) Workers Planned: 1 -> Partial Aggregate (cost=2801.53..2801.54 rows=1 width=32) -> Parallel Seq Scan on my_table (cost=0.00..2548.00 rows=101411 width=32) Filter: (my_text <> ''::text) ``` I am also able to reproduce the issue on AWS RDS PostgreSQL 16.3. Thank you!
On Sat, Mar 8, 2025 at 8:10 AM Kirill Zdornyy <kirill@dineserve.com> wrote: > After upgrading from PostgreSQL 12.19 to PostgreSQL 16.3 the function "array_agg" gives me the following error under certainconditions. > > ERROR: input of anonymous composite types is not implemented > > I was also able reproduce the issue on PostgreSQL 17.4 via the latest currently available Docker image. Thanks for the report! I can reproduce it on master with the query below. create table t (a int); insert into t values (1); set parallel_setup_cost=0; set parallel_tuple_cost=0; set min_parallel_table_scan_size=0; # select array_agg(s) from (select * from t) s; ERROR: input of anonymous composite types is not implemented And the plan for this query is: explain (verbose, costs off) select array_agg(s) from (select * from t) s; QUERY PLAN --------------------------------------------------- Finalize Aggregate Output: array_agg(ROW(t.a)) -> Gather Output: (PARTIAL array_agg(ROW(t.a))) Workers Planned: 2 -> Partial Aggregate Output: PARTIAL array_agg(ROW(t.a)) -> Parallel Seq Scan on public.t Output: t.a (9 rows) We are performing deserialization during the final phase of the aggregation on data of type RECORD but we fail to provide a valid typmod (array_agg_deserialize() uses -1 as the typmod when calling the receiveproc). I haven't verified it, but I suspect it's related to 16fd03e95. Thanks Richard
Richard Guo <guofenglinux@gmail.com> writes: > We are performing deserialization during the final phase of the > aggregation on data of type RECORD but we fail to provide a valid > typmod (array_agg_deserialize() uses -1 as the typmod when calling the > receiveproc). > I haven't verified it, but I suspect it's related to 16fd03e95. Yeah. I don't think there is any way for array_agg_deserialize to know the correct typmod, so what we have to do is disable using partial aggregation in this case. Fortunately there's a policy-setting function that can be taught that, as attached. regards, tom lane diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c index 9e567f3cc45..0ac8966e30f 100644 --- a/src/backend/parser/parse_agg.c +++ b/src/backend/parser/parse_agg.c @@ -2052,7 +2052,7 @@ resolve_aggregate_transtype(Oid aggfuncid, /* * agg_args_support_sendreceive - * Returns true if all non-byval of aggref's arg types have send and + * Returns true if all non-byval types of aggref's args have send and * receive functions. */ bool @@ -2067,6 +2067,15 @@ agg_args_support_sendreceive(Aggref *aggref) TargetEntry *tle = (TargetEntry *) lfirst(lc); Oid type = exprType((Node *) tle->expr); + /* + * RECORD is a special case: it has typsend/typreceive functions, but + * record_recv only works if passed the correct typmod to identify the + * specific anonymous record type. array_agg_deserialize cannot do + * that, so we have to disclaim support for the case. + */ + if (type == RECORDOID) + return false; + typeTuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(type)); if (!HeapTupleIsValid(typeTuple)) elog(ERROR, "cache lookup failed for type %u", type);
On Sun, 9 Mar 2025 at 04:50, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Richard Guo <guofenglinux@gmail.com> writes: > > We are performing deserialization during the final phase of the > > aggregation on data of type RECORD but we fail to provide a valid > > typmod (array_agg_deserialize() uses -1 as the typmod when calling the > > receiveproc). > > > I haven't verified it, but I suspect it's related to 16fd03e95. > > Yeah. I don't think there is any way for array_agg_deserialize to > know the correct typmod, so what we have to do is disable using > partial aggregation in this case. Fortunately there's a > policy-setting function that can be taught that, as attached. The only way I can think of to get that would be to special-case array_agg_serialize() to have it serialize the typmod when the send function is record_send(), then add a similar special-case to array_agg_deserialize() to check for a record_recv() and deserialize the typmod there. That doesn't seem very pretty, so I'm happy to go with your fix to disable parallel aggregates for this case. David
David Rowley <dgrowleyml@gmail.com> writes: > On Sun, 9 Mar 2025 at 04:50, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Yeah. I don't think there is any way for array_agg_deserialize to >> know the correct typmod, so what we have to do is disable using >> partial aggregation in this case. Fortunately there's a >> policy-setting function that can be taught that, as attached. > The only way I can think of to get that would be to special-case > array_agg_serialize() to have it serialize the typmod when the send > function is record_send(), then add a similar special-case to > array_agg_deserialize() to check for a record_recv() and deserialize > the typmod there. That doesn't seem very pretty, so I'm happy to go > with your fix to disable parallel aggregates for this case. Yeah ... we could probably make that work if we had to, but the ugliness would likely metastasize outside array_agg_[de]serialize. Since it took more than a year to get a field report, I'm content to just disable the optimization instead. Pushed that way. regards, tom lane