Scott Carey <scott.carey@algonomy.com> writes:
> I have discovered the root cause.
> This database is old. It pre-dates Postgres 8.4 which introduced
> array_agg. Apparently, in some version prior to 8.4 array_agg was added
> as a user function, defined as below for bigint:
> create AGGREGATE array_agg(
> BASETYPE = bigint,
> SFUNC = array_append,
> STYPE = bigint[],
> INITCOND = '{}'
> );
> So if you create a test database and run the previous test, performance
> will be fine and the query will be fast. Then run:
> create AGGREGATE array_agg(BASETYPE = bigint, SFUNC = array_append,STYPE =
> bigint[], INITCOND = '{}');
> It will be slow and reproduce this behavior.
Thank you for running that to ground! I confirm your results that v13
and up are far slower for this example than v12 was.
> Why would this run so much more slowly after updating from postgres 12 to
> 17? It is a user defined aggregate, although maybe not as optimized as
> the intrinsic one it shouldn't behave this way.
I did some bisecting using the attached simplified test case, and found
that the query execution time jumps from circa 60ms to circa 7500ms here:
1f39bce021540fde00990af55b4432c55ef4b3c7 is the first bad commit
commit 1f39bce021540fde00990af55b4432c55ef4b3c7
Author: Jeff Davis <jdavis@postgresql.org>
Date: Wed Mar 18 15:42:02 2020 -0700
Disk-based Hash Aggregation.
While performing hash aggregation, track memory usage when adding new
groups to a hash table. If the memory usage exceeds work_mem, enter
"spill mode".
(Times quoted are on a Mac M4 Pro, but in assert-enabled builds so
maybe not directly comparable to production.)
I'm bemused as to why: the test case has work_mem set high enough that
we shouldn't be triggering spill mode, so why did this change affect
it at all?
regards, tom lane
CREATE AGGREGATE array_agg(
BASETYPE = bigint,
SFUNC = array_append,
STYPE = bigint[],
INITCOND = '{}'
);
drop table if exists array_agg_test;
create table array_agg_test(product_id bigint not null, region_id bigint
not null, available boolean not null);
insert into array_agg_test (product_id, region_id, available) SELECT
generate_series(1, 50000) as product_id,
(ARRAY[1,2,3,4])[floor(random()*4)+1] as region_id, true as available;
insert into array_agg_test (product_id, region_id, available) SELECT
generate_series(1, 50000) as product_id,
(ARRAY[11,12,13,14])[floor(random()*4)+1] as region_id, true as available;
insert into array_agg_test (product_id, region_id, available) SELECT
generate_series(1, 50000) as product_id,
(ARRAY[111,112,113,114])[floor(random()*4)+1] as region_id, true as
available;
insert into array_agg_test (product_id, region_id, available) SELECT
generate_series(1, 50000) as product_id,
(ARRAY[1111,1112,1113,1114])[floor(random()*4)+1] as region_id, true as
available;
vacuum analyze array_agg_test;
\set ECHO all
-- set hash_mem_multiplier = 2;
set work_mem = "200MB";
explain (analyze, buffers) select product_id, array_agg(region_id) from
array_agg_test group by product_id;