DSA overflow in hash join - Mailing list pgsql-hackers
From: Konstantin Knizhnik
Subject: DSA overflow in hash join
Msg-id: 4abe0c31-097e-460e-a159-c8042b63160b@garret.ru
List: pgsql-hackers
Hi hackers!
There is a weird error that is rarely reproduced with sqlancer: `ERROR: invalid DSA memory alloc request size 1140850688`:
-------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=114075075706156875776.00..114075075706156875776.00 rows=1 width=8)
-> Gather (cost=114075075706156875776.00..114075075706156875776.00 rows=4 width=8)
Workers Planned: 4
-> Partial Aggregate (cost=114075075706156875776.00..114075075706156875776.00 rows=1 width=8)
-> Parallel Hash Join (cost=6510542055.80..97778619693677723648.00 rows=6518582404991658491904 width=0)
Hash Cond: ((t2_1.c1 * t2_1.c1) = t4_1.c1)
-> Nested Loop (cost=0.00..57152791114.20 rows=2539498720000 width=32)
-> Parallel Seq Scan on t2 t2_1 (cost=0.00..12.20 rows=220 width=32)
-> Nested Loop (cost=0.00..144353654.10 rows=11543176000 width=0)
-> Nested Loop (cost=0.00..63915.85 rows=5107600 width=0)
-> Seq Scan on t5 (cost=0.00..32.60 rows=2260 width=0)
-> Materialize (cost=0.00..43.90 rows=2260 width=0)
-> Seq Scan on t0 (cost=0.00..32.60 rows=2260 width=0)
-> Materialize (cost=0.00..43.90 rows=2260 width=0)
-> Seq Scan on t0 t0_1 (cost=0.00..32.60 rows=2260 width=0)
-> Parallel Hash (cost=4028895759.80..4028895759.80 rows=128343727600 width=32)
-> Nested Loop (cost=0.00..4028895759.80 rows=128343727600 width=32)
-> Nested Loop (cost=0.00..3569757.80 rows=145845145 width=0)
-> Nested Loop Left Join (cost=0.00..7536.20 rows=64533 width=0)
Join Filter: (upper((t2.c1 + t2.c1)))::boolean
-> Parallel Seq Scan on t2 (cost=0.00..12.20 rows=220 width=32)
-> Seq Scan on t4 (cost=0.00..18.80 rows=880 width=0)
-> Seq Scan on t5 t5_1 (cost=0.00..32.60 rows=2260 width=0)
-> Seq Scan on t4 t4_1 (cost=0.00..18.80 rows=880 width=32)
The problem is reproduced only with max_parallel_workers_per_gather=4.
The error is produced by `dsa_allocate*`, most likely in nodeHash.c (fortunately there are not that many calls to dsa_allocate).
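For reference, dsa_allocate() without the DSA_ALLOC_HUGE flag rejects any request larger than MaxAllocSize, and the failing request is already past that limit (assuming the usual MaxAllocSize of 0x3fffffff bytes; that this particular request is the shared bucket array is only my guess):

	requested size: 1140850688 bytes  (= 142606336 * 8, i.e. ~142M 8-byte dsa_pointer_atomics)
	MaxAllocSize:   1073741823 bytes  (= 0x3fffffff)

	1140850688 > 1073741823  =>  ERROR: invalid DSA memory alloc request size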
Certainly there are checks which should prevent allocation of more than MaxAllocSize:
max_pointers = Min(max_pointers, MaxAllocSize / sizeof(HashJoinTuple));
But it looks like such checks are not present everywhere.
In particular, I suspect these places:
nodeHash.c:1668:
/* Double the size of the bucket array. */
pstate->nbuckets *= 2;
size = pstate->nbuckets * sizeof(dsa_pointer_atomic);
hashtable->batches[0].shared->size += size / 2;
dsa_free(hashtable->area, hashtable->batches[0].shared->buckets);
hashtable->batches[0].shared->buckets =
dsa_allocate(hashtable->area, size);
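If that is indeed the failing spot, a minimal sketch of the kind of guard I have in mind (illustration only, not a tested patch; it assumes the non-huge MaxAllocSize limit is what the request must stay under):

	/* Sketch: refuse to double past MaxAllocSize instead of erroring out. */
	if ((size_t) pstate->nbuckets <= MaxAllocSize / sizeof(dsa_pointer_atomic) / 2)
	{
		/* Double the size of the bucket array. */
		pstate->nbuckets *= 2;
		size = pstate->nbuckets * sizeof(dsa_pointer_atomic);
		hashtable->batches[0].shared->size += size / 2;
		dsa_free(hashtable->area, hashtable->batches[0].shared->buckets);
		hashtable->batches[0].shared->buckets =
			dsa_allocate(hashtable->area, size);
	}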
nodeHash.c:3290:
batch->buckets =
dsa_allocate(hashtable->area, sizeof(dsa_pointer_atomic) * nbuckets);
In the first case the number of buckets is doubled; in the second case it may have been doubled in ExecChooseHashTableSize:
/*
* It's better to use half the batches, so do that and adjust the
* nbucket in the opposite direction, and double the allowance.
*/
nbatch /= 2;
nbuckets *= 2;
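And correspondingly in ExecChooseHashTableSize, something like this (again only a sketch; since nbuckets must stay a power of two, clamping here means refusing to double rather than taking a Min()):

	nbatch /= 2;
	/* Sketch: double nbuckets only while the bucket array stays allocatable. */
	if ((size_t) nbuckets <= MaxAllocSize / sizeof(dsa_pointer_atomic) / 2)
		nbuckets *= 2;		/* stays a power of two */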
Maybe I am missing something, but do we have any protection against overflow here?
One more observation: as you can see, the estimated number of rows in this case is 6518582404991658491904 (which doesn't fit in 64 bits).
It is not difficult to create a query with an even larger estimated number of tuples, for example:
create table t1(pk1 integer, sk1 integer);
create table t2(pk2 integer, sk2 integer);
create table t3(pk3 integer, sk3 integer);
create table t4(pk4 integer, sk4 integer);
create table t5(pk5 integer, sk5 integer);
create table t6(pk6 integer, sk6 integer);
insert into t1 values (generate_series(1,1000000), 0);
insert into t2 values (generate_series(1,1000000), 0);
insert into t3 values (generate_series(1,1000000), 0);
insert into t4 values (generate_series(1,1000000), 0);
insert into t5 values (generate_series(1,1000000), 0);
insert into t6 values (generate_series(1,1000000), 0);
analyze t1;
analyze t2;
analyze t3;
analyze t4;
analyze t5;
analyze t6;
explain select count(*) from t1 join t2 on sk1=sk2 join t3 on sk2=sk3 join t4 on sk3=sk4 join t5 on sk4=sk5 join t6 on sk5=sk6;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=2994791666679459822091698054365184.00..2994791666679459822091698054365184.00 rows=1 width=8)
-> Gather (cost=2994791666679459822091698054365184.00..2994791666679459822091698054365184.00 rows=2 width=8)
Workers Planned: 4
-> Partial Aggregate (cost=2994791666679459822091698054365184.00..2994791666679459822091698054365184.00 rows=1 width=8)
-> Parallel Hash Join (cost=9195963541715053182976.00..1953125000012793188098301096886272.00 rows=416666666666666672044102856701116416 width=0)
Hash Cond: (t1.sk1 = t3.sk3)
-> Merge Join (cost=180939.82..6250190523.16 rows=416666666667 width=8)
Merge Cond: (t1.sk1 = t2.sk2)
-> Sort (cost=53182.48..54224.15 rows=416667 width=4)
Sort Key: t1.sk1
-> Parallel Seq Scan on t1 (cost=0.00..8591.67 rows=416667 width=4)
-> Materialize (cost=127757.34..132757.34 rows=1000000 width=4)
-> Sort (cost=127757.34..130257.34 rows=1000000 width=4)
Sort Key: t2.sk2
-> Seq Scan on t2 (cost=0.00..14425.00 rows=1000000 width=4)
-> Parallel Hash (cost=1953125000038385975296.00..1953125000038385975296.00 rows=416666666666666659676160 width=16)
-> Parallel Hash Join (cost=23086308963.32..1953125000038385975296.00 rows=416666666666666659676160 width=16)
Hash Cond: (t3.sk3 = t5.sk5)
-> Merge Join (cost=180939.82..6250190523.16 rows=416666666667 width=8)
Merge Cond: (t3.sk3 = t4.sk4)
-> Sort (cost=53182.48..54224.15 rows=416667 width=4)
Sort Key: t3.sk3
-> Parallel Seq Scan on t3 (cost=0.00..8591.67 rows=416667 width=4)
-> Materialize (cost=127757.34..132757.34 rows=1000000 width=4)
-> Sort (cost=127757.34..130257.34 rows=1000000 width=4)
Sort Key: t4.sk4
-> Seq Scan on t4 (cost=0.00..14425.00 rows=1000000 width=4)
-> Parallel Hash (cost=6250190523.16..6250190523.16 rows=416666666667 width=8)
-> Merge Join (cost=180939.82..6250190523.16 rows=416666666667 width=8)
Merge Cond: (t5.sk5 = t6.sk6)
-> Sort (cost=53182.48..54224.15 rows=416667 width=4)
Sort Key: t5.sk5
-> Parallel Seq Scan on t5 (cost=0.00..8591.67 rows=416667 width=4)
-> Materialize (cost=127757.34..132757.34 rows=1000000 width=4)
-> Sort (cost=127757.34..130257.34 rows=1000000 width=4)
Sort Key: t6.sk6
-> Seq Scan on t6 (cost=0.00..14425.00 rows=1000000 width=4)
Please note that this query doesn't cause the alloc error.
But I wonder if we should limit the maximal number of estimated rows, because it seems clear that no system can process 10^25 rows in any reasonable amount of time. So if we get such an estimate, either there is some error in the estimation and fortunately the query finishes much faster, or it does not finish at all. Either way, from the resource allocation point of view there is no sense in trying to allocate resources for 10^25 rows.
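For illustration: clamp_row_est() in costsize.c already clamps estimates, but only at MAXIMUM_ROWCOUNT (1e100), which is far too large to help here. A stricter ceiling could look like the sketch below, where MAX_SANE_ROWS is an invented name and its value is purely illustrative:

	#define MAX_SANE_ROWS 1e15		/* hypothetical ceiling, not an existing symbol */

	double
	clamp_row_est(double nrows)
	{
		/*
		 * Sketch: in addition to the existing MAXIMUM_ROWCOUNT/NaN checks,
		 * clamp absurd estimates to something the executor could
		 * conceivably process.
		 */
		if (nrows > MAX_SANE_ROWS || isnan(nrows))
			nrows = MAX_SANE_ROWS;
		else if (nrows <= 1.0)
			nrows = 1.0;
		else
			nrows = rint(nrows);

		return nrows;
	}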