Thread: Re: BUG #16925: ERROR: invalid DSA memory alloc request size 1073741824 CONTEXT: parallel worker
Re: BUG #16925: ERROR: invalid DSA memory alloc request size 1073741824 CONTEXT: parallel worker
From: Thomas Munro
On Thu, Mar 18, 2021 at 2:06 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Sat, Mar 13, 2021 at 11:59 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > > > 2021-03-12 19:45:37.352 CET [316243] ERROR: XX000: invalid DSA memory alloc
> > > > request size 1073741824
>
> After an off-list exchange with Karen and a colleague who ran this with
> the ERROR changed to a PANIC and examined the smoldering core, the
> problem turns out to be a failure to keep the hash table bucket array
> <= MaxAllocSize in one code path.  Although commit 86a2218e fixed
> another version of that problem a while ago, it can still be
> exceeded... by one byte... when we expand from one batch to many.
> Will propose a fix.

Here's a standalone reproducer with the right parameters to reach this
error, and a simple fix.  (There is definitely room for more improvements
in this area of the code... but that'll have to be a project for later.)

===8<===
shared_buffers=2GB
fsync=off
max_wal_size=10GB
min_dynamic_shared_memory=2GB
===8<===
create table bigger_than_it_looks as
  select generate_series(1, 256000000) as id;
alter table bigger_than_it_looks set (autovacuum_enabled = 'false');
alter table bigger_than_it_looks set (parallel_workers = 1);
analyze bigger_than_it_looks;
update pg_class set reltuples = 5000000 where relname = 'bigger_than_it_looks';
===8<===
postgres=# set work_mem = '4.5GB';
SET
postgres=# explain analyze select count(*) from bigger_than_it_looks
t1 join bigger_than_it_looks t2 using (id);
ERROR:  invalid DSA memory alloc request size 1073741824
CONTEXT:  parallel worker
===8<===
Attachment
Re: BUG #16925: ERROR: invalid DSA memory alloc request size 1073741824 CONTEXT: parallel worker
From: Andrei Lepikhov
On 18/3/2021 16:21, Thomas Munro wrote:
> ===8<===
> shared_buffers=2GB
> fsync=off
> max_wal_size=10GB
> min_dynamic_shared_memory=2GB
> ===8<===
> create table bigger_than_it_looks as
>   select generate_series(1, 256000000) as id;
> alter table bigger_than_it_looks set (autovacuum_enabled = 'false');
> alter table bigger_than_it_looks set (parallel_workers = 1);
> analyze bigger_than_it_looks;
> update pg_class set reltuples = 5000000 where relname = 'bigger_than_it_looks';
> ===8<===
> postgres=# set work_mem = '4.5GB';
> SET
> postgres=# explain analyze select count(*) from bigger_than_it_looks
> t1 join bigger_than_it_looks t2 using (id);
> ERROR:  invalid DSA memory alloc request size 1073741824
> CONTEXT:  parallel worker
> ===8<===

This bug still annoyingly interrupts the queries of some clients.  Maybe
we can complete this work?  It is stable and reproduces on all PG
versions.  The case:

work_mem = '2GB'

test table:
-----------
CREATE TABLE bigger_than_it_looks AS
  SELECT generate_series(1, 512E6) AS id;
ALTER TABLE bigger_than_it_looks SET (autovacuum_enabled = 'false');
ALTER TABLE bigger_than_it_looks SET (parallel_workers = 1);
ANALYZE bigger_than_it_looks;
UPDATE pg_class SET reltuples = 5000000
WHERE relname = 'bigger_than_it_looks';

The number of parallel workers affects the amount of memory allowed for
the hash table, and in that sense it correlates with the work_mem value
needed to reproduce the bug (keep in mind also that hash_mem_multiplier
has been changed recently).

Query:
SELECT sum(a.id)
FROM bigger_than_it_looks a
JOIN bigger_than_it_looks b ON a.id = b.id
LEFT JOIN bigger_than_it_looks c ON b.id = c.id;

Any query that needs a Parallel Hash Join will do here.  The case is as
follows.  The first batch contains a lot of tuples (about 67 million at
the moment the batch count is increased).  We calculate the number of
buckets needed: approximately 134 million (134217728).
Since sizeof(dsa_pointer_atomic) is 8 in my case, the bucket array
request comes to 134217728 * 8 = 1073741824 bytes, which overflows the
maximum DSA allocation size (1073741823 bytes).  See the new patch in
the attachment.

--
regards, Andrei Lepikhov
Postgres Professional
Attachment
Re: BUG #16925: ERROR: invalid DSA memory alloc request size 1073741824 CONTEXT: parallel worker
From: Thomas Munro
On Wed, Dec 6, 2023 at 5:46 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
> On 18/3/2021 16:21, Thomas Munro wrote:
> > ===8<===
> > shared_buffers=2GB
> > fsync=off
> > max_wal_size=10GB
> > min_dynamic_shared_memory=2GB
> > ===8<===
> > create table bigger_than_it_looks as
> >   select generate_series(1, 256000000) as id;
> > alter table bigger_than_it_looks set (autovacuum_enabled = 'false');
> > alter table bigger_than_it_looks set (parallel_workers = 1);
> > analyze bigger_than_it_looks;
> > update pg_class set reltuples = 5000000 where relname = 'bigger_than_it_looks';
> > ===8<===
> > postgres=# set work_mem = '4.5GB';
> > SET
> > postgres=# explain analyze select count(*) from bigger_than_it_looks
> > t1 join bigger_than_it_looks t2 using (id);
> > ERROR:  invalid DSA memory alloc request size 1073741824
> > CONTEXT:  parallel worker
> > ===8<===
>
> This bug still annoyingly interrupts the queries of some clients. Maybe
> complete this work?

Ugh, sorry.  We had a report, a repro and a candidate patch a couple of
years ago, but I somehow completely forgot about it.  I have now added a
CF entry (#4689).
Re: BUG #16925: ERROR: invalid DSA memory alloc request size 1073741824 CONTEXT: parallel worker
From: Alena Rybakina
Hi!  Thank you for your work on the subject.

On 11.12.2023 02:30, Alena Rybakina wrote:
>
> On 06.12.2023 07:46, Andrei Lepikhov wrote:
>> On 18/3/2021 16:21, Thomas Munro wrote:
>>> ===8<===
>>> shared_buffers=2GB
>>> fsync=off
>>> max_wal_size=10GB
>>> min_dynamic_shared_memory=2GB
>>> ===8<===
>>> create table bigger_than_it_looks as
>>>   select generate_series(1, 256000000) as id;
>>> alter table bigger_than_it_looks set (autovacuum_enabled = 'false');
>>> alter table bigger_than_it_looks set (parallel_workers = 1);
>>> analyze bigger_than_it_looks;
>>> update pg_class set reltuples = 5000000 where relname =
>>> 'bigger_than_it_looks';
>>> ===8<===
>>> postgres=# set work_mem = '4.5GB';
>>> SET
>>> postgres=# explain analyze select count(*) from bigger_than_it_looks
>>> t1 join bigger_than_it_looks t2 using (id);
>>> ERROR:  invalid DSA memory alloc request size 1073741824
>>> CONTEXT:  parallel worker
>>> ===8<===
>>
>> This bug still annoyingly interrupts the queries of some clients.
>> Maybe complete this work?
>> It is stable and reproduces on all PG versions. The case:
>> work_mem = '2GB'
>>
>> test table:
>> -----------
>> CREATE TABLE bigger_than_it_looks AS
>>   SELECT generate_series(1, 512E6) AS id;
>> ALTER TABLE bigger_than_it_looks SET (autovacuum_enabled = 'false');
>> ALTER TABLE bigger_than_it_looks SET (parallel_workers = 1);
>> ANALYZE bigger_than_it_looks;
>> UPDATE pg_class SET reltuples = 5000000
>> WHERE relname = 'bigger_than_it_looks';
>>
>> The parallel workers number impacts size of the allowed memory under
>> the hash table and in that sense correlates with the work_mem value,
>> needed for the bug reproduction (keep in mind also that
>> hash_mem_multiplier has been changed recently).
>>
>> Query:
>> SELECT sum(a.id)
>> FROM bigger_than_it_looks a
>> JOIN bigger_than_it_looks b ON a.id = b.id
>> LEFT JOIN bigger_than_it_looks c ON b.id = c.id;
>>
>> Any query that needs Parallel Hash Join can be found here. The case
>> here is as follows.
>> The first batch contains a lot of tuples (on increment, it has about
>> 67mln tuples.). We calculate the number of buckets needed,
>> approximately 134 mln (134217728). Remember, the size of
>> dsa_pointer_atomic is 8 in my case, and it ends up with an overflow
>> of the max number of DSA, which can be allocated (1073741823 bytes).
>> See the new patch in the attachment.

I've looked through your code and haven't seen any errors yet, but I
think we could rewrite these lines of code as follows:

-       dbuckets = ceil(dtuples / NTUP_PER_BUCKET);
-       dbuckets = Min(dbuckets, max_buckets);
-       new_nbuckets = (int) dbuckets;
-       new_nbuckets = Max(new_nbuckets, 1024);
+       dbuckets = Min(ceil(dtuples / NTUP_PER_BUCKET), max_buckets);
+       new_nbuckets = Max((int) dbuckets, 1024);

I have attached a diff file with the proposed changes to this email.

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment
Re: BUG #16925: ERROR: invalid DSA memory alloc request size 1073741824 CONTEXT: parallel worker
From: Andrei Lepikhov
On 10/12/2023 09:41, Thomas Munro wrote:
> On Wed, Dec 6, 2023 at 5:46 PM Andrei Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
>> This bug still annoyingly interrupts the queries of some clients. Maybe
>> complete this work?
>
> Ugh, sorry.  We had a report, a repro and a candidate patch a couple
> of years ago, but I somehow completely forgot about it.  I have now
> added a CF entry (#4689).

Thanks.  I think the Parallel Hash Join code is worth exploring for
extreme cases anyway, but in this case we have quite a clear bug and
must fix it.

--
regards, Andrei Lepikhov
Postgres Professional
Re: BUG #16925: ERROR: invalid DSA memory alloc request size 1073741824 CONTEXT: parallel worker
From: Alexander Korotkov
On Mon, Dec 11, 2023 at 4:26 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
> On 10/12/2023 09:41, Thomas Munro wrote:
> > On Wed, Dec 6, 2023 at 5:46 PM Andrei Lepikhov
> > <a.lepikhov@postgrespro.ru> wrote:
> >> This bug still annoyingly interrupts the queries of some clients. Maybe
> >> complete this work?
> >
> > Ugh, sorry.  We had a report, a repro and a candidate patch a couple
> > of years ago, but I somehow completely forgot about it.  I have now
> > added a CF entry (#4689).
>
> Thanks. I think the Parallel Hash Join code is worth discovering extreme
> cases anyway, but in that case, we have quite a clear bug and must fix it.

Thank you.  I've pushed and backpatched this to PG 12 with some editing
from me.

------
Regards,
Alexander Korotkov