Re: BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker - Mailing list pgsql-bugs

From Alexey Ermakov
Subject Re: BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker
Date
Msg-id 2ff6260b-8722-b301-cf7e-4e66a159778c@dataegret.com
In response to Re: BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker  (Andrei Lepikhov <a.lepikhov@postgrespro.ru>)
Responses Re: BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker  (Andrei Lepikhov <a.lepikhov@postgrespro.ru>)
List pgsql-bugs
Hello! I reproduced the generation of multiple temporary files and abnormal memory
usage (in 'top' output both the backend and the parallel worker used around 3-4GB
of RAM each, and I often got the OOM killer), but not the memory allocation error so far.

On the machine where I first got the error I always got the same allocation
request size despite using different tables and conditions. Interestingly, the
same number appeared in another bug report.

Here is what I used:
PostgreSQL 16.2 (Ubuntu 16.2-1.pgdg20.04+1), a virtual machine with 8GB of RAM,
6GB swap and around 15-20GB of free space for temp files.

Almost all settings in postgresql.conf are default:
=========================================================================
shared_buffers = 128MB
effective_cache_size = 4GB
work_mem = 1MB
hash_mem_multiplier = 2
max_parallel_workers_per_gather = 1
jit = off
random_page_cost = 4
seq_page_cost = 1

test data:
=========================================================================
--test_1 table, 3.5M rows, 1..3M, n_distinct -0.8
create table test_1(a bigint);
insert into test_1 select id from generate_series(1,3000000) gs(id);
insert into test_1 select id from generate_series(1,500000) gs(id);
analyze test_1;

--test_2 table, 2 identical columns, 10M rows, 1..1M, n_distinct 933k
create table test_2(a bigint, b bigint);
insert into test_2 select gs.id, gs.id from generate_series(1,1000000) gs(id), generate_series(1,10) gs2(id);
create index on test_2(b);
analyze test_2;

--test_3 table, 12.58M rows, 1..500k, 80k rows with same value, n_distinct 490k
create table test_3(a bigint);
insert into test_3 select gs.id from generate_series(1,500000) gs(id), generate_series(1,25) gs2(id);
insert into test_3 select 999 from generate_series(1,80000);
create index on test_3(a);
analyze test_3;


=========================================================================
set work_mem = '2MB';
explain (analyze, buffers) select test_2.a
from test_2 left join test_3 on test_3.a = test_2.a and (test_3.a + 0) = test_3.a
left join test_1 on test_1.a = test_2.a
where test_2.b < 21500;

~224k temp files generated

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------
  Gather  (cost=280156.31..385334.66 rows=308388 width=8) (actual time=16342.410..35577.099 rows=12349500 loops=1)
    Workers Planned: 1
    Workers Launched: 1
    Buffers: shared hit=1077677 read=14901, temp read=245247 written=912164
    ->  Parallel Hash Right Join  (cost=279156.31..353495.86 rows=181405 width=8) (actual time=16340.950..29837.671 rows=6174750 loops=2)
          Hash Cond: (test_1.a = test_2.a)
          Buffers: shared hit=1077677 read=14901, temp read=245247 written=912164
          ->  Parallel Seq Scan on test_1  (cost=0.00..36075.24 rows=2058824 width=8) (actual time=0.142..169.721 rows=1750000 loops=2)
                Buffers: shared hit=586 read=14901
          ->  Parallel Hash  (cost=276962.04..276962.04 rows=133701 width=8) (actual time=4391.620..4391.624 rows=3087375 loops=2)
                Buckets: 131072 (originally 131072)  Batches: 65536 (originally 4)  Memory Usage: 32352kB
                Buffers: shared hit=1077043, temp read=72723 written=387876
                ->  Nested Loop Left Join  (cost=2814.38..276962.04 rows=133701 width=8) (actual time=2.884..567.014 rows=3087375 loops=2)
                      Buffers: shared hit=1077043
                      ->  Parallel Bitmap Heap Scan on test_2  (cost=2813.95..108618.25 rows=133701 width=8) (actual time=2.865..11.149 rows=107495 loops=2)
                            Recheck Cond: (b < 21500)
                            Heap Blocks: exact=652
                            Buffers: shared hit=1422
                            ->  Bitmap Index Scan on test_2_b_idx  (cost=0.00..2757.12 rows=227292 width=0) (actual time=3.901..3.901 rows=214990 loops=1)
                                  Index Cond: (b < 21500)
                                  Buffers: shared hit=250
                      ->  Index Only Scan using test_3_a_idx on test_3  (cost=0.43..1.25 rows=1 width=8) (actual time=0.001..0.003 rows=29 loops=214990)
                            Index Cond: (a = test_2.a)
                            Filter: ((a + 0) = a)
                            Heap Fetches: 0
                            Buffers: shared hit=1075621
  Planning:
    Buffers: shared hit=8
  Planning Time: 0.576 ms
  Execution Time: 35907.097 ms

Unfortunately, that plan is not very stable across different parameters and
different PostgreSQL versions (I wasn't able to get it on a machine with
enough resources to test quickly). Sometimes the planner wants to hash join with
the test_3 table instead of test_1, or joins with the test_1 table on the inner
side. With other plans it won't reproduce.

With work_mem = 4MB we get 141k files; with work_mem = 1MB we get 66k files.
More files means more memory consumed and a longer execution time.
With more than 700-1000k files the system became unresponsive.
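The runaway doubling looks consistent with the skewed value in test_3 (80k rows
sharing the value 999, fanned out by the nested loop). Here is a toy sketch of
that mechanism; the row overhead and fan-out numbers are my assumptions, not
PostgreSQL's exact accounting:

```python
# Sketch (assumed numbers): why batch doubling cannot help with a skewed key.
# When a batch overflows its memory budget, Parallel Hash doubles nbatch and
# repartitions rows by extra hash bits -- but rows with an identical join key
# always land in the same batch, so that batch never gets smaller.
WORK_MEM = 2 * 1024 * 1024      # work_mem = '2MB'
BUDGET = WORK_MEM * 2           # hash_mem_multiplier = 2
ROW_BYTES = 40                  # assumed rough per-tuple overhead
DUP_ROWS = 800_000              # illustrative: ~10 test_2 rows with a=999,
                                # each matching ~80k test_3 duplicates

skewed_batch = DUP_ROWS * ROW_BYTES   # bytes that must stay in one batch
for nbatch in (4, 64, 65536):
    # adding batches leaves the duplicate-key batch untouched
    print(nbatch, skewed_batch > BUDGET)   # True every time: still overflows
```

If that is the mechanism, each doubling only multiplies the number of temp
files without ever fitting the skewed batch into the budget.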

In that test I got:
 >Batches: 65536 (originally 4)

With prod data, the largest number of batches I've seen:
 >Buckets: 131072 (originally 131072)  Batches: 262144 (originally 16)  Memory Usage: 4736kB

Perhaps that helps somehow.
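For what it's worth, the failing request size factors exactly against that prod
batch count. The per-batch figure below is just the implied quotient, not a
claim about the actual EstimateParallelHashJoinBatch() result:

```python
# Back-of-the-envelope check: the error size is consistent with
# (per-batch shared state) * (power-of-two batch count), and it exceeds
# MaxAllocSize (1GB - 1), which dsa_allocate0() enforces because it does
# not pass DSA_ALLOC_HUGE -- hence "invalid DSA memory alloc request size".
MAX_ALLOC_SIZE = 0x3FFFFFFF             # PostgreSQL's 1GB - 1 cap

request = 1_811_939_328                 # size from the error message
nbatch = 262_144                        # batch count seen with prod data
per_batch = request // nbatch           # implied bytes of shared state per batch

print(per_batch)                        # 6912
print(request == per_batch * nbatch)    # True: the factorization is exact
print(request > MAX_ALLOC_SIZE)         # True: over the 1GB DSA limit
```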

 >So, can Alexey apply a trivial patch and check whether the bug has gone?
Yes, I think it's possible to test with prod data, but I don't fully understand
how you would like to change the given line.

Thanks,
Alexey Ermakov


On 2024-02-29 09:36, Andrei Lepikhov wrote:
> I found one possible weak point in the code of PHJ:
> ExecParallelHashJoinSetUpBatches:
>
> pstate->batches = dsa_allocate0(hashtable->area,
>     EstimateParallelHashJoinBatch(hashtable) * nbatch);
>
> It could explain why we have such a huge memory allocation with a size 
> not bonded to a power of 2.
> Also, it makes more sense that we already have a second report [1] 
> with the same value of DSA allocation.
> So, can Alexey apply a trivial patch and check whether the bug has gone?
>
> [1]
> https://www.postgresql.org/message-id/flat/PAYP264MB3368AF4D63704E370B204C39FF582%40PAYP264MB3368.FRAP264.PROD.OUTLOOK.COM
>



