Re: Logical Replica ReorderBuffer Size Accounting Issues - Mailing list pgsql-bugs

From Amit Kapila
Subject Re: Logical Replica ReorderBuffer Size Accounting Issues
Msg-id CAA4eK1JjxNFGkDHLSSecWWD3nP+1KE4M=4G-AX2D2S+K_=m09w@mail.gmail.com
In response to Logical Replica ReorderBuffer Size Accounting Issues  (Alex Richman <alexrichman@onesignal.com>)
Responses Re: Logical Replica ReorderBuffer Size Accounting Issues  (Alex Richman <alexrichman@onesignal.com>)
List pgsql-bugs
On Thu, Jan 5, 2023 at 5:27 PM Alex Richman <alexrichman@onesignal.com> wrote:
>
> We've noticed an odd memory issue with walsenders for logical replication slots - they experience large spikes in memory usage up to ~10x over the baseline, from ~500MiB to ~5GiB, exceeding the configured logical_decoding_work_mem. Since we have ~40 active subscriptions this produces a spike of ~200GiB on the sender, which is quite worrying.
>
> The spikes in memory always slowly ramp up to ~5GB over ~10 minutes, then quickly drop back down to the ~500MB baseline.
>
> logical_decoding_work_mem is configured to 256MB, and streaming is configured on the subscription side, so I would expect the slots to either stream or spill bytes to disk when they get to the 256MB limit, and not get close to 5GiB. However pg_stat_replication_slots shows 0 spilled or streamed bytes for any slots.
>
>
> I used GDB to call MemoryContextStats on a walsender process with 5GB usage, which logged this large reorderbuffer context:
> --- snip ---
>         ReorderBuffer: 65536 total in 4 blocks; 64624 free (169 chunks); 912 used
>           ReorderBufferByXid: 32768 total in 3 blocks; 12600 free (6 chunks); 20168 used
>           Tuples: 4311744512 total in 514 blocks (12858943 chunks); 6771224 free (12855411 chunks); 4304973288 used
>           TXN: 16944 total in 2 blocks; 13984 free (46 chunks); 2960 used
>           Change: 574944 total in 70 blocks; 214944 free (2239 chunks); 360000 used
> --- snip ---
>
>
> It's my understanding that the reorder buffer context is the thing that logical_decoding_work_mem specifically constrains, so it's surprising to see that it's holding onto ~4GB of tuples instead of spooling them.  I found the code for that here:
> https://github.com/postgres/postgres/blob/eb5ad4ff05fd382ac98cab60b82f7fd6ce4cfeb8/src/backend/replication/logical/reorderbuffer.c#L3557
> which suggests it's checking rb->size against the configured work_mem.
>
> I then used GDB to break into a high memory walsender and grab rb->size, which was only 73944.  So it looks like the tuple memory isn't being properly accounted for in the total reorderbuffer size, so nothing is getting streamed/spooled?
>

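For context, the check at the line you linked boils down to comparing the
accounted size (rb->size) against logical_decoding_work_mem; spilling or
streaming is only considered once rb->size crosses that threshold. Here is a
minimal standalone sketch (not the actual PostgreSQL source; the helper is
made up for illustration) plugging in the numbers from your report:

--- snip ---
/*
 * Illustrative only: mimics the shape of the comparison in
 * ReorderBufferCheckMemoryLimit(), which is roughly
 *     if (rb->size < logical_decoding_work_mem * 1024L) return;
 */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

static bool
would_trigger_spill_or_stream(uint64_t accounted_size, uint64_t work_mem_kb)
{
    return accounted_size >= work_mem_kb * 1024;
}

int
main(void)
{
    uint64_t work_mem_kb = 256 * 1024;              /* logical_decoding_work_mem = 256MB */
    uint64_t accounted = 73944;                     /* rb->size observed via gdb */
    uint64_t allocated = UINT64_C(4311744512);      /* "Tuples" context total above */

    printf("accounted rb->size would trigger spill/stream: %s\n",
           would_trigger_spill_or_stream(accounted, work_mem_kb) ? "yes" : "no");
    printf("allocated Tuples bytes would trigger it:       %s\n",
           would_trigger_spill_or_stream(allocated, work_mem_kb) ? "yes" : "no");
    return 0;
}
--- snip ---

With rb->size stuck at ~72kB the limit never trips, which lines up with
pg_stat_replication_slots reporting zero spilled or streamed bytes.
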
One possible reason for this difference is that the memory allocated
to decode the tuple from WAL in the function
ReorderBufferGetTupleBuf() is different from the actual memory
required/accounted for the tuple in the function
ReorderBufferChangeSize(). Do you have any sample data to confirm
this? If you can't share sample data, can you let us know the average
tuple size?
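
To give a feel for why the average tuple size matters here: any fixed
per-tuple allocation overhead that is not reflected in the accounted size
weighs much more heavily when the tuples are small. The sketch below is
purely hypothetical (the 200-byte overhead is an assumption for the sake of
the example, not a value taken from ReorderBufferGetTupleBuf() or
ReorderBufferChangeSize()):

--- snip ---
/*
 * Hypothetical numbers only: shows how a fixed per-tuple allocation
 * overhead that the accounting does not see grows in relative terms
 * as the average tuple shrinks.
 */
#include <stdio.h>

int
main(void)
{
    const double overhead = 200.0;      /* assumed unaccounted bytes per tuple */
    const double sizes[] = {16, 64, 256, 1024, 8192};
    int          n = (int) (sizeof(sizes) / sizeof(sizes[0]));

    for (int i = 0; i < n; i++)
        printf("avg tuple %6.0f bytes: allocated/accounted ratio ~%.2f\n",
               sizes[i], (sizes[i] + overhead) / sizes[i]);
    return 0;
}
--- snip ---

So a workload of many small changes could, in principle, show a much larger
allocated total than the accounted rb->size, which is why knowing the average
tuple size would help narrow this down.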

--
With Regards,
Amit Kapila.


