Re: Add memory_limit_hits to pg_stat_replication_slots - Mailing list pgsql-hackers
From: Bertrand Drouvot
Subject: Re: Add memory_limit_hits to pg_stat_replication_slots
Msg-id: aNUWz4wbJsgxkA8T@ip-10-97-1-34.eu-west-3.compute.internal
In response to: Re: Add memory_limit_hits to pg_stat_replication_slots (Masahiko Sawada <sawada.mshk@gmail.com>)
List: pgsql-hackers
Hi,

On Wed, Sep 24, 2025 at 10:11:20AM -0700, Masahiko Sawada wrote:
> On Tue, Sep 23, 2025 at 11:31 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
>
> Thank you for updating the patch! Here are some comments:
>
> ---
> +    bool    memory_limit_reached = (rb->size >= logical_decoding_work_mem * (Size) 1024);
> +
> +    if (memory_limit_reached)
> +        rb->memExceededCount += 1;
>
> Do we want to use 'exceeded' for the variable too for better consistency?

I thought about it, but since we use ">=" I think that "reached" is more
accurate. So I went for "reached" for this one and "exceeded" for the
"user facing" ones. That said, I don't have a strong opinion about it,
and I'd be ok to use "exceeded" if you feel strongly about it.

> ---
> One thing I want to clarify is that even if the memory usage exceeds
> logical_decoding_work_mem it doesn't necessarily mean we serialize
> or stream transactions, because of
> ReorderBufferCheckAndTruncateAbortedTXN().

Right.

> For example, in a situation where many large already-aborted
> transactions are truncated by ReorderBufferCheckAndTruncateAbortedTXN(),
> users would see a high number in the mem_exceeded_count column but it
> might not actually require any adjustment of logical_decoding_work_mem.

Yes, but in that case mem_exceeded_count would be high compared to
spill_txns and stream_txns, no?

> One idea is to increment that counter only if exceeding the memory
> limit causes us to serialize or stream any transactions. On the other
> hand, it might make sense, and be straightforward too, to show a pure
> statistic that the memory usage exceeded logical_decoding_work_mem.
> What do you think?

The new counter, as proposed, helps to see whether the workload hits
logical_decoding_work_mem frequently or not. I think that's valuable
information to have on its own. Then, to check whether
logical_decoding_work_mem needs adjustment, one could compare
mem_exceeded_count with the existing spill_txns and stream_txns.
For example, if I abort 20 transactions that exceeded
logical_decoding_work_mem, I'd get:

postgres=# select spill_txns, stream_txns, mem_exceeded_count from pg_stat_replication_slots;
 spill_txns | stream_txns | mem_exceeded_count
------------+-------------+--------------------
          0 |           0 |                 20
(1 row)

That way I could figure out that mem_exceeded_count was incremented for
aborted transactions only. OTOH, if one sees spill_txns + stream_txns
close to mem_exceeded_count, like:

postgres=# select spill_txns, stream_txns, mem_exceeded_count from pg_stat_replication_slots;
 spill_txns | stream_txns | mem_exceeded_count
------------+-------------+--------------------
         38 |          20 |                 58
(1 row)

that probably means that logical_decoding_work_mem would need to be
increased. What do you think?

BTW, while doing some tests for the above examples, I noticed that the
patch was missing a check on memExceededCount in UpdateDecodingStats()
(which produced mem_exceeded_count being 0 for one of the new tests in
test_decoding): fixed in v5 attached.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com