Re: Add memory_limit_hits to pg_stat_replication_slots - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Add memory_limit_hits to pg_stat_replication_slots
Date
Msg-id aNUWz4wbJsgxkA8T@ip-10-97-1-34.eu-west-3.compute.internal
In response to Re: Add memory_limit_hits to pg_stat_replication_slots  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Add memory_limit_hits to pg_stat_replication_slots
Re: Add memory_limit_hits to pg_stat_replication_slots
List pgsql-hackers
Hi,

On Wed, Sep 24, 2025 at 10:11:20AM -0700, Masahiko Sawada wrote:
> On Tue, Sep 23, 2025 at 11:31 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> 
> Thank you for updating the patch! Here are some comments:
> 
> ---
> +   bool        memory_limit_reached = (rb->size >= logical_decoding_work_mem * (Size) 1024);
> +
> +   if (memory_limit_reached)
> +       rb->memExceededCount += 1;
> 
> Do we want to use 'exceeded' for the variable too for better consistency?

I thought about it, but since we use ">=" I think that "reached" is more
accurate. So I went for "reached" for this one and "exceeded" for the
user-facing ones. That said, I don't have a strong opinion about it, and I'd
be ok to use "exceeded" if you feel strongly about it.

> ---
> One thing I want to clarify is that even if the memory usage exceeds
> the logical_decoding_work_mem it doesn't necessarily mean we serialize
> or stream transactions because of
> ReorderBufferCheckAndTruncateAbortedTXN().

Right.

> For example, in a situation
> where many large already-aborted transactions are truncated by
> ReorderBufferCheckAndTruncateAbortedTXN(), users would see
> a high number in mem_exceeded_count column but it might not actually
> require any adjustment for logical_decoding_work_mem.

Yes, but in that case mem_exceeded_count would be high compared to spill_txns
and stream_txns, no?

> One idea is to
> increment that counter if exceeding memory usage is caused to
> serialize or stream any transactions. On the other hand, it might make
> sense and be straightforward too to show a pure statistic that the
> memory usage exceeded the logical_decoding_work_mem. What do you
> think?

The new counter, as proposed, helps to see whether the workload hits
logical_decoding_work_mem frequently or not. I think it's valuable information
to have on its own.

Now to check if logical_decoding_work_mem needs adjustment, one could compare
mem_exceeded_count with the existing spill_txns and stream_txns.

For example, if I abort 20 transactions that exceeded logical_decoding_work_mem,
I'd get:

postgres=# select spill_txns,stream_txns,mem_exceeded_count from pg_stat_replication_slots ;
 spill_txns | stream_txns | mem_exceeded_count
------------+-------------+--------------------
          0 |           0 |                 20
(1 row)

That way I could figure out that logical_decoding_work_mem has been reached
only by aborted transactions.
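
For illustration, each of those 20 transactions can be as simple as the
following (the table name and row count are only placeholders, anything large
enough to cross logical_decoding_work_mem before the ROLLBACK does the job),
with the changes then consumed through pg_logical_slot_get_changes():

BEGIN;
INSERT INTO large_tbl SELECT generate_series(1, 1000000);
ROLLBACK;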

OTOH, if one sees spill_txns + stream_txns close to mem_exceeded_count, like:

postgres=# select spill_txns,stream_txns,mem_exceeded_count from pg_stat_replication_slots ;
 spill_txns | stream_txns | mem_exceeded_count
------------+-------------+--------------------
         38 |          20 |                 58
(1 row)

That probably means that logical_decoding_work_mem would need to be increased.
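
A query like the one below makes that comparison easy to eyeball (just a
sketch using the proposed column name, and a rough one since spill_txns and
stream_txns count transactions while mem_exceeded_count counts limit hits):

select slot_name,
       spill_txns + stream_txns as spilled_or_streamed_txns,
       mem_exceeded_count
from pg_stat_replication_slots;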

What do you think?

BTW, while doing some tests for the above examples, I noticed that the patch
was missing a check on memExceededCount in UpdateDecodingStats() (which caused
mem_exceeded_count to be 0 for one of the new tests in test_decoding). That's
fixed in the attached v5.
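
For context, a minimal sketch of what that check could look like, assuming the
culprit was the early return in UpdateDecodingStats() that skips reporting when
there is nothing to send (the actual v5 diff may differ):

/*
 * Sketch only: also look at memExceededCount before concluding there is
 * nothing to report, so that a decode run that only hit the memory limit
 * (e.g. on aborted transactions that got truncated) still updates
 * mem_exceeded_count.
 */
if (rb->spillBytes <= 0 && rb->streamBytes <= 0 &&
    rb->totalBytes <= 0 && rb->memExceededCount == 0)
    return;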

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
