Re: Add memory_limit_hits to pg_stat_replication_slots - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Add memory_limit_hits to pg_stat_replication_slots
Date
Msg-id CAD21AoBPKarxZNnSy884LzHtwO6QN0XUqRzSaM-sM79oPT=-UA@mail.gmail.com
Whole thread Raw
In response to Re: Add memory_limit_hits to pg_stat_replication_slots  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
List pgsql-hackers
On Tue, Sep 23, 2025 at 11:31 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Tue, Sep 23, 2025 at 11:39:22AM -0700, Masahiko Sawada wrote:
> > On Tue, Sep 23, 2025 at 1:52 AM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > >
> > > However, technically speaking, "exceeded" is not the perfect wording since
> > > the code was doing (and is still doing something similar to):
> > >
> > >         if (debug_logical_replication_streaming == DEBUG_LOGICAL_REP_STREAMING_BUFFERED &&
> > > -               rb->size < logical_decoding_work_mem * (Size) 1024)
> > > +               !memory_limit_reached)
> > >                 return;
> > >
> > > as the comment describes correctly using "reached":
> > >
> > > /*
> > >  * Check whether the logical_decoding_work_mem limit was reached, and if yes
> > >  * pick the largest (sub)transaction at-a-time to evict and spill its changes to
> > >  * disk or send to the output plugin until we reach under the memory limit.
> > >
> > > So I think that "reached" or "hit" would be better wording. However, the
> > > documentation for spill_txns, stream_txns already use "exceeded" (and not "reached")
> > > so I went with "exceeded" for consistency. I think that's fine, if not we may want
> > > to use "reached" for those 3 stats descriptions.
> >
> > I find "exceeded" is fine as the documentation for logical decoding
> > also uses it[1]:
> >
> > Similar to spill-to-disk behavior, streaming is triggered when the
> > total amount of changes decoded from the WAL (for all in-progress
> > transactions) exceeds the limit defined by logical_decoding_work_mem
> > setting.
>
> Yes it also uses "exceeds" but I think it's not 100% accurate. It would be
> if, in ReorderBufferCheckMemoryLimit, we were using "<=" instead of "<" in:
>
>   if (debug_logical_replication_streaming == DEBUG_LOGICAL_REP_STREAMING_BUFFERED &&
>        rb->size < logical_decoding_work_mem * (Size) 1024)
>
> I think an accurate wording would be "reaches or exceeds" in all those places,
> but just using "exceeds" looks good enough.
>
> > One comment for the v3 patch:
> >
> > +       <para>
> > +        Number of times <literal>logical_decoding_work_mem</literal> has been
> > +        exceeded while decoding changes from WAL for this slot.
> > +       </para>
> >
> > How about rewording it to like:
> >
> > Number of times the memory used by logical decoding has exceeded
> > logical_decoding_work_mem.
>
> That sounds better, thanks! Used this wording in v4 attached (that's the only
> change as compared to v3).
>

Thank you for updating the patch! Here are some comments:

---
+   bool        memory_limit_reached = (rb->size >=
logical_decoding_work_mem * (Size) 1024);
+
+   if (memory_limit_reached)
+       rb->memExceededCount += 1;

Do we want to use 'exceeded' for the variable too for better consistency?

---
One thing I want to clarify is that even if the memory usage exceeds
the logical_decoding_work_mem it doesn't necessarily mean we serialize
or stream transactions because of
ReorderBufferCheckAndTruncateAbortedTXN(). For example, in a situation
where many large already-aborted transactions are truncated by
transactionsReorderBufferCheckAndTruncateAbortedTXN(), users would see
a high number in mem_exceeded_count column but it might not actually
require any adjustment for logical_decoding_work_mem. One idea is to
increment that counter if exceeding memory usage is caused to
serialize or stream any transactions. On the other hand, it might make
sense and be straightforward too to show a pure statistic that the
memory usage exceeded the logical_decoding_work_mem. What do you
think?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Melanie Plageman
Date:
Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Next
From: Fujii Masao
Date:
Subject: Re: Suggestion to add --continue-client-on-abort option to pgbench