Hi Bertrand,
Thanks for your response. I am glad that you have found the proposal
to be generally useful.
On Thu, Aug 14, 2025 at 3:50 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> > Using these trendlines, a user may decide to keep
> > logical_decoding_work_mem to minimum and thus let all the large
> > transactions spill to disk or be streamed.
>
> Right. Also, the issue with spill files is that when they are read back from disk
> then the reorder buffer can consume a lot of memory. That's why I think tracking
> the xid, sub xid and LSNs could be useful too. That could help diagnose why
> it was using that memory.
When the replication is stuck or is slow or WAL builds up a detailed
look at the reorder buffer's contents would be useful. It will need to
be done outside of pg_stat_replication or pg_stat_replication_slots. I
think it will be useful to provide a function which will trigger a
given WAL sender to dump the information about the contents of the
reorder buffer to the server error log or send them to the requesting
client.
>
> I think pg_stat_replication is a good place to record "real time" activities.
>
Ok. Thanks for confirming.
> Maybe we could log 2 metrics: the reorder buffer size before creating the spill
> files and the reorder buffer size while reading back the spill files? That would
> help make the distinction when logical_decoding_work_mem is involved (i.e before
> creating the spill files) and when it is not (reading them back).
>
> pg_stat_replication_slots could be used to add some counters too: like the
> number of times logical_decoding_work_mem has been reached while decoding from
> that slot.
I think pg_stat_replication_slots::spill_count and
pg_stat_replication_slots::stream_count give that value.
> Also maybe record the max value the reorder buffer reached and
> the associated xid, number of sub xids, and LSN? That could be useful to diagnose
> high memory consumption and/or OOM issues due to logical decoding.
A function which dumps the information about reorder buffers can be
used along with the trendline for this purpose. For example, if a user
notices WAL accumulation and an upward trend in the metric exposed by
my patch, they can trigger a dump of reorder buffer contents at that
time and diagnose the problem and even fix it proactively.
>
> Also, should we log some of information to the server log?
>
Information in server error logs is hard to access and analyse,
especially in cloud environment. Having a SQL construct is better.
> > But before I work on those, I would like to know others' opinions,
> > usefulness and acceptance.
>
> Thanks for working on it. I like the idea and also think that is an area where
> more details/metrics should be provided.
Thanks.
--
Best Wishes,
Ashutosh Bapat