Re: Report reorder buffer size - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Report reorder buffer size |
Date | |
Msg-id | CAD21AoCo_0GRPPGxvd_jJ+uutnc-oZjEMVA987GxFcoRqc6xsg@mail.gmail.com |
In response to | Re: Report reorder buffer size (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) |
Responses | Re: Report reorder buffer size |
List | pgsql-hackers |
On Wed, Aug 27, 2025 at 9:36 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
>
> On Thu, Aug 28, 2025 at 5:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Aug 26, 2025 at 2:45 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
> > >
> > > Hi Masahiko,
> > > Thanks for your inputs.
> > >
> > > On Tue, Aug 26, 2025 at 2:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Wed, Aug 13, 2025 at 5:40 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
> > > >
> > > > Thank you for starting the thread about this topic. This is one of the topics that I've been eager to address. Here are some random comments:
> > >
> > > Glad that more people think there is a need for improving the reorder buffer statistics.
> > >
> > > > > With the attached WIP patch, we can report, at a given point in time, total size of reorder buffer i.e. sum of the size of all the changes belonging to the transactions active in the reorder buffer whether those changes are in memory, on disk or sent downstream as part of an unfinished streamed transaction. By sampling this value at regular intervals, one can observe the reorder buffer trendline as shown in the plots attached.
> > > >
> > > > Alternative idea (or an additional metric) would be the high-watermark of memory usage. That way, users won't miss memory usage spikes that might be missed by sampling the total memory usage.
> > >
> > > I think the trendline is important to make right trade-offs in case setting logical_decoding_work_mem to maximum is not possible. With a high watermark the question is how long does a high remain high? As the load changes, a high that was once may become lower or higher than the new high and soon irrelevant. Probably we could reset the high-watermark every time the view is sampled, so that we provide a trendline for the high-water mark as well. But I want to be cautious about adding that complexity of tracking maxima accurately and then maintain it forever. If sampling is frequent enough usually it will capture a maxima and minima good enough for practical purposes. The users will need to consider the trendline as approximate anyway since the load will show slight variations over the time.
> >
> > Agreed with the importance of trendline.
> >
> > > Please share your idea of reporting high-watermark.
> >
> > I was thinking of this high-watermark idea since it doesn't require any external system/tool to get the information for tuning logical_decoding_work_mem. It would be easy to use when users want to figure out maximum data usage for logical decoding of the particular workload.
>
> Ok. I am interested in knowing how you think we should keep track of the high watermark.
>
> Do we keep it updated as new maxima are found? And keep reporting the last high watermark seen till a new maxima is reached? With that, users would end up setting logical_decoding_work_mem (and hence provisioning for it) at a higher value, even if the workload changes so that the reorder buffer never reaches that high watermark. So, I think we should reset the high watermark. Do you think so? If yes, when do we do that? How often do we reset it?

I think it ultimately depends on use cases but I imagine that users can reset the high watermark value after adjusting logical_decoding_work_mem, and see how the new value works, then adjust the parameter and reset the value again, ... repeating.
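For illustration, that adjust/reset/observe loop might look roughly like the sketch below. The per-slot high-watermark column and its reset semantics are hypothetical (they are exactly what is being discussed here); pg_stat_reset_replication_slot() does exist today but only clears the cumulative counters in pg_stat_replication_slots, and 'my_slot' is just a placeholder slot name.

```sql
-- 1. Adjust the decoding memory budget and reload the configuration.
ALTER SYSTEM SET logical_decoding_work_mem = '256MB';
SELECT pg_reload_conf();

-- 2. Reset the slot statistics so the next observation window starts clean
--    (today this clears only the cumulative counters; resetting a
--    high-watermark this way is hypothetical).
SELECT pg_stat_reset_replication_slot('my_slot');

-- 3. Let the workload run for a while, then check how high the reorder
--    buffer went; a high-watermark column would be read here under the
--    proposal being discussed.
SELECT * FROM pg_stat_replication_slots WHERE slot_name = 'my_slot';
```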
> Tracking maxima and minima of other parameters in pg_stat_replication like replication lag might also help users e.g. to tune their networks. But we don't track their watermarks. External tools are used for that. I was envisioning something like that for reorder buffer size as well.

I think that the high-watermark value would be a new type of statistics we collect. As far as I know pg_stat_statements collects similar values, but there is no precedent in the core.

> > > > > 2. Is the size of the reorder buffer enough or we want to also track the size of changes on disk and the size of changes sent downstream as part of unfinished streamed transactions separately? Also the number of transactions being tracked by the reorder buffer?
> > > >
> > > > For the purpose of tuning logical_decoding_work_mem, the additional metrics and statistics we need might not be many. But in order to make logical decoding more visible for users for debugging or diagnosing purposes, more statistics like the number of transactions being tracked by the reorder buffer might be required.
> > >
> > > Ok. We could add the total number of transactions in the reorder buffer at a given point in time to the report easily. How about subtransactions? How about prepared but not committed/aborted transactions?
> >
> > For debugging or diagnosing purposes, this information might be useful, but I'm not sure we need this information of logical decodings that are running.
>
> I was not clear. Sorry. I meant reporting the number of subtransactions and prepared transactions in the reorder buffer (not the actual subtransactions or prepared transactions themselves). Do you think that reporting just the number of transactions in the reorder buffer is enough? Do you think that reporting the number of subtransactions and the number of prepared transactions is not required?

Given that collecting new statistics and updating them in real time incurs costs, it's ultimately a question of rewards vs. costs. In terms of the additional cost, I guess tracking the number of prepared transactions in the reorder buffer would not be costly, but I'm not sure about the number of subtransactions. I personally would avoid collecting these statistics unless there are explicit use cases.
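As an aside, the kind of external sampling that would produce the trendline discussed above could be as simple as the sketch below. The reorder_buffer_bytes column is hypothetical (today pg_stat_replication_slots exposes only cumulative counters such as spill_bytes and total_bytes), and reorder_buffer_samples is just an illustrative table for storing the samples.

```sql
-- One-time setup: a table to accumulate periodic samples for plotting.
CREATE TABLE IF NOT EXISTS reorder_buffer_samples (
    sample_time  timestamptz NOT NULL,
    slot_name    text        NOT NULL,
    buffer_bytes bigint      NOT NULL
);

-- Run from cron/psql every few seconds; reorder_buffer_bytes stands in for
-- the point-in-time value a patch like the one discussed would expose.
INSERT INTO reorder_buffer_samples (sample_time, slot_name, buffer_bytes)
SELECT now(), slot_name, reorder_buffer_bytes
FROM pg_stat_replication_slots;
```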
Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com