Re: Report reorder buffer size - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Report reorder buffer size
Date
Msg-id aKxd6pEOW8SZJ2TT@ip-10-97-1-34.eu-west-3.compute.internal
Whole thread Raw
In response to Re: Report reorder buffer size  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
List pgsql-hackers
Hi,

On Mon, Aug 25, 2025 at 01:48:42PM +0530, Ashutosh Bapat wrote:
> On Fri, Aug 22, 2025 at 11:43 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Yes as long as subtrans are not involved. But if say I've one transaction made of
> > 1500 subtrans, I'd get something like:
> >
> >   slot_name   | spill_count | stream_count | total_txns
> > --------------+-------------+--------------+------------
> >  logical_slot |        1501 |            0 |          1
> >
> > So we don't know how many times logical_decoding_work_mem has been reached (
> > except by looking at total_txns).
> >
> > But as soon as another transaction (that does not involve spill) is decoded:
> >
> >   slot_name   | spill_count | stream_count | total_txns
> > --------------+-------------+--------------+------------
> >  logical_slot |        1501 |            0 |          2
> >
> > Then we don't know if logical_decoding_work_mem has been reached one or two
> > times.
> 
> I didn't know this is how it works. Thanks for correcting me. In that
> case, I think we should
> add a column in pg_stat_replication_slots reporting the number of
> times the memory limit is reached since the last reset. I am +0.5 on
> it being useful.

Thanks. With this in place one could get a ratio like total_txns/total_txns_that_exceeds.
That could help to see if reaching logical_decoding_work_mem is rare or 
frequent enough. If frequent, then maybe there is a need to adjust logical_decoding_work_mem.

> > That's right as long as the system is still available. I think a mix of log
> > and SQL API help more use cases: real time, historical and when the system is
> > back to a usable state (if it was not anymore).
> 
> The way I envision this is that users will sample the views
> periodically or when certain thresholds (amount of WAL, size of
> reorder buffers) are crossed. That way they can have historical data
> in an easy-to-query manner when the actual incident occurs.

Yeah. OTOH, having this information in the log could also help users that
did not think about sampling the views and hit an incident.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Amul Sul
Date:
Subject: Re: pg_waldump: support decoding of WAL inside tarfile
Next
From: Tomas Vondra
Date:
Subject: Re: index prefetching