Re: [BUG]: the walsender does not update its IO statistics until it exits - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: [BUG]: the walsender does not update its IO statistics until it exits
Msg-id Z7/9feHcUMcloHaK@ip-10-97-1-34.eu-west-3.compute.internal
In response to Re: [BUG]: the walsender does not update its IO statistics until it exits  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers

Hi,

On Wed, Feb 26, 2025 at 05:08:17AM -0500, Andres Freund wrote:
> Hi,
> 
> On 2025-02-26 15:37:10 +0900, Michael Paquier wrote:
> > That's bad, worse for a logical WAL sender, because it means that we
> > have no idea what kind of I/O happens in this process until it exits,
> > and logical WAL senders could loop forever, since v16 where we've
> > begun tracking I/O.
> 
> FWIW, I think medium term we need to work on splitting stats flushing into two
> separate kinds of flushes:
> 1) non-transactional stats, which should be flushed at a regular interval,
>    unless a process is completely idle
> 2) transaction stats, which can only be flushed at transaction boundaries,
>    because before the transaction boundary we don't know if e.g. newly
>    inserted rows should be counted as live or dead
> 
> So far we have some timer logic for 2), but we have basically no support for
> 1). Which means we have weird ad-hoc logic in various kinds of
> non-plain-connection processes. And that will often have holes, as Bertrand
> noticed here.

Thanks for sharing your thoughts on it!

Yeah, agreed, that's a good medium-term idea: it would avoid missing the
flush of some stats.
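
To make the split concrete, here is a toy model of the two kinds of flushes.
It is only a sketch under stated assumptions: every name in it (ToyPendingStats,
ToySharedStats, toy_flush_nontransactional, toy_flush_transactional) is
hypothetical and none of it is existing PostgreSQL code. It only illustrates
that an I/O counter can be published at any time, while a row counter has to
wait until the transaction outcome is known.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: all names here are hypothetical, not PostgreSQL APIs. */

/* Counters accumulated locally by one process. */
typedef struct ToyPendingStats
{
    uint64_t io_reads;       /* kind 1: valid whether the xact commits or not */
    uint64_t rows_inserted;  /* kind 2: live or dead is unknown until xact end */
} ToyPendingStats;

/* Counters visible to other sessions. */
typedef struct ToySharedStats
{
    uint64_t io_reads;
    uint64_t live_rows;
    uint64_t dead_rows;
} ToySharedStats;

/* Kind 1: safe at any time, including in the middle of a transaction. */
void
toy_flush_nontransactional(ToyPendingStats *pending, ToySharedStats *shared)
{
    assert(pending != NULL && shared != NULL);
    shared->io_reads += pending->io_reads;
    pending->io_reads = 0;
}

/* Kind 2: only at a transaction boundary, once the outcome is known. */
void
toy_flush_transactional(ToyPendingStats *pending, ToySharedStats *shared,
                        bool committed)
{
    assert(pending != NULL && shared != NULL);
    if (committed)
        shared->live_rows += pending->rows_inserted;
    else
        shared->dead_rows += pending->rows_inserted;
    pending->rows_inserted = 0;
}
```

In this model a long-running transaction can call
toy_flush_nontransactional() repeatedly, so its I/O becomes visible early,
while rows_inserted stays pending until commit or abort.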

> I think it's also bad that we don't have a solution for 1), even just for
> normal connections. If a backend causes a lot of IO we might want to know
> about that long before the longrunning transaction commits.

+++1. There is no need to wait for the transaction boundary for some stats, as
those would be more "useful"/"actionable" if flushed while the transaction is
in progress.

> I suspect the right design here would be to have a generalized form of the
> timeout mechanism we have for 2).
> 
> For that we'd need to make sure that pgstat_report_stat() can be safely called
> inside a transaction.  The second part would be to redesign the
> IdleStatsUpdateTimeoutPending mechanism so it is triggered independent of
> idleness, without introducing unacceptable overhead - I think that's doable.

Adding this to my bucket unless someone beats me to it.
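
As a starting point for discussion, the rough shape I have in mind is a
pending flag that a timer sets regardless of idleness and that is checked
cheaply at safe points. The sketch below is a toy model, not a patch:
ToyStatsTimer, toy_timer_tick and toy_maybe_flush are hypothetical names, time
is passed in explicitly instead of using a real timeout, and it does not touch
pgstat_report_stat() or IdleStatsUpdateTimeoutPending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: all names here are hypothetical, not PostgreSQL APIs. */
typedef struct ToyStatsTimer
{
    int64_t  interval_ms;    /* minimum time between two flushes */
    int64_t  last_flush_ms;  /* time of the previous flush */
    bool     flush_pending;  /* set by the timer, consumed at a safe point */
    uint64_t pending_io;     /* local, not yet published counter */
    uint64_t shared_io;      /* what other sessions can see */
} ToyStatsTimer;

/* Stand-in for a timeout handler: it only sets a flag and does no work. */
void
toy_timer_tick(ToyStatsTimer *t, int64_t now_ms)
{
    assert(t != NULL);
    if (now_ms - t->last_flush_ms >= t->interval_ms)
        t->flush_pending = true;
}

/*
 * Called from safe points in the main loop, inside or outside a transaction.
 * Returns true if something was actually published.
 */
bool
toy_maybe_flush(ToyStatsTimer *t, int64_t now_ms)
{
    assert(t != NULL);
    if (!t->flush_pending)
        return false;            /* common case: a single flag test */

    t->flush_pending = false;
    t->last_flush_ms = now_ms;

    if (t->pending_io == 0)
        return false;            /* timer fired but nothing to publish */

    t->shared_io += t->pending_io;
    t->pending_io = 0;
    return true;
}
```

The point of the flag is that the hot path pays for one boolean test only;
the actual flush work happens at most once per interval.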

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


