Hi,
On 2025-02-26 15:37:10 +0900, Michael Paquier wrote:
> That's bad, worse for a logical WAL sender, because it means that we
> have no idea what kind of I/O happens in this process until it exits,
> and logical WAL senders could loop forever, since v16 where we've
> begun tracking I/O.
FWIW, I think medium term we need to work on splitting stats flushing into two
separate kinds of flushes:
1) non-transactional stats, which should be flushed at a regular interval,
unless a process is completely idle
2) transactional stats, which can only be flushed at transaction boundaries,
because before the transaction boundary we don't know if e.g. newly
inserted rows should be counted as live or dead
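To illustrate the distinction, here is a minimal sketch in C. All struct and
function names are hypothetical and are not the actual pgstat code. It only
shows why an I/O counter can be flushed mid-transaction, while an inserted-row
count has to wait until the outcome is known:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical backend-local counters, not PostgreSQL's real structs. */
typedef struct PendingStats
{
    long io_reads;       /* non-transactional: valid whatever the xact outcome */
    long rows_inserted;  /* transactional: live or dead depends on the outcome */
} PendingStats;

/* Hypothetical counters as seen by monitoring. */
typedef struct SharedStats
{
    long io_reads;
    long live_tuples;
    long dead_tuples;
} SharedStats;

/* Can be called at any time, including in the middle of a transaction. */
void
flush_nontransactional(PendingStats *pending, SharedStats *shared)
{
    shared->io_reads += pending->io_reads;
    pending->io_reads = 0;
}

/* Only meaningful at a transaction boundary, once commit/abort is known. */
void
flush_transactional(PendingStats *pending, SharedStats *shared, bool committed)
{
    assert(pending->rows_inserted >= 0);

    if (committed)
        shared->live_tuples += pending->rows_inserted;
    else
        shared->dead_tuples += pending->rows_inserted;
    pending->rows_inserted = 0;
}
```

Calling flush_nontransactional() in the middle of a transaction publishes the
I/O count and leaves rows_inserted pending until flush_transactional() runs at
commit or abort.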
So far we have some timer logic for 2), but we have basically no support for
1), which means we have weird ad-hoc logic in various kinds of
non-plain-connection processes. That logic will often have holes, as Bertrand
noticed here.
I think it's also bad that we don't have a solution for 1), even just for
normal connections. If a backend causes a lot of I/O we might want to know
about that long before the long-running transaction commits.
I suspect the right design here would be to have a generalized form of the
timeout mechanism we have for 2).
For that we'd need to make sure that pgstat_report_stat() can be safely called
inside a transaction. The second part would be to redesign the
IdleStatsUpdateTimeoutPending mechanism so it is triggered independently of
idleness, without introducing unacceptable overhead - I think that's doable.
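A rough sketch of what such a mechanism could look like, under stated
assumptions: the names, the 1000 ms interval, and the explicit now_ms
parameter (standing in for a real timeout firing) are all made up for
illustration and do not reflect existing PostgreSQL code. The timer only sets
a flag, and the flush itself happens at a cheap check point, so the common
path costs a single branch:

```c
#include <assert.h>
#include <stdbool.h>

#define FLUSH_INTERVAL_MS 1000  /* assumed interval, for illustration only */

/* Hypothetical per-process flush bookkeeping. */
typedef struct FlushState
{
    long last_flush_ms;             /* time of the most recent flush */
    bool flush_pending;             /* set by the timer, cleared by a flush */
    int  nontransactional_flushes;  /* flushes done inside a transaction */
    int  full_flushes;              /* flushes done at a transaction boundary */
} FlushState;

/* Stand-in for the timeout firing: sets a flag and does no other work. */
void
timer_tick(FlushState *st, long now_ms)
{
    assert(now_ms >= st->last_flush_ms);

    if (now_ms - st->last_flush_ms >= FLUSH_INTERVAL_MS)
        st->flush_pending = true;
}

/* Called at cheap check points, regardless of whether the process is idle. */
void
maybe_flush(FlushState *st, long now_ms, bool in_transaction)
{
    if (!st->flush_pending)
        return;                          /* common case: nothing to do */

    if (in_transaction)
        st->nontransactional_flushes++;  /* only stats that are safe mid-xact */
    else
        st->full_flushes++;              /* both kinds of stats */

    st->flush_pending = false;
    st->last_flush_ms = now_ms;
}
```

In this sketch a process stuck in a long transaction (or a loop that never
goes idle) still publishes its non-transactional stats about once per
interval, while the transactional ones wait for the next boundary.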
Greetings,
Andres Freund