Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Date
Msg-id 20221116195232.wynvnlng2ivne76b@awork3.anarazel.de
Whole thread Raw
In response to Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
List pgsql-hackers
Hi,

On 2022-11-16 14:19:32 -0500, Robert Haas wrote:
> I have never really understood why we drive background writer or
> checkpointer statistics through the statistics collector.

To some degree it is required for durability - the stats system needs to know
how to write out those stats. But that wasn't ever a good reason to send
messages to the stats collector - it could just read the stats from shared
memory after all.

There's also integration with snapshots of the stats, resetting them, etc.

There's also the complexity that some of the stats e.g. for checkpointer
aren't about work the checkpointer did, but just have ended up there for
historical raisins. E.g. the number of fsyncs and writes done by backends.

See below:

> Here again, for things like table statistics, there is no choice, because we
> could have an unbounded number of tables and need to keep statistics about
> all of them. The statistics collector can handle that by allocating more
> memory as required. But there is only one background writer and only one
> checkpointer, so that is not needed in those cases. Why not just have them
> expose anything they want to expose through shared memory directly?

That's how it is in 15+. The memory for "fixed-numbered" or "global"
statistics are maintained by the stats system, but in plain shared memory,
allocated at server start. Not via the hash table.

Right now stats updates for the checkpointer use the "changecount" approach to
updates. For now that makes sense, because we update the stats only
occasionally (after a checkpoint or when writing in CheckpointWriteDelay()) -
a stats viewer seeing the checkpoint count go up, without yet seeing the
corresponding buffers written would be misleading.

I don't think we'd want every buffer write or whatnot go through the
changecount mechanism, on some non-x86 platforms that could be noticable. But
if we didn't stage the stats updates locally I think we could make most of the
stats changes without that overhead.  For updates that just increment a single
counter there's simply no benefit in the changecount mechanism afaict.

I didn't want to do that change during the initial shared memory stats work,
it already was bigger than I could handle...


It's not quite clear to me what the best path forward is for
buf_written_backend / buf_fsync_backend, which currently are reported via the
checkpointer stats. I think the best path might be to stop counting them via
the CheckpointerShmem->num_backend_writes etc and just populate the fields in
the view (for backward compat) via the proposed [1] pg_stat_io patch.  Doing
that accounting with CheckpointerCommLock held exclusively isn't free.



> If the statistics collector provides services that we care about, like
> persisting data across restarts or making snapshots for transactional
> behavior, then those might be reasons to go through it even for the
> background writer or checkpointer. But if so, we should be explicit
> about what the reasons are, both in the mailing list discussion and in
> code comments. Otherwise I fear that we'll just end up doing something
> in a more complicated way than is really necessary.

I tried to provide at least some of that in the comments at the start of
pgstat.c in 15+. There's very likely more that should be added, but I think
it's a decent start.

Greetings,

Andres Freund


[1] https://www.postgresql.org/message-id/CAOtHd0ApHna7_p6mvHoO%2BgLZdxjaQPRemg3_o0a4ytCPijLytQ%40mail.gmail.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: locale -a missing on Alpine Linux?
Next
From: Andrew Dunstan
Date:
Subject: Re: predefined role(s) for VACUUM and ANALYZE