Re: Flush pgstats file during checkpoints - Mailing list pgsql-hackers
| From | Tomas Vondra |
| --- | --- |
| Subject | Re: Flush pgstats file during checkpoints |
| Date | |
| Msg-id | 41fb6458-7405-4b72-8226-b1d7f976dee5@enterprisedb.com |
| In response to | Re: Flush pgstats file during checkpoints (Konstantin Knizhnik <knizhnik@garret.ru>) |
| Responses | Re: Flush pgstats file during checkpoints |
| List | pgsql-hackers |
On 6/28/24 09:32, Konstantin Knizhnik wrote:
>
> On 18/06/2024 9:01 am, Michael Paquier wrote:
>> Hi all,
>>
>> On HEAD, xlog.c has the following comment, which has been on my own
>> TODO list for a couple of weeks now:
>>
>>  * TODO: With a bit of extra work we could just start with a pgstat file
>>  * associated with the checkpoint redo location we're starting from.
>>
>> Please find a patch series to implement that, giving the possibility
>> to keep statistics after a crash rather than discard them.  I have
>> been looking at the code for a while, before settling down on:
>> - Forcing the flush of the pgstats file to happen during non-shutdown
>> checkpoints and restart points, after updating the control file's redo
>> LSN and the critical sections in the area.
>> - Leaving the current before_shmem_exit() callback around, as a matter
>> of delaying the flush of the stats for as long as possible in a
>> shutdown sequence.  This also makes the single-user mode shutdown
>> simpler.
>> - Adding the redo LSN to the pgstats file, with a bump of
>> PGSTAT_FILE_FORMAT_ID, cross-checked with the redo LSN.  This change
>> is independently useful on its own when loading stats after a clean
>> startup, as well.
>> - The crash recovery case is simplified, as there is no more need for
>> the "discard" code path.
>> - Not using a logic where I would need to stick an LSN into the stats
>> file name, implying that we would need a potential lookup at the
>> contents of pg_stat/ to clean up past files at crash recovery.  These
>> should not be costly, but I'd rather not add more of these.
>>
>> 7ff23c6d277d, which removed the last call of CreateCheckPoint() from
>> the startup process, is older than 5891c7a8ed8f; still, it seems to me
>> that pgstats relies on some areas of the code that don't make sense on
>> HEAD (see the locking mentioned at the top of the write routine, for
>> example).  The logic gets straightforward to think about as restart
>> points and checkpoints always run from the checkpointer, implying that
>> pgstat_write_statsfile() is already called only from the postmaster in
>> single-user mode or the checkpointer itself, at shutdown.
>>
>> Attached is a patch set, with the last one being the actual feature
>> and some stuff prior to that:
>> - 0001 adds the redo LSN to the pgstats file flushed.
>> - 0002 adds an assertion in pgstat_write_statsfile(), to check from
>> where it is called.
>> - 0003 with more debugging.
>> - 0004 is the meat of the thread.
>>
>> I am adding that to the next CF.  Thoughts and comments are welcome.
>> Thanks,
>> --
>> Michael
>
> Hi Michael.
>
> I am working mostly on the same problem - persisting pgstat state in
> Neon (because of the separation of storage and compute, it has no
> local files).  I have two questions concerning this PR and the whole
> strategy for saving pgstat state between sessions.
>
> 1. Right now pgstat.stat is discarded after abnormal Postgres
> termination.  And in your PR we are storing the LSN in pgstat.stat to
> check that it matches the checkpoint redo LSN.  I wonder whether
> having an outdated version of pgstat.stat is worse than not having it
> at all.  The comment in xlog.c says: "When starting with crash
> recovery, reset pgstat data - it might not be valid."  But why might
> it be invalid?  We are writing it first to a temp file and then
> renaming it.  Maybe fsync() should be added here and durable_rename()
> should be used instead of rename().  But that seems better than losing
> statistics, and it should not add significant overhead (because it
> happens only at shutdown).  In your case we are checking the LSN of
> the pgstat.stat file.  But once again - why is it better to discard
> the file than to load the version from the previous checkpoint?
>

I think those are two independent issues - knowing that the snapshot is
from the last checkpoint, and knowing that it's correct (not corrupted).

And yeah, we should be careful about fsync/durable_rename.
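As a rough illustration of that durable_rename() point, here is a minimal
sketch of what the tail end of the stats-file write could look like. This is
simplified, not the actual pgstat_write_statsfile() code, and the function
name is made up for the example:

```c
/*
 * Simplified sketch only - not the actual pgstat_write_statsfile() code.
 * Idea: serialize the stats snapshot into a temporary file, then move it
 * into place with durable_rename(), which fsyncs the source file before
 * the rename and the containing directory afterwards, so a crash right
 * after the swap cannot leave a truncated or missing pgstat.stat behind.
 */
#include "postgres.h"
#include "storage/fd.h"

static void
swap_statsfile_durably(const char *tmpfile, const char *statfile)
{
	/* the snapshot is assumed to be fully written to tmpfile already */

	/*
	 * Plain rename() is atomic but not durable; durable_rename() adds the
	 * missing fsync calls.  It logs at the given elevel and returns -1 on
	 * failure.
	 */
	if (durable_rename(tmpfile, statfile, LOG) != 0)
	{
		/* already logged at LOG level; the previous file (if any) survives */
		return;
	}
}
```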
> 2. The second question is also related to 1).  So we saved pgstat.stat
> on a checkpoint, then did some updates and then Postgres crashed.  As
> far as I understand, with your patch we will successfully restore the
> pgstat.stat file.  But it is not actually up-to-date: it doesn't
> reflect information about the recent updates.  If it were so critical
> to keep pgstat.stat up-to-date, then why do we allow restoring the
> state from the most recent checkpoint?
>

Yeah, I was wondering about the same thing - can't this mean we fail to
start autovacuum?

Let's say we delete a significant fraction of a huge table, but then we
crash before the next checkpoint. With this patch we restore the last
stats snapshot, which can easily have

   n_dead_tup          | 0
   n_mod_since_analyze | 0

for the table. And thus we fail to notice the table needs autovacuum.
AFAIK we run autovacuum on all tables with missing stats (or am I
wrong?). That's what's happening on replicas after switchover/failover
too, right?

It wouldn't be such an issue if we updated stats during recovery, but I
don't think we're doing that. Perhaps we should, which might also help
on replicas - no idea if it's feasible, though.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
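For reference, a simplified, self-contained paraphrase of the per-table
threshold check those counters feed into (modeled on
relation_needs_vacanalyze() in autovacuum.c; the function and parameter
names below are illustrative, not the real ones, and the numbers are the
stock GUC defaults):

```c
#include <stdbool.h>

/*
 * Illustrative paraphrase of autovacuum's per-table threshold check.
 * With a stats snapshot restored from the last checkpoint, dead_tuples
 * and mod_since_analyze can both read 0 even for a table that just lost
 * a large fraction of its rows, so neither condition fires.
 */
static bool
needs_autovacuum(double reltuples, double dead_tuples)
{
	/* defaults: autovacuum_vacuum_threshold = 50, scale factor = 0.2 */
	double		vac_thresh = 50 + 0.2 * reltuples;

	return dead_tuples > vac_thresh;
}

static bool
needs_autoanalyze(double reltuples, double mod_since_analyze)
{
	/* defaults: autovacuum_analyze_threshold = 50, scale factor = 0.1 */
	double		anl_thresh = 50 + 0.1 * reltuples;

	return mod_since_analyze > anl_thresh;
}
```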