Re: Flush pgstats file during checkpoints - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Flush pgstats file during checkpoints
Date
Msg-id 41fb6458-7405-4b72-8226-b1d7f976dee5@enterprisedb.com
In response to Re: Flush pgstats file during checkpoints  (Konstantin Knizhnik <knizhnik@garret.ru>)
List pgsql-hackers

On 6/28/24 09:32, Konstantin Knizhnik wrote:
> 
> On 18/06/2024 9:01 am, Michael Paquier wrote:
>> Hi all,
>>
>> On HEAD, xlog.c has the following comment, which has been on my own
>> TODO list for a couple of weeks now:
>>
>>      * TODO: With a bit of extra work we could just start with a pgstat file
>>      * associated with the checkpoint redo location we're starting from.
>>
>> Please find a patch series to implement that, giving the possibility
>> to keep statistics after a crash rather than discard them.  I have
>> been looking at the code for a while, before settling down on:
>> - Forcing the flush of the pgstats file to happen during non-shutdown
>> checkpoints and restart points, after updating the control file's redo
>> LSN and the critical sections in the area.
>> - Leaving the current before_shmem_exit() callback around, as a matter
>> of delaying the flush of the stats for as long as possible in a
>> shutdown sequence.  This also makes the single-user mode shutdown
>> simpler.
>> - Adding the redo LSN to the pgstats file, with a bump of
>> PGSTAT_FILE_FORMAT_ID, cross-checked with the redo LSN.  This change
>> is independently useful on its own when loading stats after a clean
>> startup, as well.
>> - The crash recovery case is simplified, as there is no more need for
>> the "discard" code path.
>> - Not using logic where I would need to stick an LSN into the stats
>> file name, which would imply a potential lookup at the contents of
>> pg_stat/ to clean up past files at crash recovery.  These lookups
>> should not be costly, but I'd rather not add more of them.
>>
>> 7ff23c6d277d, which removed the last call of CreateCheckPoint()
>> from the startup process, is older than 5891c7a8ed8f, yet it seems
>> to me that pgstats still relies on some areas of the code that don't
>> make sense on HEAD (see the locking mentioned at the top of the write
>> routine, for example).  The logic gets straightforward to think about
>> as restart points and checkpoints always run from the checkpointer,
>> implying that pgstat_write_statsfile() is already called only from the
>> postmaster in single-user mode or the checkpointer itself, at
>> shutdown.
>>
>> Attached is a patch set, with the last one being the actual feature
>> and some stuff prior to that:
>> - 0001 adds the redo LSN to the pgstats file flushed.
>> - 0002 adds an assertion in pgstat_write_statsfile(), to check from
>> where it is called.
>> - 0003 with more debugging.
>> - 0004 is the meat of the thread.
>>
>> I am adding that to the next CF.  Thoughts and comments are welcome.
>> Thanks,
>> -- 
>> Michael
> 
> Hi Michael.
> 
> I am working mostly on the same problem - persisting pgstat state in
> Neon (because of the separation of storage and compute, it has no local
> files). I have two questions concerning this PR and the whole strategy
> for saving pgstat state between sessions.
> 
> 1. Right now pgstat.stat is discarded after abnormal Postgres
> termination. And in your PR we are storing the LSN in pgstat.stat to
> check that it matches the checkpoint redo LSN. I wonder if having an
> outdated version of pgstat.stat is worse than not having it at all?
> The comment in xlog.c says: "When starting with crash recovery, reset
> pgstat data - it might not be valid." But why might it be invalid? We
> write it first to a temp file and then rename it. Maybe fsync() should
> be added here and durable_rename() should be used instead of rename().
> That seems better than losing the statistics, and it should not add
> significant overhead (because it happens only at shutdown). In your
> case we are checking the LSN of the pgstat.stat file. But once again -
> why is it better to discard the file than to load the version from the
> previous checkpoint?
> 

I think those are two independent issues - knowing that the snapshot is
from the last checkpoint, and knowing that it's correct (not corrupted).
And yeah, we should be careful about fsync/durable_rename.
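
FWIW, a minimal sketch of what I mean by being careful with the rename
step - not the actual pgstat_write_statsfile() code, just the final swap
in isolation, with the file names taken from pgstat.h and the helper
name invented:

#include "postgres.h"
#include "pgstat.h"
#include "storage/fd.h"

/*
 * Swap the freshly written temp stats file into place durably.  A plain
 * rename() leaves a window where neither the file contents nor the
 * directory entry are guaranteed to have reached disk after a crash.
 */
static void
swap_in_stats_file_durably(void)
{
    /* pg_stat/pgstat.tmp and pg_stat/pgstat.stat, per pgstat.h */
    const char *tmpfile = PGSTAT_STAT_PERMANENT_TMPFILE;
    const char *statfile = PGSTAT_STAT_PERMANENT_FILENAME;

    /*
     * durable_rename() fsyncs the source, renames it over the target,
     * then fsyncs the new name and its parent directory, reporting at
     * ERROR on failure.
     */
    durable_rename(tmpfile, statfile, ERROR);
}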

> 2. The second question is related to 1). Say we saved pgstat.stat at a
> checkpoint, then did some updates, and then Postgres crashed. As far as
> I understand, with your patch we will successfully restore the
> pgstat.stat file. But it is not actually up-to-date: it doesn't reflect
> the recent updates. If it was so critical to keep pgstat.stat
> up-to-date, then why do we allow restoring the state from the most
> recent checkpoint?
> 

Yeah, I was wondering about the same thing - can't this mean we fail to
start autovacuum? Let's say we delete a significant fraction of a huge
table, but then we crash before the next checkpoint. With this patch we
restore the last stats snapshot, which can easily have

n_dead_tup          | 0
n_mod_since_analyze | 0

for the table. And thus we fail to notice the table needs autovacuum.
AFAIK we run autovacuum on all tables with missing stats (or am I
wrong?). That's what's happening on replicas after switchover/failover
too, right?
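
To make that concrete, here's a simplified sketch of the per-table
threshold test, modeled on relation_needs_vacanalyze() in autovacuum.c,
with the default GUC values hard-coded and the struct invented for
illustration:

#include <stdbool.h>

/* invented container for the few counters that matter here */
typedef struct TableCounters
{
    double      reltuples;              /* live-tuple estimate (pg_class) */
    long        n_dead_tup;             /* from the stats snapshot */
    long        n_mod_since_analyze;    /* from the stats snapshot */
} TableCounters;

static void
needs_vacanalyze(const TableCounters *t, bool *dovacuum, bool *doanalyze)
{
    /*
     * Defaults: autovacuum_vacuum_threshold = 50, scale factor 0.2;
     * autovacuum_analyze_threshold = 50, scale factor 0.1.
     */
    double      vacthresh = 50 + 0.2 * t->reltuples;
    double      anlthresh = 50 + 0.1 * t->reltuples;

    *dovacuum = t->n_dead_tup > vacthresh;
    *doanalyze = t->n_mod_since_analyze > anlthresh;
}

With the restored pre-crash snapshot both counters are 0, so neither
threshold fires until new activity rebuilds them, no matter how much of
the table was just deleted.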

It'd not be such an issue if we updated stats during recovery, but I
don't think we're doing that. Perhaps we should, which might also help
on replicas - no idea if it's feasible, though.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


