Re: Flush pgstats file during checkpoints - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Flush pgstats file during checkpoints
Date
Msg-id ZpEE2DIUkRDe3JbP@ip-10-97-1-34.eu-west-3.compute.internal
Whole thread Raw
In response to Re: Flush pgstats file during checkpoints  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
Hi,

On Fri, Jul 05, 2024 at 01:52:31PM +0900, Michael Paquier wrote:
> On Sat, Jun 29, 2024 at 11:13:04PM +0200, Tomas Vondra wrote:
> > I think those are two independent issues - knowing that the snapshot is
> > from the last checkpoint, and knowing that it's correct (not corrupted).
> > And yeah, we should be careful about fsync/durable_rename.
> 
> Yeah, that's bugging me as well.  I don't really get why we would not
> want durability at shutdown for this data.  So how about switching the
> end of pgstat_write_statsfile() to use durable_rename()?  Sounds like
> an independent change, worth on its own.
> 
> > Yeah, I was wondering about the same thing - can't this mean we fail to
> > start autovacuum? Let's say we delete a significant fraction of a huge
> > table, but then we crash before the next checkpoint. With this patch we
> > restore the last stats snapshot, which can easily have
> > 
> > n_dead_tup          | 0
> > n_mod_since_analyze | 0
> > 
> > for the table. And thus we fail to notice the table needs autovacuum.
> > AFAIK we run autovacuum on all tables with missing stats (or am I
> > wrong?). That's what's happening on replicas after switchover/failover
> > too, right?
> 
> That's the opposite, please look at relation_needs_vacanalyze().  If a
> table does not have any stats found in pgstats, like on HEAD after a
> crash, then autoanalyze is skipped and autovacuum happens only for the
> anti-wrap case.
> 
> For the database case, rebuild_database_list() uses
> pgstat_fetch_stat_dbentry() three times, discards entirely databases
> that have no stats.  Again, there should be a stats entry post a
> crash upon a reconnection.
> 
> So there's an argument in saying that the situation does not get worse
> here and that we actually may improve odds by keeping a trace of the
> previous stats, no? 

I agree, we still could get autoanalyze/autovacuum skipped due to obsolete/stales
stats being restored from the last checkpoint but that's better than having them
always skipped (current HEAD).

> In most cases, there would be a stats entry
> post-crash as an INSERT or a scan would have re-created it,

Yeap.

> but the
> stats would reflect the state registered since the last crash recovery
> (even on HEAD, a bulk delete bumping the dead tuple counter would not
> be detected post-crash).

Right.

> The case of spiky workloads may impact the 
> decision-making, of course, but at least we'd allow autovacuum to take
> some decision over giving up entirely based on some previous state of
> the stats.  That sounds like a win for me with steady workloads, less
> for bulby workloads..

I agree and it is not worst (though not ideal) that currently on HEAD.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Junwang Zhao
Date:
Subject: Re: MERGE/SPLIT partition commands should create new partitions in the parent's tablespace?
Next
From: Alexander Korotkov
Date:
Subject: Re: Removing unneeded self joins