Hi,
On Fri, Jul 05, 2024 at 01:52:31PM +0900, Michael Paquier wrote:
> On Sat, Jun 29, 2024 at 11:13:04PM +0200, Tomas Vondra wrote:
> > I think those are two independent issues - knowing that the snapshot is
> > from the last checkpoint, and knowing that it's correct (not corrupted).
> > And yeah, we should be careful about fsync/durable_rename.
>
> Yeah, that's bugging me as well. I don't really get why we would not
> want durability at shutdown for this data. So how about switching the
> end of pgstat_write_statsfile() to use durable_rename()? Sounds like
> an independent change, worth on its own.
>
> > Yeah, I was wondering about the same thing - can't this mean we fail to
> > start autovacuum? Let's say we delete a significant fraction of a huge
> > table, but then we crash before the next checkpoint. With this patch we
> > restore the last stats snapshot, which can easily have
> >
> > n_dead_tup | 0
> > n_mod_since_analyze | 0
> >
> > for the table. And thus we fail to notice the table needs autovacuum.
> > AFAIK we run autovacuum on all tables with missing stats (or am I
> > wrong?). That's what's happening on replicas after switchover/failover
> > too, right?
>
> That's the opposite, please look at relation_needs_vacanalyze(). If a
> table does not have any stats found in pgstats, like on HEAD after a
> crash, then autoanalyze is skipped and autovacuum happens only for the
> anti-wrap case.
>
> For the database case, rebuild_database_list() uses
> pgstat_fetch_stat_dbentry() three times, discards entirely databases
> that have no stats. Again, there should be a stats entry post a
> crash upon a reconnection.
>
> So there's an argument in saying that the situation does not get worse
> here and that we actually may improve odds by keeping a trace of the
> previous stats, no?
I agree, we still could get autoanalyze/autovacuum skipped due to obsolete/stales
stats being restored from the last checkpoint but that's better than having them
always skipped (current HEAD).
> In most cases, there would be a stats entry
> post-crash as an INSERT or a scan would have re-created it,
Yeap.
> but the
> stats would reflect the state registered since the last crash recovery
> (even on HEAD, a bulk delete bumping the dead tuple counter would not
> be detected post-crash).
Right.
> The case of spiky workloads may impact the
> decision-making, of course, but at least we'd allow autovacuum to take
> some decision over giving up entirely based on some previous state of
> the stats. That sounds like a win for me with steady workloads, less
> for bulby workloads..
I agree and it is not worst (though not ideal) that currently on HEAD.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com