Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Msg-id CAH2-WzmL=q6tUrD6_EXi70yHq+gtKaFt6D73N=scJpwkwzD8iA@mail.gmail.com
In response to Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Sun, Oct 23, 2022 at 9:32 PM Jeff Davis <pgsql@j-davis.com> wrote:
> It's possible this would be easier for users to understand: one process
> that does cleanup work over time in a way that minimizes interference;
> and another process that activates in more urgent situations (perhaps
> due to misconfiguration of the first process).

I think that the new "early" version of antiwraparound autovacuum
(that can still be autocancelled) would simply be called autovacuum.
It wouldn't appear as "autovacuum to prevent wraparound" in places
like pg_stat_activity. For the most part users wouldn't have to care
about the difference between these autovacuums and traditional
non-antiwraparound autovacuums. They really would be exactly the same
thing, so it would make sense if users typically noticed no difference
whatsoever (at least in contexts like pg_stat_activity).
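
To make the pg_stat_activity point concrete, here is a minimal Python sketch (not PostgreSQL's actual C code) of how an autovacuum worker's activity text could be built under this scheme. The `AutovacuumJob` class, its fields, and `activity_string()` are hypothetical, and the string format is only loosely modeled on what autovacuum workers report today. Only the non-cancellable case carries the "to prevent wraparound" label.

```python
from dataclasses import dataclass


@dataclass
class AutovacuumJob:
    """Hypothetical description of one scheduled autovacuum (not a real PostgreSQL struct)."""
    relname: str       # schema-qualified table name
    do_analyze: bool   # whether ANALYZE runs as well
    cancellable: bool  # False only for an escalated antiwraparound autovacuum


def activity_string(job: AutovacuumJob) -> str:
    """Text that would appear in pg_stat_activity.query for this job.

    An early, age-triggered autovacuum is cancellable, so it reads exactly
    like any other autovacuum; only the non-cancellable case is labeled.
    """
    verb = "VACUUM ANALYZE" if job.do_analyze else "VACUUM"
    suffix = "" if job.cancellable else " (to prevent wraparound)"
    return f"autovacuum: {verb} {job.relname}{suffix}"


# Early (cancellable) autovacuum: autovacuum: VACUUM public.orders
print(activity_string(AutovacuumJob("public.orders", False, True)))
# Escalated one: autovacuum: VACUUM public.orders (to prevent wraparound)
print(activity_string(AutovacuumJob("public.orders", False, False)))
```

Whether the job was triggered by table age, inserts, or dead tuples makes no difference to this text; only cancellability changes what users see.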

> But we should be careful that we don't end up with more confusion. For
> something like that to work, we'd probably want the second process to
> not be configurable at all, and we'd want it to be issuing WARNINGs
> pointing to what might be misconfigured, and otherwise just be
> invisible.

There should be some simple scheme for determining when an
antiwraparound autovacuum (non-cancellable autovacuum to advance
relfrozenxid/relminmxid) should run (applied by the autovacuum.c
scheduling logic). Something like "table has attained an age that's
now 2x autovacuum_freeze_max_age, or 1/2 of vacuum_failsafe_age,
whichever is less".
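
As a rough sketch, assuming "table age" means age(relfrozenxid) and that the early, autocancellable autovacuum starts once that age exceeds autovacuum_freeze_max_age (as age-triggered autovacuums do today), the rule above could be expressed as follows. The function names are hypothetical, and only the XID side is shown; relminmxid would presumably use the corresponding multixact settings.

```python
def antiwraparound_threshold(freeze_max_age: int, failsafe_age: int) -> int:
    """Table age at which autovacuum stops being autocancellable.

    The rule floated above: 2x autovacuum_freeze_max_age or 1/2 of
    vacuum_failsafe_age, whichever is less.
    """
    return min(2 * freeze_max_age, failsafe_age // 2)


def classify(table_age: int, freeze_max_age: int, failsafe_age: int) -> str:
    """Which kind of age-triggered autovacuum (if any) would be launched."""
    if table_age >= antiwraparound_threshold(freeze_max_age, failsafe_age):
        return "antiwraparound"  # escalated: cannot be autocancelled
    if table_age > freeze_max_age:
        return "autovacuum"      # early, age-triggered, still autocancellable
    return "none"


DEFAULT_FREEZE_MAX_AGE = 200_000_000   # autovacuum_freeze_max_age default
DEFAULT_FAILSAFE_AGE = 1_600_000_000   # vacuum_failsafe_age default

for age in (150_000_000, 250_000_000, 400_000_000):
    # Prints "none", then "autovacuum", then "antiwraparound".
    print(age, classify(age, DEFAULT_FREEZE_MAX_AGE, DEFAULT_FAILSAFE_AGE))
```

With the defaults, the threshold works out to min(400 million, 800 million) = 400 million, leaving a window of about 200 million XIDs in which only the autocancellable autovacuum is launched. If autovacuum_freeze_max_age were set at or above half of vacuum_failsafe_age, this particular formula would leave no such window at all.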

The really important thing is giving a regular/early autocancellable
autovacuum triggered by age(relfrozenxid) *some* opportunity to run. I
strongly suspect that the exact details won't matter too much,
provided we manage to launch at least one such autovacuum before
escalating to traditional antiwraparound autovacuum (which cannot be
autocancelled). Even if regular/early autovacuum had just one
opportunity to run to completion, we'd already be much better off. The
hazards from blocking automated DDL in a way that leads to a very
disruptive traffic jam (like in the Joyent Manta postmortem) would go
way down.

> >  That way we wouldn't be fighting against the widely held perception
> > that antiwraparound autovacuums are scary.
>
> There's certainly a terminology problem there. Just to brainstorm on
> some new names, we might want to call it something like "xid
> reclamation" or "xid horizon advancement".

I think that we should simply call it autovacuum. Under this scheme,
antiwraparound autovacuum would be a qualitatively different kind of
operation to users (though not to vacuumlazy.c), both because it would
not be autocancellable in the standard way and because users should
take it as a signal that things aren't really working well (otherwise
we wouldn't have reached the point of requiring a scary antiwraparound
autovacuum in the first place). Right now antiwraparound autovacuums
are both an emergency thing (or at least described as such in one or
two areas of the source code) and a completely routine occurrence.
This is deeply confusing.

Separately, I plan on breaking out insert-triggered autovacuums from
traditional dead-tuple-triggered autovacuums [1], which creates a need
to invent some kind of name to differentiate the new table-age
triggering criterion from both insert-driven and dead-tuple-driven
autovacuums. These are all fundamentally the same operation, with the
same urgency to users, though. We'd only need to describe the
*criteria* that *triggered* the autovacuum in our autovacuum log
report. (Actually, we'd still report an antiwraparound autovacuum as
such in cases where one still happened, and that wouldn't be presented
as just another triggering criterion in the report.)
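
A minimal sketch of the log-report idea, assuming a hypothetical `Trigger` enum and `report_header()` helper; the message wording is illustrative, not the exact text autovacuum logs today. Ordinary autovacuums list whichever criteria triggered them, while an antiwraparound autovacuum is reported as a distinct kind of operation rather than as one more criterion.

```python
from enum import Enum


class Trigger(Enum):
    """Hypothetical criteria that can trigger an ordinary (autocancellable) autovacuum."""
    DEAD_TUPLES = "dead tuples"
    INSERTS = "inserted tuples"
    TABLE_AGE = "table age"


def report_header(relname: str, triggers: list, antiwraparound: bool) -> str:
    """First line of a hypothetical autovacuum log report.

    An antiwraparound autovacuum is called out as a different kind of
    operation; everything else is plain "automatic vacuum" plus the
    criteria that happened to trigger it.
    """
    if antiwraparound:
        return f'automatic vacuum to prevent wraparound of table "{relname}"'
    criteria = ", ".join(t.value for t in triggers)
    return f'automatic vacuum of table "{relname}" (triggered by: {criteria})'


print(report_header("public.orders", [Trigger.INSERTS, Trigger.TABLE_AGE], False))
# automatic vacuum of table "public.orders" (triggered by: inserted tuples, table age)
print(report_header("public.orders", [Trigger.TABLE_AGE], True))
# automatic vacuum to prevent wraparound of table "public.orders"
```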

[1] https://www.postgresql.org/message-id/flat/CAH2-WznEqmkmry8feuDK8XdpH37-4anyGF7a04bWXOc1GKd0Yg@mail.gmail.com
--
Peter Geoghegan


