Decoupling antiwraparound autovacuum from special rules around auto cancellation - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Decoupling antiwraparound autovacuum from special rules around auto cancellation
Msg-id CAH2-Wz=S-R_2rO49Hm94Nuvhu9_twRGbTm6uwDRmRu-Sqn_t3w@mail.gmail.com
List pgsql-hackers
I think that we should decouple the PROC_VACUUM_FOR_WRAPAROUND
autocancellation behavior in ProcSleep() from antiwraparound
autovacuum itself. In other words I think that it should be possible
to cancel an autovacuum that happens to be an antiwraparound
autovacuum, just as if it were any other autovacuum -- because it usually
is no different in any real practical sense. Or at least it shouldn't
be seen as fundamentally different to other autovacuums at first,
before relfrozenxid attains an appreciably greater age (definition of
"appreciably greater" is TBD).

Why should the PROC_VACUUM_FOR_WRAPAROUND behavior happen on *exactly*
the same timeline as the one used to launch an antiwraparound
autovacuum, though? There is no inherent reason why we have to do both
things at exactly the same XID-age-wise time. But there is reason to
think that doing so could make matters worse rather than better [1].

More generally I think that it'll be useful to perform "aggressive
behaviors" on their own timeline, with no two distinct aggressive
behaviors applied at exactly the same time. In general we ought to
give a less aggressive approach some room to succeed before escalating
to a more aggressive approach -- we should see if a less aggressive
approach will work on its own. The failsafe is the most aggressive
intervention of all. The PROC_VACUUM_FOR_WRAPAROUND behavior is almost
as aggressive, and should happen sooner. Antiwraparound autovacuum
itself (which is really a separate thing to
PROC_VACUUM_FOR_WRAPAROUND) is less aggressive still. Then you have
things like the cutoffs in vacuumlazy.c that control things like
freezing.

In short, I'm proposing an "escalatory" approach that applies each
behavior at a different time. The exact timelines we'd want are of course
debatable, but the value of having multiple distinct timelines (one
per aggressive behavior) is far less debatable. We should give
problems a chance to "resolve themselves", at least up to a point.
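
To make the ordering concrete, here is a minimal sketch of "one cutoff
per aggressive behavior". It is not code from any patch. The type and
function names are hypothetical, and the middle cutoff (the age at which
autocancellation stops) is a made-up placeholder, since the right value
is still TBD.

```c
#include <assert.h>
#include <stdint.h>

/* Behaviors ordered from least to most aggressive (hypothetical names). */
typedef enum
{
    ESC_NONE = 0,
    ESC_ANTIWRAPAROUND_LAUNCH,  /* launch an autovacuum based on XID age */
    ESC_NO_AUTOCANCEL,          /* stop yielding to conflicting lock requests */
    ESC_FAILSAFE                /* most aggressive intervention of all */
} Escalation;

/* One cutoff per behavior, all expressed as an age of relfrozenxid. */
typedef struct
{
    uint32_t launch_age;     /* plays the role of autovacuum_freeze_max_age */
    uint32_t no_cancel_age;  /* hypothetical: the "appreciably greater" age */
    uint32_t failsafe_age;   /* plays the role of vacuum_failsafe_age */
} EscalationCutoffs;

/* Return the most aggressive behavior that applies at the given age. */
static Escalation
escalation_for_age(const EscalationCutoffs *c, uint32_t relfrozenxid_age)
{
    /* Each step must come strictly later than the one before it. */
    assert(c->launch_age < c->no_cancel_age);
    assert(c->no_cancel_age < c->failsafe_age);

    if (relfrozenxid_age >= c->failsafe_age)
        return ESC_FAILSAFE;
    if (relfrozenxid_age >= c->no_cancel_age)
        return ESC_NO_AUTOCANCEL;
    if (relfrozenxid_age >= c->launch_age)
        return ESC_ANTIWRAPAROUND_LAUNCH;
    return ESC_NONE;
}
```

With cutoffs of, say, 200 million, 800 million, and 1.6 billion XIDs
(the first and last match the default autovacuum_freeze_max_age and
vacuum_failsafe_age; the middle one is invented), a table would spend a
long time in the "launched, but still cancellable" state before anything
more drastic happened.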

The latest version of my in-progress VACUUM patch series [2]
completely removes the concept of aggressive VACUUM as a discrete mode
of operation inside vacuumlazy.c. Every existing "aggressive-ish
behavior" will be retained in some form or other, but they'll be
applied on separate timelines, in proportion to the problem at hand.
For example, we'll have a separate XID cutoff for waiting for a
cleanup lock the hard way -- we will no longer use FreezeLimit for
that, since that doesn't give freezing a chance to happen in the next
VACUUM. The first VACUUM operation that is capable of freezing
should ideally not *also* be the first one that has to wait for a
cleanup lock. We should be willing to put off
waiting for a cleanup lock for much longer than we're willing to put
off freezing. Reusing the same cutoff just makes life harder.
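
As a rough illustration of separate cutoffs for freezing and for
cleanup lock waits, consider the per-page sketch below. It does not
reflect the actual structure of vacuumlazy.c or of the patch series.
All names are hypothetical, and the only assumption carried over from
the text is that the cleanup lock cutoff is much older than the freeze
cutoff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum
{
    PAGE_PROCESS,               /* cleanup lock held: prune/freeze as usual */
    PAGE_SKIP,                  /* no lock, page is young enough to leave */
    PAGE_WAIT_FOR_CLEANUP_LOCK  /* no lock, page too old: wait the hard way */
} PageAction;

/* Two independent cutoffs (hypothetical names), both XID ages. */
typedef struct
{
    uint32_t freeze_age;        /* start freeze when XIDs are this old */
    uint32_t cleanup_wait_age;  /* only wait for a lock when this old */
} PageCutoffs;

/* Freezing is decided by its own, earlier cutoff. */
static bool
page_wants_freeze(const PageCutoffs *c, uint32_t oldest_xid_age)
{
    return oldest_xid_age >= c->freeze_age;
}

/* Decide what to do after a conditional cleanup lock attempt. */
static PageAction
page_lock_action(const PageCutoffs *c, bool got_cleanup_lock,
                 uint32_t oldest_xid_age)
{
    /* Waiting must be put off for longer than freezing is. */
    assert(c->freeze_age < c->cleanup_wait_age);

    if (got_cleanup_lock)
        return PAGE_PROCESS;
    if (oldest_xid_age >= c->cleanup_wait_age)
        return PAGE_WAIT_FOR_CLEANUP_LOCK;
    return PAGE_SKIP;
}
```

The gap between the two cutoffs is what gives a later VACUUM the chance
to freeze the page without ever having to wait.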

Clearly the idea of decoupling the PROC_VACUUM_FOR_WRAPAROUND behavior
from antiwraparound autovacuum is conceptually related to my patch
series, but it can be treated as separate work. That's why I'm
starting another thread now.

There is another idea in that patch series that also seems worth
mentioning as relevant (but not essential) to the discussion on this
thread: it would be better if antiwraparound autovacuum were simply
another way to launch an autovacuum, one that isn't fundamentally
different from any other. I believe that users will find this conceptual
model a lot easier, especially in a world where antiwraparound
autovacuums naturally become rare (which is the world that the big
patch series seeks to bring about). It'll make antiwraparound
autovacuum "the threshold of last resort", only needed when
conventional tuple-based thresholds don't trigger at all for an
extended period of time (e.g., for static tables).

Perhaps it won't be trivial to fix autovacuum.c in the way I have in
mind (which is to split PROC_VACUUM_FOR_WRAPAROUND into two flags that
serve two separate purposes). I haven't considered if we're
accidentally relying on the coupling to avoid confusion within
autovacuum.c. That doesn't seem important right now, though.
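
A minimal sketch of what the two-flag split might look like follows.
The flag names and bit values are invented for illustration and are not
the definitions in proc.h; likewise the helper functions are
hypothetical stand-ins, not the real logic in autovacuum.c or
ProcSleep().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags; names and bit values are NOT taken from proc.h. */
#define SKETCH_PROC_IS_AUTOVACUUM      0x01u
#define SKETCH_PROC_AV_ANTIWRAPAROUND  0x02u  /* why the worker was launched */
#define SKETCH_PROC_AV_NO_AUTOCANCEL   0x04u  /* whether waiters may cancel it */

/* Flags a worker would advertise; the two purposes are set independently. */
static uint32_t
sketch_worker_flags(bool launched_for_wraparound,
                    uint32_t relfrozenxid_age, uint32_t no_cancel_age)
{
    uint32_t flags = SKETCH_PROC_IS_AUTOVACUUM;

    if (launched_for_wraparound)
        flags |= SKETCH_PROC_AV_ANTIWRAPAROUND;
    /* Cancel protection depends only on age, not on the launch reason. */
    if (relfrozenxid_age >= no_cancel_age)
        flags |= SKETCH_PROC_AV_NO_AUTOCANCEL;
    return flags;
}

/* Would a blocked lock waiter be allowed to cancel this backend? */
static bool
sketch_waiter_may_cancel(uint32_t blocker_flags)
{
    /* The no-autocancel bit is only meaningful on an autovacuum worker. */
    assert(!(blocker_flags & SKETCH_PROC_AV_NO_AUTOCANCEL) ||
           (blocker_flags & SKETCH_PROC_IS_AUTOVACUUM));

    return (blocker_flags & SKETCH_PROC_IS_AUTOVACUUM) != 0 &&
           (blocker_flags & SKETCH_PROC_AV_NO_AUTOCANCEL) == 0;
}
```

Under this split, an autovacuum launched for wraparound reasons stays
cancellable until the table's age crosses the later cutoff.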

[1] https://www.tritondatacenter.com/blog/manta-postmortem-7-27-2015
[2] https://postgr.es/m/CAH2-WzkU42GzrsHhL2BiC1QMhaVGmVdb5HR0_qczz0Gu2aSn=A@mail.gmail.com
-- 
Peter Geoghegan


