Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Msg-id CAH2-Wz=AfJKAwRq-PV-3WTU3mg-RRfnbMEdga0CBmwAXRmebcQ@mail.gmail.com
In response to Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
List pgsql-hackers
On Thu, Oct 20, 2022 at 11:09 AM Jeff Davis <pgsql@j-davis.com> wrote:
> The terminology is getting slightly confusing here: by
> "antiwraparound", you mean that it's not skipping unfrozen pages, and
> therefore is able to advance relfrozenxid. Whereas the
> PROC_VACUUM_FOR_WRAPAROUND is the same thing, except done with greater
> urgency because wraparound is imminent. Right?

Not really.

I started this thread to discuss a behavior in autovacuum.c and proc.c
(the autocancellation behavior), which is, strictly speaking, not
related to the current vacuumlazy.c behavior we call aggressive mode
VACUUM. Various hackers have in the past described antiwraparound
autovacuum as "implying aggressive", which makes sense; what's the
point in doing an antiwraparound autovacuum that can almost never
advance relfrozenxid?

It is nevertheless true that antiwraparound autovacuum is a behavior
independent of aggressive VACUUM. The former is an autovacuum
thing, and the latter is a VACUUM thing. That's just how it works,
mechanically.

If this division seems artificial or pedantic to you, then consider
the fact that you can quite easily get a non-aggressive antiwraparound
autovacuum by using the storage option called
autovacuum_freeze_max_age (instead of the GUC):

https://postgr.es/m/CAH2-Wz=DJAokY_GhKJchgpa8k9t_H_OVOvfPEn97jGNr9W=deg@mail.gmail.com

In this case we'll even output a distinct description in the server
log, provided autovacuum logging is enabled and the autovacuum
qualifies for logging. So while there may be no point in an
antiwraparound autovacuum that is non-aggressive, that doesn't stop
them from happening. Regardless of whether or not that's an intended
behavior, that's just how the mechanism has been constructed.
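
To make the distinction concrete, here is a rough sketch of the two
decisions as separate functions. It is Python rather than the actual C,
and every name in it (Settings, is_antiwraparound, is_aggressive) is
invented for illustration. It assumes the autovacuum-side check honors
the storage option while the VACUUM-side check looks only at the GUCs,
and it ignores MultiXact ages, the per-table autovacuum_freeze_table_age
setting, and XID wraparound arithmetic. The default values are the
documented PostgreSQL defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    # Server-wide GUCs (documented defaults).
    guc_autovacuum_freeze_max_age: int = 200_000_000
    guc_vacuum_freeze_table_age: int = 150_000_000
    # Per-table storage option; None means "not set on this table".
    relopt_autovacuum_freeze_max_age: Optional[int] = None


def is_antiwraparound(table_age: int, s: Settings) -> bool:
    """Autovacuum-side decision (simplified model).

    The storage option, when set, can only lower the cutoff.
    """
    cutoff = s.guc_autovacuum_freeze_max_age
    if s.relopt_autovacuum_freeze_max_age is not None:
        cutoff = min(cutoff, s.relopt_autovacuum_freeze_max_age)
    return table_age > cutoff


def is_aggressive(table_age: int, s: Settings) -> bool:
    """VACUUM-side decision (simplified model).

    Assumption: only the GUCs are consulted here, with the table-age
    cutoff clamped to 95% of the autovacuum_freeze_max_age GUC.
    """
    cutoff = min(s.guc_vacuum_freeze_table_age,
                 int(s.guc_autovacuum_freeze_max_age * 0.95))
    return table_age >= cutoff


# A table whose storage option is set below vacuum_freeze_table_age.
s = Settings(relopt_autovacuum_freeze_max_age=100_000_000)
age = 120_000_000
print(is_antiwraparound(age, s), is_aggressive(age, s))
```

Under these assumptions, a table with age 120 million is past the
storage option's cutoff but short of the aggressive cutoff, so the
model reports an antiwraparound autovacuum that is not aggressive.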

> > There is no inherent reason why we have to do both
> > things at exactly the same XID-age-wise time. But there is reason to
> > think that doing so could make matters worse rather than better [1].
>
> Can you explain?

Why should the special autocancellation behavior for antiwraparound
autovacuums kick in at exactly the same point that we first launch an
antiwraparound autovacuum? Maybe that aggressive intervention will be
needed, in the end, but why start there?

With my patch series, antiwraparound autovacuums still occur, but
they're confined to things like static tables -- things that are
pretty much edge cases. They still need to behave sensibly (i.e.
reliably advance relfrozenxid based on some principled approach), but
now they're more like "an autovacuum that happens because no other
condition triggered an autovacuum". To some degree this is already the
case, but I'd like to be more deliberate about it.

Leaving my patch series aside, I still don't think that it makes sense
to make it impossible to auto-cancel antiwraparound autovacuums,
across the board, regardless of the underlying table age. We still
need some kind of non-cancellable behavior eventually, but why not
give a still-cancellable autovacuum worker a chance to resolve the
problem first? Why take a risk of causing much bigger problems (e.g.,
blocking automated DDL that blocks simple SELECT queries) before the
point at which that starts to look like the lesser risk (compared to
hitting xidStopLimit)?
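
The idea of separating the two thresholds can be sketched as follows.
This is only an illustration in Python, not anything from the patch
series: Decision, decide, launch_age, and no_cancel_age are
hypothetical names, and the thread does not say what the second
threshold should be.

```python
from typing import NamedTuple


class Decision(NamedTuple):
    launch_for_age: bool    # autovacuum launched purely because of table age
    auto_cancellable: bool  # worker may still be cancelled by a lock conflict


def decide(table_age: int, launch_age: int, no_cancel_age: int) -> Decision:
    """Hypothetical policy with two separate XID-age thresholds.

    launch_age    -- age past which an age-driven autovacuum is launched
    no_cancel_age -- age past which the worker stops being auto-cancellable
    """
    if no_cancel_age < launch_age:
        raise ValueError("no_cancel_age must not be below launch_age")
    launch = table_age > launch_age
    cancellable = table_age <= no_cancel_age
    return Decision(launch, cancellable)


# Coupled thresholds model the existing behavior: launched and
# non-cancellable at the same point.
print(decide(250_000_000, 200_000_000, 200_000_000))

# Decoupled thresholds: the same table age gets a cancellable worker first.
print(decide(250_000_000, 200_000_000, 400_000_000))
```

Passing the same value for both thresholds reproduces the coupling
being questioned here; passing a larger no_cancel_age gives the
cancellable worker a window in which to advance relfrozenxid before
the stricter behavior applies.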

--
Peter Geoghegan
