Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Date
Msg-id CA+TgmoYaDPzwbHrrXmVCOR_bOcLBm4sL5NErS=nNOQZ3nm3uCQ@mail.gmail.com
In response to Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
List pgsql-hackers
On Mon, Jan 9, 2023 at 8:40 PM Peter Geoghegan <pg@bowt.ie> wrote:
> That's not what the patch does. It doubles the time that the anti-wrap
> no-autocancellation behaviors kick in, up to a maximum of 1 billion
> XIDs/MXIDs. So it goes from autovacuum_freeze_max_age to
> autovacuum_freeze_max_age x 2, without changing the basic fact that we
> initially launch autovacuums that advance relfrozenxid/relminmxid when
> the autovacuum_freeze_max_age threshold is first crossed.

I'm skeptical about this kind of approach.

I do agree that it's good to slowly increase the aggressiveness of
VACUUM as we get further behind, rather than having big behavior
changes all at once, but I think that should happen by smoothly
varying various parameters rather than by making discrete behavior
changes at a whole bunch of different times. For instance, when VACUUM
goes into emergency mode, it stops respecting the vacuum delay. I
think that's great, but it happens all at once, and maybe it would be
better if it didn't. We could consider gradually ramping the vacuum
delay from 100% down to 0% instead of having it happen all at once.
Maybe that's not the right idea, I don't know, and a naive
implementation might be worse than nothing, but I think it has some
chance of being worth consideration.
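
To make the gradual-ramp idea concrete, here is a minimal sketch and
not a proposal for actual code. It assumes a linear ramp that starts
at autovacuum_freeze_max_age (default 200 million) and reaches zero at
vacuum_failsafe_age (default 1.6 billion), and a base delay equal to
the default autovacuum_vacuum_cost_delay of 2ms. The function name,
the choice of endpoints, and the linear shape are all assumptions made
for illustration; nothing like this exists in PostgreSQL.

```python
def ramped_delay_ms(xid_age,
                    base_delay_ms=2.0,
                    ramp_start=200_000_000,
                    ramp_end=1_600_000_000):
    """Hypothetical: scale the cost delay linearly from 100% to 0%.

    ramp_start and ramp_end are assumed endpoints (the defaults of
    autovacuum_freeze_max_age and vacuum_failsafe_age), not behavior
    that any PostgreSQL release implements.
    """
    if ramp_end <= ramp_start:
        raise ValueError("ramp_end must be greater than ramp_start")
    if xid_age <= ramp_start:
        return base_delay_ms      # not behind yet: full delay
    if xid_age >= ramp_end:
        return 0.0                # fully behind: no delay at all
    # Fraction of the ramp still remaining, between 0 and 1.
    remaining = (ramp_end - xid_age) / (ramp_end - ramp_start)
    return base_delay_ms * remaining
```

Under these assumed endpoints, a table at age 900 million sits halfway
along the ramp and would get half of the base delay.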

But what the kind of change you're proposing here does is create
another threshold where the behavior changes suddenly, and I think
that's challenging from the point of view of understanding the
behavior of the system. The behavior already changes when you hit
vacuum_freeze_min_age and then again when you hit
vacuum_freeze_table_age and then there's also
autovacuum_freeze_max_age and xidWarnLimit and xidStopLimit and a few
others, and these settings all interact in pretty complex ways. The
more conditional logic we add to that, the harder it becomes to
understand what's actually happening. You see a system where
age(relfrozenxid) = 673m and you need a calculator and a spreadsheet
to figure out what the vacuum behavior is at that point. Honestly, I
think we already have a problem with the behaviors here being too
complex for normal human beings to understand them, and I think that
the kinds of changes you are proposing here could make that quite a
bit worse.
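
The calculator-and-spreadsheet exercise can be sketched roughly as
follows. This is a deliberate simplification: it treats each setting
as a plain age threshold at its default value, ignores how the
settings cap or otherwise affect one another, and leaves out
xidWarnLimit and xidStopLimit because those are computed relative to
the wraparound point rather than configured as fixed ages.
vacuum_failsafe_age is included to stand in for the emergency mode
mentioned earlier. The helper name is hypothetical.

```python
# Default values of the relevant settings, in transactions.
DEFAULT_THRESHOLDS = {
    "vacuum_freeze_min_age": 50_000_000,
    "vacuum_freeze_table_age": 150_000_000,
    "autovacuum_freeze_max_age": 200_000_000,
    "vacuum_failsafe_age": 1_600_000_000,
}


def thresholds_crossed(age, thresholds=DEFAULT_THRESHOLDS):
    """Hypothetical helper: names of thresholds at or below `age`,
    ordered from the lowest threshold to the highest."""
    ordered = sorted(thresholds.items(), key=lambda item: item[1])
    return [name for name, limit in ordered if age >= limit]


# The example from the text: age(relfrozenxid) = 673 million.
print(thresholds_crossed(673_000_000))
```

Even this toy version needs a lookup table to answer the question, and
every additional cutoff adds another row to it.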

Now, you might reply to the above by saying, well, some behaviors
can't vary continuously. vacuum_cost_limit can perhaps be phased out
gradually, but autocancellation seems like something that you must
either do, or not do. I would agree with that. But what I'm saying is
that we ought to favor having those kinds of behaviors all engage at
the same point rather than at different times. I'm not saying that
there can't ever be good reasons to separate out different behaviors
and have them engage at different times, but I think we will end up
better off if we minimize that sort of thing as much as we reasonably
can. In your opening email you write "Why should the
PROC_VACUUM_FOR_WRAPAROUND behavior happen on *exactly* the same
timeline as the one used to launch an antiwraparound autovacuum,
though?" and my answer is "because that's easier to understand and I
don't see that it has much of a downside."

I did take a look at the post-mortem to which you linked, but I am not
quite sure how that bears on the behavior change under discussion.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


