Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Msg-id CA+Tgmobc68gBub=yRtksdD6W+_p6SQza8VqOJWtZGLpL366XOA@mail.gmail.com
In response to Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation  (Andres Freund <andres@anarazel.de>)
Responses Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
List pgsql-hackers
On Fri, Jan 13, 2023 at 9:09 PM Andres Freund <andres@anarazel.de> wrote:
> > > If I understand the patch correctly, we now have the following age based
> > > thresholds for av:
> > >
> > > - force-enable autovacuum:
> > >   oldest_datfrozenxid + autovacuum_freeze_max_age < nextXid
> > > - autovacuum based on age:
> > >   freeze_max_age = Min(autovacuum_freeze_max_age, table_freeze_max_age)
> > >     tableagevac = relfrozenxid < recentXid - freeze_max_age
> > > - prevent auto-cancellation:
> > >   freeze_max_age = Min(autovacuum_freeze_max_age, table_freeze_max_age)
> > >   prevent_auto_cancel_age = Min(freeze_max_age * 2, 1 billion)
> > >   prevent_auto_cancel = relfrozenxid < recentXid - prevent_auto_cancel_age
> > >
> > > Is that right?
> >
> > That summary looks accurate, but I'm a bit confused about why you're
> > asking the question this way. I thought that it was obvious that the
> > patch doesn't change most of these things.
>
> For me it was helpful to clearly list the triggers when thinking about the
> issue. I found the diff hard to read and, as noted above, the logic for the
> auto cancel threshold quite confusing, so ...

I really dislike formulas like Min(freeze_max_age * 2, 1 billion).
That looks completely magical from a user perspective. Some users
aren't going to understand autovacuum behavior at all. Some will, and
will be able to compare age(relfrozenxid) against
autovacuum_freeze_max_age. Very few people are going to think to
compare age(relfrozenxid) against some formula based on
autovacuum_freeze_max_age. I guess if we document it, maybe they will.

But even then, what's the logic behind that formula? I am not entirely
convinced that we need to separate the force-a-vacuum threshold from
the don't-cancel threshold, but if we do separate them, what's the
purpose of having the clearance between them increase as you increase
autovacuum_freeze_max_age from 0 to 500 million, and thereafter
decrease until it reaches 0 at 1 billion? I can't explain the logic
behind that except by saying "well, somebody came up with an arbitrary
formula".
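
To make the shape of that formula concrete, here is a small Python sketch
of the clearance between the two thresholds. It is not PostgreSQL code: the
function names are made up for illustration, and it only evaluates
Min(freeze_max_age * 2, 1 billion) minus freeze_max_age as quoted above.

```python
ONE_BILLION = 1_000_000_000


def prevent_auto_cancel_age(freeze_max_age: int) -> int:
    """Age past which auto-cancellation stops, per the quoted formula."""
    return min(freeze_max_age * 2, ONE_BILLION)


def clearance(freeze_max_age: int) -> int:
    """Gap between the force-a-vacuum age and the don't-cancel age."""
    return prevent_auto_cancel_age(freeze_max_age) - freeze_max_age


# Sample settings (illustrative values, not recommendations).
for fma in (0, 200_000_000, 500_000_000, 750_000_000, 1_000_000_000):
    print(f"freeze_max_age={fma:>13,}  clearance={clearance(fma):>11,}")
```

The gap grows one-for-one with the setting up to 500 million, then shrinks
one-for-one until it disappears at 1 billion.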

I do like the idea of driving the auto-cancel behavior off of the
results of previous attempts to vacuum the table. That could be done
independently of the XID age of the table. If we've failed to vacuum
the table, say, 10 times, because we kept auto-cancelling, it's
probably appropriate to force the issue. It doesn't really matter
whether the autovacuum triggered because of bloat or because of XID
age. Letting either of those things get out of control is bad. What I
think happens fairly commonly right now is that the vacuums just keep
getting cancelled until the table's XID age gets too old, and then we
finally force the issue. But at that point a lot of harm has already
been done. In a frequently updated table, waiting 300 million XIDs to
stop cancelling the vacuum is basically condemning the user to have to
run VACUUM FULL. The table can easily be ten or a hundred times bigger
than it should be by that point.
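
As a rough sketch of what I mean, in Python rather than anything resembling
the actual autovacuum code: keep a per-table count of cancelled attempts and
stop honoring auto-cancellation once it reaches a limit. Everything here
(the class, the method names, the limit of 10) is hypothetical and only
illustrates the idea.

```python
from dataclasses import dataclass

# Hypothetical limit, taken from the "say, 10 times" example above.
MAX_CANCELLED_ATTEMPTS = 10


@dataclass
class TableVacuumState:
    """Hypothetical per-table bookkeeping; not an existing PostgreSQL structure."""

    cancelled_attempts: int = 0

    def allow_auto_cancel(self) -> bool:
        # The trigger (bloat or XID age) plays no part in the decision.
        return self.cancelled_attempts < MAX_CANCELLED_ATTEMPTS

    def record_cancelled(self) -> None:
        self.cancelled_attempts += 1

    def record_completed(self) -> None:
        # A vacuum that runs to completion resets the history.
        self.cancelled_attempts = 0


state = TableVacuumState()
for _ in range(MAX_CANCELLED_ATTEMPTS):
    state.record_cancelled()
print(state.allow_auto_cancel())  # False: the next attempt is not cancellable
```

The point of the design is that the decision depends on how often we have
already given up, not on how old the table's relfrozenxid happens to be.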

And that's a big reason why I am skeptical about the patch as
proposed. It raises the threshold for auto-cancellation in cases where
it's sometimes already far too high.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


