Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Date
Msg-id CA+TgmobK-RekJymknuNtnYR8+FtAKwZg0Q=7Z6Fs8xa+fnddjg@mail.gmail.com
Whole thread Raw
In response to Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Fri, Jan 20, 2023 at 3:43 PM Peter Geoghegan <pg@bowt.ie> wrote:
> The only reason why I'm using table age at all is because that's how
> it works already, rightly or wrongly. If nothing else, t's pretty
> clear that there is no theoretical or practical reason why it has to
> be exactly the same table age as the one for launching autovacuums to
> advance relfrozenxid/relminmxid. In v5 of the patch, the default is to
> use 1.8x of the threshold that initially makes autovacuum.c want to
> launch autovacuums to deal with table age. That's based on a
> suggestion from Andres, but I'd be almost as happy with a default as
> low as 1.1x or even 1.05x. That's going to make very little difference
> to those users that really rely on the no-auto-cancellation behavior,
> while at the same time making things a lot safer for scenarios like
> the Joyent/Manta "DROP TRIGGER" outage (not perfectly safe, by any
> means, but meaningfully safer).

It doesn't seem that way to me. What am I missing? In that case, the
problem was a DROP TRIGGER command waiting behind autovacuum's lock
and thus causing all new table locks to wait behind DROP TRIGGER's
lock request. But it does not sound like that was a one-off event. It
sounds like they used DROP TRIGGER pretty regularly. So I think this
sounds like exactly the kind of case I was talking about, where
autovacuums keep getting cancelled until we decide to stop cancelling
them.

If so, then they were going to have a problem whenever that happened.
Delaying the point at which we stop cancelling them would not help at
all, as your patch would do. What about stopping cancelling them
sooner, as with the proposal to switch to that behavior after a
certain number of auto-cancels? That doesn't prevent the problem
either. If it's aggressive enough, it has some chance of making the
problem visible in a well-run test environment, which could
conceivably prevent you from hitting it in production, but certainly
there are no guarantees.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: "Karl O. Pinc"
Date:
Subject: Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences
Next
From: Nathan Bossart
Date:
Subject: Re: almost-super-user problems that we haven't fixed yet