Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation |
Date | |
Msg-id | CAH2-Wzk3GShs96LdBU=raZiGtH1safUWvsQ2GskpDf8tLS4VAQ@mail.gmail.com |
In response to | Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation |
List | pgsql-hackers |
On Fri, Jan 20, 2023 at 12:57 PM Robert Haas <robertmhaas@gmail.com> wrote:
> It doesn't seem that way to me. What am I missing? In that case, the
> problem was a DROP TRIGGER command waiting behind autovacuum's lock
> and thus causing all new table locks to wait behind DROP TRIGGER's
> lock request. But it does not sound like that was a one-off event.

It's true that I cannot categorically state that it would have made the crucial difference in this particular case. It comes down to two factors:

1. How many attempts would any given amount of additional XID space head room have bought them in practice? We can be all but certain that the smallest possible number is 1, which is something.

2. Would that have been enough for relfrozenxid to be advanced in practice?

I think that it's likely that the answer to 2 is yes, since there was no mention of bloat as a relevant factor at any point in the postmortem. It was all about locking characteristics of antiwraparound autovacuuming in particular, and its interaction with their application. I think that they were perfectly okay with the autovacuum cancellation behavior most of the time. In fact, I don't think that there was any bloat in the table at all -- it was a really huge table (likely an events table), and those tend to be append-only.

Even if I'm wrong about this specific case (we'll never know for sure), the patch as written would be virtually guaranteed to make the crucial differences in cases that I have seen up close. For example, a case with TRUNCATE.

> It sounds like they used DROP TRIGGER pretty regularly. So I think this
> sounds like exactly the kind of case I was talking about, where
> autovacuums keep getting cancelled until we decide to stop cancelling
> them.

I don't know how you can reach that conclusion. The chances of a non-aggressive VACUUM advancing relfrozenxid right now are virtually zero, at least for a big table like this one. It seems quite likely that plenty of non-aggressive autovacuums completed, or would have had the insert-driven autovacuum feature been available.

The whole article was about how this DROP TRIGGER pattern worked just fine most of the time, because most of the time autovacuum was just autocancelled. They say this at one point:

"The normal autovacuum mechanism is skipped when locks are held in order to minimize service disruption. However, because transaction wraparound is such a severe problem, if the system gets too close to wraparound, an autovacuum is launched that does not back off under lock contention."

At another point:

"When the outage was resolved, we still had a number of questions: is a wraparound autovacuum always so disruptive? Given that it was blocking all table operations, why does it throttle itself?"

ISTM that it was a combination of aggressive vacuuming taking far longer than usual (especially likely because this was pre freeze map), and the no-auto-cancel behavior. Aggressive/antiwraparound VACUUMs are naturally much more likely to coincide with periodic DDL, just because they take so much longer. That is a dangerous combination.

--
Peter Geoghegan