Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing
Msg-id CAH2-Wznc2q_yhu0KNYwxfo1mdA4xHCm1aw4rLij50nLkhcRjJQ@mail.gmail.com
In response to Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
On Thu, May 11, 2023 at 1:04 PM Greg Stark <stark@mit.edu> wrote:
> Fwiw while "wraparound" has pitfalls I think changing it for a new
> word isn't really helpful. Especially if it's a mostly meaningless
> word like "overload" or "exhaustion". It suddenly makes every existing
> doc hard to find and confusing to read.

Just to be clear, I am not proposing changing the name of
anti-wraparound autovacuum at all. What I'd like to do is use a term
like "XID exhaustion" to refer to the state that we internally refer
to as xidStopLimit. My motivation is simple: we've completely
terrified users by emphasizing wraparound, which is something that is
explicitly and prominently presented as a variety of data corruption.
The docs say this:

"But since transaction IDs have limited size (32 bits) a cluster that
runs for a long time (more than 4 billion transactions) would suffer
transaction ID wraparound: the XID counter wraps around to zero, and
all of a sudden transactions that were in the past appear to be in the
future — which means their output become invisible. In short,
catastrophic data loss."

> I say "exhaustion" or "overload" are meaningless because their meaning
> is entirely dependent on context. It's not like memory exhaustion or
> i/o overload where it's a finite resource and it's just the sheer
> amount in use that matters.

But transaction IDs are a finite resource, in the sense that you can
never have more than about 2.1 billion distinct unfrozen XIDs at any
one time. "Transaction ID exhaustion" is therefore a lot more
descriptive of the underlying problem. It's a lot better than
wraparound, which, as I've said, is inaccurate in two major ways:

1. Most cases involving xidStopLimit (or even single-user mode data
corruption) won't involve any kind of physical integer wraparound.

2. Most physical integer wraparound is harmless and perfectly routine.
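To make both points concrete, here is a minimal sketch in Python. It assumes that normal XIDs are compared circularly using a signed 32-bit difference, which is my understanding of what TransactionIdPrecedes does. The names xid_precedes and advance are hypothetical, and the sketch ignores the reserved special XIDs and the fact that a real server stops allocating XIDs at xidStopLimit, well before the broken state shown at the end.

```python
XID_MODULUS = 2**32   # XIDs are 32-bit counters
HALF_SPACE = 2**31    # roughly the "2.1 billion" limit on unfrozen XIDs


def xid_precedes(a: int, b: int) -> bool:
    """Return True if XID a is logically older than XID b.

    Mimics a signed 32-bit comparison: the unsigned difference is
    treated as negative when it is 2**31 or more.
    """
    diff = (a - b) % XID_MODULUS
    return diff >= HALF_SPACE


def advance(xid: int, n: int) -> int:
    """Hypothetical helper: move the counter forward by n XIDs.

    Simplification: does not skip the reserved special XIDs.
    """
    return (xid + n) % XID_MODULUS


# Point 2: the counter physically wraps past 2**32, and nothing breaks,
# because the two XIDs are still far less than 2**31 apart.
old = XID_MODULUS - 100
new = advance(old, 200)
wrapped_physically = new < old
still_ordered = xid_precedes(old, new)

# Point 1: no physical wraparound at all, yet the old XID now appears
# to be in the future, because more than 2**31 XIDs separate the two.
old2 = 1000
new2 = advance(old2, HALF_SPACE + 5)
no_physical_wrap = new2 > old2
old_looks_future = xid_precedes(new2, old2)

print(wrapped_physically, still_ordered)    # True True
print(no_physical_wrap, old_looks_future)   # True True
```

The only thing that matters in this model is the distance between the oldest unfrozen XID and the next XID to be allocated, not whether the raw counter has wrapped. That is why framing it as exhausting a finite supply seems like a better fit.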

But even this is fairly secondary to me. I don't actually think it's
that important that the name describe exactly what's going on here --
that's expecting rather a lot from a name. That's not really the goal.
The goal is to undo the damage done by documentation that, in its
basic introductory remarks about freezing, heavily implies that data
corruption is the eventual result of not doing enough vacuuming.

Like Samay, my consistent experience (particularly back in my Heroku
days) has been that people imagine that data corruption would happen
when the system reached what we'd call xidStopLimit. Can you blame
them for thinking that? Almost any name for xidStopLimit that doesn't
have that historical baggage seems likely to be a vast improvement.

--
Peter Geoghegan


