Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound |
Date | |
Msg-id | CAH2-Wzm2fpPQ_=pXpRvkNiuTYBGTAUfxRNW40kLitxj9T3Ny7w@mail.gmail.com |
In response to | Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound (John Naylor <john.naylor@enterprisedb.com>) |
Responses | Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound |
List | pgsql-hackers |
On Sat, Apr 29, 2023 at 7:30 PM John Naylor <john.naylor@enterprisedb.com> wrote:
> How about
>
> -HINT: To avoid a database shutdown, [...]
> +HINT: To prevent XID exhaustion, [...]
>
> ...and "MXID", respectively? We could explain in the docs that vacuum and read-only queries still work "when XIDs have been exhausted", etc.

I think that that particular wording works in this example -- we *are* avoiding XID exhaustion. But it still doesn't really address my concern -- at least not on its own. I think that we need a term for xidStopLimit mode (and perhaps multiStopLimit) itself. This is a discrete state/mode that is associated with a specific mechanism.

I'd like to emphasize the purpose of xidStopLimit (over when xidStopLimit happens) in choosing this user-facing name. As you know, the point of xidStopLimit mode is to give autovacuum the chance to catch up with managing the XID space through freezing: the available supply of XIDs doesn't meet present demand, and hasn't for some time, so it finally came to this.

Even if we had true 64-bit XIDs we'd probably still need something similar -- there would still have to be *some* point at which allowing the "freezing deficit" to continue to grow just wasn't tenable. If a person consistently spends more than they take in, their "initial bankroll" isn't necessarily relevant. If our ~2.1 billion XID "bankroll" wasn't enough to avoid xidStopLimit, why would we expect 8 billion or 20 billion XIDs to have been enough?

I'm thinking of a user-facing name for xidStopLimit along the lines of "emergency XID allocation restoration mode" (admittedly that's quite a mouthful). Something that carries the implication of "imbalance". The system was configured in a way that turned out to be unsustainable. The system was therefore forced to "restore sustainability" using the only tool that remained. This is closely related to the failsafe.
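To put rough numbers on the "bankroll" argument above, here is a minimal sketch. It assumes a deliberately simplistic linear model in which XIDs are consumed and reclaimed through freezing at constant daily rates; the rates and the function name are hypothetical, and this is not how PostgreSQL itself accounts for XID space.

```python
from typing import Optional


def days_until_exhaustion(
    bankroll: int, consumed_per_day: int, frozen_per_day: int
) -> Optional[float]:
    """Days until the XID headroom runs out, or None if it never does.

    Hypothetical linear model: headroom shrinks by the daily "freezing
    deficit" (XIDs consumed minus XIDs reclaimed by freezing).
    """
    deficit = consumed_per_day - frozen_per_day
    if deficit <= 0:
        # Freezing keeps up with demand, so the bankroll size is irrelevant.
        return None
    return bankroll / deficit


if __name__ == "__main__":
    # Made-up workload: 50M XIDs consumed per day, 40M reclaimed per day.
    for bankroll in (2_100_000_000, 8_000_000_000, 20_000_000_000):
        days = days_until_exhaustion(bankroll, 50_000_000, 40_000_000)
        print(f"{bankroll:>14,} XIDs -> {days:.0f} days")
```

Under these assumed rates a larger bankroll only moves the date: the deficit is what decides whether the limit is reached at all.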
As bad as xidStopLimit is, it won't always be the end of the world -- much depends on individual application requirements.

> (I should probably also add in the commit message that the "shutdown" in the message was carried over to MXIDs when they arrived also in 2005).
>
> > Separately, there is a need to update a couple of other places to use
> > this new terminology. The documentation for vacuum_failsafe_age and
> > vacuum_multixact_failsafe_age refer to "system-wide transaction ID
> > wraparound failure", which seems less than ideal, even without your
> > patch.
>
> Right, I'll have a look.

As you know, there is a more general problem with the use of the term "wraparound" in the docs, and in the system itself (in places like pg_stat_activity). Even the very basic terminology in this area is needlessly scary. Terms like "VACUUM (to prevent wraparound)" are uncomfortably close to "a race against time to avoid data corruption". The system isn't ever supposed to corrupt data, even if misconfigured (unless the misconfiguration is very low-level, such as "fsync=off"). Users should be able to take that much for granted.

I don't expect either of us to address that problem in the short term -- the term "wraparound" is too baked-in for it to be okay to just remove it overnight. But, it could still make sense for your patch (or my own) to fully own the fact that "wraparound" is actually a misnomer. At least when used in contexts like "to prevent wraparound" (xidStopLimit actually "prevents wraparound", though we shouldn't say anything about it in a place of prominence). Let's reassure users that they should continue to take "we won't corrupt your data for no good reason" for granted.

> I think the docs would do well to have ordered steps for recovering from both XID and MXID exhaustion.

I had planned to address this with my ongoing work on the "Routine Vacuuming" docs, but I think that you're right about the necessity of addressing it as part of this patch.
These extra steps will be required whenever the problem is a leaked prepared transaction, or something along those lines. That is increasingly likely to turn out to be the underlying cause of entering xidStopLimit, given the work we've done on VACUUM over the years. I still think that "imbalance" is the right way to frame discussion of xidStopLimit. After all, autovacuum/VACUUM will still spin its wheels in a futile effort to "restore balance". So it's kinda still about restoring balance IMV.

--
Peter Geoghegan