Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound - Mailing list pgsql-hackers

From John Naylor
Subject Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound
Date
Msg-id CAFBsxsEJCi4nuAy_HFVkhLWGOqYbdMuDEOiQ274XCuDWKxLzNQ@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound  (Aleksander Alekseev <aleksander@timescale.com>)
Responses Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound
List pgsql-hackers
Thanks for picking up this badly-needed topic again! I was irresponsible last year and let it fall off my radar, but I'm looking at the patches, as well as revisiting discussions from the last four (!?) years that didn't lead to action.

0001:

+    In this condition the system can still execute read-only transactions.
+    The active transactions will continue to execute and will be able to
+    commit.

This is ambiguous. I'd first say that any transactions already started can continue, and then say that only new read-only transactions can be started.

0004:

-HINT:  Stop the postmaster and vacuum that database in single-user mode.
+HINT:  VACUUM or VACUUM FREEZE that database.

VACUUM FREEZE is worse and should not be mentioned, since it does unnecessary work. Emergency vacuum is not school -- you don't get extra credit for doing unnecessary work.

Also, we may consider adding a boxed NOTE warning specifically against single-user mode, especially if this recommendation will change in at least some minor releases so people may not hear about it. See also [1].

- * If we're past xidStopLimit, refuse to execute transactions, unless
- * we are running in single-user mode (which gives an escape hatch
- * to the DBA who somehow got past the earlier defenses).
+ * If we're past xidStopLimit, refuse to allocate new XIDs.

This patch doesn't completely get rid of the need for single-user mode, so it should keep all information about it. If a DBA wanted to e.g. drop or truncate a table to save vacuum time, it is still possible to do it in single-user mode, so the escape hatch is still useful.

In swapping this topic back in my head, I also saw [2] where Robert suggested

"that old prepared transactions and stale replication
slots should be emphasized more prominently.  Maybe something like:

HINT:  Commit or roll back old prepared transactions, drop stale
replication slots, or kill long-running sessions.
Ensure that autovacuum is progressing, or run a manual database-wide VACUUM."

That sounds like a good direction to me. There is more we could do here to make the message more specific [3][4][5], but the patches here are in the right direction.

Note for possible backpatching: It seems straightforward to go back to PG14, which has the failsafe, but we should have better testing in place first. There is a patch in this CF to make it easier to get close to wraparound, so I'll look at what it does as well.

[1] https://www.postgresql.org/message-id/CA%2BTgmoadjx%2Br8_gGbbnNifL6vEyjZntiQRPzyixrUihvtZ5jdQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CA+Tgmob1QCMJrHwRBK8HZtGsr+6cJANRQw2mEgJ9e=D+z7cOsw@mail.gmail.com
[3] https://www.postgresql.org/message-id/20190504023015.5mgpbl27tld4irw5%40alap3.anarazel.de
[4] https://www.postgresql.org/message-id/20220204013539.qdegpqzvayq3d4y2%40alap3.anarazel.de
[5] https://www.postgresql.org/message-id/20220220045757.GA3733812%40rfd.leadboat.com

--
John Naylor
EDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Allow logical replication to copy tables in binary format
Next
From: Alvaro Herrera
Date:
Subject: Re: Initial Schema Sync for Logical Replication