Re: pg_upgrade can result in early wraparound on databases with high transaction load - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: pg_upgrade can result in early wraparound on databases with high transaction load |
Date | |
Msg-id | 20210423234256.hwopuftipdmp3okf@alap3.anarazel.de Whole thread Raw |
In response to | Re: pg_upgrade can result in early wraparound on databases with hightransaction load (Noah Misch <noah@leadboat.com>) |
List | pgsql-hackers |
Hi, On 2019-06-15 11:37:59 -0700, Noah Misch wrote: > On Tue, May 21, 2019 at 03:23:00PM -0700, Peter Geoghegan wrote: > > On Mon, May 20, 2019 at 3:10 AM Jason Harvey <jason@reddit.com> wrote: > > > This week I upgraded one of my large(2.8TB), high-volume databases from 9 to 11. The upgrade itself went fine. Abouttwo days later, we unexpectedly hit transaction ID wraparound. What was perplexing about this was that the age of ouroldest `datfrozenxid` was only 1.2 billion - far away from where I'd expect a wraparound. Curiously, the wraparound errorreferred to a mysterious database of `OID 0`: > > > > > > UPDATE ERROR: database is not accepting commands to avoid wraparound data loss in database with OID 0 > > That's bad. Yea. The code triggering it in pg_resetwal is bogus as far as I can tell. That pg_upgrade triggers it makes this quite bad. I just hit issues related to it when writing a wraparound handling test. Peter remembered this issue (how?)... Especially before 13 (inserts triggering autovacuum) it is quite common to have tables that only ever get vacuumed due to anti-wraparound vacuums. And it's common for larger databases to increase autovacuum_freeze_max_age. Which makes it fairly likely for this to guess an oldestXid value that's *newer* than an accurate one. Since oldestXid is used in a few important-ish places (like triggering vacuums, and in 14 also some snapshot related logic) I think that's bad. The relevant code: if (set_xid != 0) { ControlFile.checkPointCopy.nextXid = FullTransactionIdFromEpochAndXid(EpochFromFullTransactionId(ControlFile.checkPointCopy.nextXid), set_xid); /* * For the moment, just set oldestXid to a value that will force * immediate autovacuum-for-wraparound. It's not clear whether adding * user control of this is useful, so let's just do something that's * reasonably safe. The magic constant here corresponds to the * maximum allowed value of autovacuum_freeze_max_age. */ ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000; if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId) ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId; ControlFile.checkPointCopy.oldestXidDB = InvalidOid; } Originally from: commit 25ec228ef760eb91c094cc3b6dea7257cc22ffb5 Author: Tom Lane <tgl@sss.pgh.pa.us> Date: 2009-08-31 02:23:23 +0000 Track the current XID wrap limit (or more accurately, the oldest unfrozen XID) in checkpoint records. This eliminates the need to recompute the value from scratch during database startup, which is one of the two remaining reasons for the flatfile code to exist. It should also simplify life for hot-standby operation. I think we should remove the oldestXid guessing logic, and expose it as an explicit option. I think it's important that pg_upgrade sets an accurate value. Probably not worth caring about oldestXidDB though? Greetings, Andres Freund
pgsql-hackers by date: