Thread: BUG #6291: Xid epoch is not updated properly
The following bug has been logged online:

Bug reference: 6291
Logged by: Daniel Farina
Email address: daniel@heroku.com
PostgreSQL version: 9.0.5
Operating system: Ubuntu 10.04
Description: Xid epoch is not updated properly
Details:

We have on hand a database that makes heavy use of the txid_snapshot family of functions, and recently it just passed its 2^32 transaction mark. Unfortunately, upon wraparound the xid epoch appears to not have been incremented, remaining at 0. However, pg_controldata does properly report a > 2^32 number, and so far it appears the database otherwise functions normally. Here's a snippet:

    Latest checkpoint's NextXID: 0/2131670
    Latest checkpoint's NextOID: 1416740
    Latest checkpoint's NextMultiXactId: 1119
    Latest checkpoint's NextMultiOffset: 3115
    Latest checkpoint's oldestXID: 4131117606
    Latest checkpoint's oldestXID's DB: 16385
    Latest checkpoint's oldestActiveXID: 0

The result is that the following documentation at "http://www.postgresql.org/docs/9.0/static/functions-info.html" is dangerously misleading:

"The internal transaction ID type (xid) is 32 bits wide and wraps around every 4 billion transactions. However, these functions export a 64-bit format that is extended with an "epoch" counter so it will not wrap around during the life of an installation. The data type used by these functions, txid_snapshot, stores information about transaction ID visibility at a particular moment in time. Its components are described in Table 9-53."
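To make the quoted documentation concrete, here is a minimal sketch of how an epoch counter extends a 32-bit xid into a 64-bit value. It assumes the epoch occupies the high 32 bits and the xid the low 32 bits; the helper names `to_txid` and `from_txid` are hypothetical and are not PostgreSQL functions. The xid 2131670 is taken from the NextXID line in the report.

```python
XID_BITS = 32                      # width of the internal xid type
XID_MASK = (1 << XID_BITS) - 1     # 0xFFFFFFFF


def to_txid(epoch: int, xid: int) -> int:
    """Combine an epoch counter and a 32-bit xid into one 64-bit value.

    Assumption: epoch in the high 32 bits, xid in the low 32 bits.
    """
    if not 0 <= xid <= XID_MASK:
        raise ValueError("xid must fit in 32 bits")
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return (epoch << XID_BITS) | xid


def from_txid(txid: int) -> tuple[int, int]:
    """Split a 64-bit value back into (epoch, xid)."""
    return txid >> XID_BITS, txid & XID_MASK


# With the epoch stuck at 0, the exported value is just the raw xid.
reported = to_txid(0, 2131670)

# Had the epoch advanced to 1 at wraparound, the value would stay above 2^32.
expected = to_txid(1, 2131670)

print(reported, expected)
```

Under that assumption, an epoch that stays at 0 after wraparound makes the exported 64-bit value fall back to a small number, which is the symptom described in the report.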
"Daniel Farina" <daniel@heroku.com> writes:
> We have on hand a database that makes heavy use of the txid_snapshot family
> of functions, and recently it just passed its 2^32 transaction mark.
> Unfortunately, upon wraparound the xid epoch appears to not have been
> incremented, remaining at 0.

I failed to reproduce this here, and a look at the code responsible for xid epoch maintenance reveals no obvious way that it could have been bypassed. So there's some fairly critical piece of context that you're not telling us ...

regards, tom lane
On Sun, Nov 13, 2011 at 3:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Daniel Farina" <daniel@heroku.com> writes:
>> We have on hand a database that makes heavy use of the txid_snapshot family
>> of functions, and recently it just passed its 2^32 transaction mark.
>> Unfortunately, upon wraparound the xid epoch appears to not have been
>> incremented, remaining at 0.
>
> I failed to reproduce this here, and a look at the code responsible for
> xid epoch maintenance reveals no obvious way that it could have been
> bypassed. So there's some fairly critical piece of context that you're
> not telling us ...

Hmm, the database has nothing particularly special about it; I also reviewed the epoch code and don't see any simple oversight. On the other hand, I should have the WAL that plays past the epoch wrap, so I can instrument some telltale bit of code; if you have any special suggestions about the diagnostics, I'd like to hear them.

Also, do you see anything strange about the pg_controldata output as-is? In particular, the greater-than 2**32 oldestXID seems in line with expectation, yet calling txid_current et al. exposes a number in the hundreds of thousands, as reflected in nextXID.

--
fdr
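As one possible diagnostic along these lines, the sketch below reads the NextXID and oldestXID fields from pg_controldata text and flags the case where the raw xid counter has wrapped but the epoch is still 0. It assumes the "epoch/xid" layout of the NextXID line shown in the report, and it assumes that a numerically larger oldestXID than next xid means the 32-bit counter wrapped between the two. The helper name `checkpoint_field` is hypothetical, and the sample text is copied from the report rather than produced by running pg_controldata.

```python
import re


def checkpoint_field(text: str, name: str) -> str:
    """Return the value of a "Latest checkpoint's <name>:" line."""
    pattern = rf"^\s*Latest checkpoint's {re.escape(name)}:\s*(\S+)\s*$"
    match = re.search(pattern, text, re.MULTILINE)
    if match is None:
        raise ValueError(f"field not found: {name}")
    return match.group(1)


# Sample lines copied from the bug report.
sample = """\
Latest checkpoint's NextXID: 0/2131670
Latest checkpoint's oldestXID: 4131117606
Latest checkpoint's oldestXID's DB: 16385
"""

# NextXID is shown as "epoch/xid" in this output.
epoch_text, xid_text = checkpoint_field(sample, "NextXID").split("/")
epoch, next_xid = int(epoch_text), int(xid_text)
oldest_xid = int(checkpoint_field(sample, "oldestXID"))

# Assumption: oldestXID numerically above the next xid implies a wrap.
wrapped = oldest_xid > next_xid
suspicious = wrapped and epoch == 0

print(f"epoch={epoch} next_xid={next_xid} oldest_xid={oldest_xid}")
print(f"wrapped={wrapped} suspicious={suspicious}")
```

For the values in the report, this flags the state as suspicious: the counter appears to have wrapped while the recorded epoch is still 0. That only holds if the cluster really began at epoch 0.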