Thread: Re: pg_upgrade can result in early wraparound on databases with high transaction load

On Mon, May 20, 2019 at 3:10 AM Jason Harvey <jason@reddit.com> wrote:
> This week I upgraded one of my large (2.8 TB), high-volume databases from 9 to 11. The upgrade itself went fine. About two days later, we unexpectedly hit transaction ID wraparound. What was perplexing about this was that the age of our oldest `datfrozenxid` was only 1.2 billion - far away from where I'd expect a wraparound. Curiously, the wraparound error referred to a mysterious database of `OID 0`:
>
> UPDATE ERROR:  database is not accepting commands to avoid wraparound data loss in database with OID 0
>
> We were able to recover after a few hours by greatly speeding up our vacuum on our largest table.
>
> In a followup investigation I uncovered the reason we hit the wraparound so early, and also the cause of the mysterious OID 0 message. When pg_upgrade executes, it calls pg_resetwal to set the next transaction ID. Within pg_resetwal is the following code: https://github.com/postgres/postgres/blob/6cd404b344f7e27f4d64555bb133f18a758fe851/src/bin/pg_resetwal/pg_resetwal.c#L440-L450
>
> This sets the controldata to have a fake database (OID 0) on the brink of transaction wraparound. Specifically, after pg_upgrade is run, wraparound will occur within around 140 million transactions (provided the autovacuum doesn't finish first). I confirmed by analyzing our controldata before and after the upgrade that this was the cause of our early wraparound.
>
> Given the size and heavy volume of our database, we tend to complete a vacuum in the time it takes around 250 million transactions to execute. With our tunings this tends to be rather safe and we stay well away from the wraparound point under normal circumstances.
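
The ~140 million figure reported above follows from the constants involved. A rough back-of-the-envelope sketch (the 3M stop-limit margin is approximate, not the server's exact accounting, but the order of magnitude is the point):

```shell
#!/bin/sh
# Rough arithmetic behind the ~140 million transaction window left after
# pg_upgrade: oldestXid is guessed as nextXid - 2 billion, wraparound
# protection kicks in about half the 32-bit xid space past oldestXid,
# and the hard stop limit sits roughly 3M xids before that.
TWO_31=2147483648        # half the 32-bit xid space: wraparound horizon
GUESS_OFFSET=2000000000  # pg_resetwal's magic constant for oldestXid
STOP_MARGIN=3000000      # approximate xidStopLimit safety margin
echo $((TWO_31 - GUESS_OFFSET - STOP_MARGIN))   # 144483648
```

That is, roughly 144 million transactions of headroom regardless of how far the real oldest unfrozen xid actually is, which matches the behavior reported here.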

This does seem like an unfriendly behavior. Moving the thread over to
the -hackers list for further discussion...

--
Peter Geoghegan



On Tue, May 21, 2019 at 03:23:00PM -0700, Peter Geoghegan wrote:
> On Mon, May 20, 2019 at 3:10 AM Jason Harvey <jason@reddit.com> wrote:
> > This week I upgraded one of my large (2.8 TB), high-volume databases from 9 to 11. The upgrade itself went fine. About two days later, we unexpectedly hit transaction ID wraparound. What was perplexing about this was that the age of our oldest `datfrozenxid` was only 1.2 billion - far away from where I'd expect a wraparound. Curiously, the wraparound error referred to a mysterious database of `OID 0`:
> >
> > UPDATE ERROR:  database is not accepting commands to avoid wraparound data loss in database with OID 0

That's bad.

> > We were able to recover after a few hours by greatly speeding up our vacuum on our largest table.

For what it's worth, a quicker workaround is to VACUUM FREEZE any database,
however small.  That forces a vac_truncate_clog(), which recomputes the wrap
point from pg_database.datfrozenxid values.  This demonstrates the workaround:

--- a/src/bin/pg_upgrade/test.sh
+++ b/src/bin/pg_upgrade/test.sh
@@ -248,7 +248,10 @@ case $testhost in
 esac
 
 pg_dumpall --no-sync -f "$temp_root"/dump2.sql || pg_dumpall2_status=$?
+pg_controldata "${PGDATA}"
+vacuumdb -F template1
 pg_ctl -m fast stop
+pg_controldata "${PGDATA}"
 
 if [ -n "$pg_dumpall2_status" ]; then
     echo "pg_dumpall of post-upgrade database cluster failed"

> > In a followup investigation I uncovered the reason we hit the wraparound so early, and also the cause of the mysterious OID 0 message. When pg_upgrade executes, it calls pg_resetwal to set the next transaction ID. Within pg_resetwal is the following code: https://github.com/postgres/postgres/blob/6cd404b344f7e27f4d64555bb133f18a758fe851/src/bin/pg_resetwal/pg_resetwal.c#L440-L450

pg_upgrade should set oldestXID to the same value as the source cluster or set
it like vac_truncate_clog() would set it.  Today's scheme is usually too
pessimistic, but it can be too optimistic if the source cluster was on the
brink of wraparound.  Thanks for the report.



Hi,

On 2019-06-15 11:37:59 -0700, Noah Misch wrote:
> On Tue, May 21, 2019 at 03:23:00PM -0700, Peter Geoghegan wrote:
> > On Mon, May 20, 2019 at 3:10 AM Jason Harvey <jason@reddit.com> wrote:
> > > This week I upgraded one of my large (2.8 TB), high-volume databases from 9 to 11. The upgrade itself went fine. About two days later, we unexpectedly hit transaction ID wraparound. What was perplexing about this was that the age of our oldest `datfrozenxid` was only 1.2 billion - far away from where I'd expect a wraparound. Curiously, the wraparound error referred to a mysterious database of `OID 0`:
> > >
> > > UPDATE ERROR:  database is not accepting commands to avoid wraparound data loss in database with OID 0
> 
> That's bad.

Yea. The code triggering it in pg_resetwal is bogus as far as I can
tell. That pg_upgrade triggers it makes this quite bad.

I just hit issues related to it when writing a wraparound handling
test. Peter remembered this issue (how?)...

Especially before 13 (inserts triggering autovacuum) it is quite common
to have tables that only ever get vacuumed due to anti-wraparound
vacuums. And it's common for larger databases to increase
autovacuum_freeze_max_age. Which makes it fairly likely for this to
guess an oldestXid value that's *newer* than an accurate one. Since
oldestXid is used in a few important-ish places (like triggering
vacuums, and in 14 also some snapshot related logic) I think that's bad.

The relevant code:

    if (set_xid != 0)
    {
        ControlFile.checkPointCopy.nextXid =
            FullTransactionIdFromEpochAndXid(EpochFromFullTransactionId(ControlFile.checkPointCopy.nextXid),
                                             set_xid);

        /*
         * For the moment, just set oldestXid to a value that will force
         * immediate autovacuum-for-wraparound.  It's not clear whether adding
         * user control of this is useful, so let's just do something that's
         * reasonably safe.  The magic constant here corresponds to the
         * maximum allowed value of autovacuum_freeze_max_age.
         */
        ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
        if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
            ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
        ControlFile.checkPointCopy.oldestXidDB = InvalidOid;
    }

Originally from:

commit 25ec228ef760eb91c094cc3b6dea7257cc22ffb5
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   2009-08-31 02:23:23 +0000

    Track the current XID wrap limit (or more accurately, the oldest unfrozen
    XID) in checkpoint records.  This eliminates the need to recompute the value
    from scratch during database startup, which is one of the two remaining
    reasons for the flatfile code to exist.  It should also simplify life for
    hot-standby operation.


I think we should remove the oldestXid guessing logic, and expose it as
an explicit option. I think it's important that pg_upgrade sets an
accurate value. Probably not worth caring about oldestXidDB though?

Greetings,

Andres Freund



Hi,

On 4/24/21 3:00 AM, Andres Freund wrote:
> Hi,
>
> On 2021-04-23 19:28:27 -0500, Justin Pryzby wrote:
> > This (combination of) thread(s) seems relevant.
> >
> > Subject: pg_upgrade failing for 200+ million Large Objects
> > https://www.postgresql.org/message-id/flat/12601596dbbc4c01b86b4ac4d2bd4d48%40EX13D05UWC001.ant.amazon.com
> > https://www.postgresql.org/message-id/flat/a9f9376f1c3343a6bb319dce294e20ac%40EX13D05UWC001.ant.amazon.com
> > https://www.postgresql.org/message-id/flat/cc089cc3-fc43-9904-fdba-d830d8222145%40enterprisedb.com#3eec85391c6076a4913e96a86fece75e
>
> Huh. Thanks for digging these up.
>
> > Allows the user to provide a constant via pg_upgrade command-line, that
> > overrides the 2 billion constant in pg_resetxlog [1] thereby increasing the
> > (window of) Transaction IDs available for pg_upgrade to complete.
>
> That seems entirely the wrong approach to me, buying further into
> the broken idea of inventing random wrong values for oldestXid.
>
> We drive important things like the emergency xid limits off oldestXid. On
> databases with tables that are older than ~147 million xids (i.e. not even
> affected by the default autovacuum_freeze_max_age) the current constant leads
> to setting the oldestXid to a value *in the future*/wrapped around. Any
> different constant (or pg_upgrade parameter) will do that too in
> other scenarios.
>
> As far as I can tell there is precisely *no* correct behaviour here other than
> exactly copying the oldestXid limit from the source database.

Please find attached a patch proposal doing so: it adds a new (-u) parameter to pg_resetwal that allows specifying the oldest unfrozen XID to set.
This new parameter is then used in pg_upgrade to copy the source cluster's "Latest checkpoint's oldestXID".
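
In sketch form, the copy the patch automates looks roughly like this (the parsing is shown on a sample pg_controldata line so the snippet is self-contained; $OLD_PGDATA and $NEW_PGDATA are placeholder paths, and the real patch does this in C inside pg_upgrade):

```shell
#!/bin/sh
# Sketch: read the source cluster's "Latest checkpoint's oldestXID" from
# pg_controldata output and hand it to pg_resetwal -u on the new cluster.
sample="Latest checkpoint's oldestXID:        726"
oldest_xid=$(printf '%s\n' "$sample" |
  awk -F: '/oldestXID/ { gsub(/[[:space:]]/, "", $2); print $2; exit }')
echo "$oldest_xid"   # 726

# Against real clusters this would be roughly:
#   oldest_xid=$(pg_controldata "$OLD_PGDATA" |
#       awk -F: '/oldestXID/ { gsub(/[[:space:]]/, "", $2); print $2; exit }')
#   pg_resetwal -u "$oldest_xid" "$NEW_PGDATA"
```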

Questions:

  • Should we keep the old behavior in case -x is used without -u? (The proposed patch no longer sets an arbitrary oldestXID when -x is used.)
  • Also, shouldn't we ensure that the xid provided with -x or -u is >= FirstNormalTransactionId? (Currently the only check is that it is != 0.)

I'm adding this patch to the commitfest.

Bertrand


Hi,

On 5/4/21 10:17 AM, Drouvot, Bertrand wrote:

[...]


Copy/pasting Andres feedback (Thanks Andres for this feedback) on those questions from another thread [1].

> I was also wondering if:
>
> * We should keep the old behavior in case pg_resetwal -x is being used
> without -u? (The proposed patch does not set an arbitrary oldestXID
> anymore in case -x is used)

Andres: I don't think we should. I don't see anything in the old behaviour worth
maintaining.

> * We should ensure that the xid provided with -x or -u is
> >= FirstNormalTransactionId (Currently the only check is that it is
> != 0)?

Andres: Applying TransactionIdIsNormal() seems like a good idea.

=> I am attaching a new version that makes use of TransactionIdIsNormal() checks.
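
For reference, FirstNormalTransactionId is 3 (0 is InvalidTransactionId, 1 is BootstrapTransactionId, 2 is FrozenTransactionId), so the check amounts to the following sketch (in shell for illustration, not the patch's actual C code):

```shell
#!/bin/sh
# TransactionIdIsNormal(xid) in the server is simply xid >= 3
# (FirstNormalTransactionId); xids 0, 1, and 2 are reserved.
is_normal_xid() { [ "$1" -ge 3 ] 2>/dev/null; }
is_normal_xid 726 && echo "726: normal"
is_normal_xid 2   || echo "2: reserved"
```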

Andres: I think it's important to verify that the xid provided with -x is within a reasonable range of the oldest xid.

=> What do you mean by "a reasonable range"?

Thanks

Bertrand

[1]: https://www.postgresql.org/message-id/20210517185646.pwe4klaufwmdhe2a%40alap3.anarazel.de




Hi,

On 7/27/21 4:39 AM, Bruce Momjian wrote:
> This patch has been applied back to 9.6 and will appear in the next
> minor release.

Thank you!

Bertrand





Bruce Momjian <bruce@momjian.us> writes:
> This patch has been applied back to 9.6 and will appear in the next
> minor release.

I have just discovered that this patch broke pg_upgrade's ability
to upgrade from 8.4:

$ pg_upgrade -b ~/version84/bin -d ...
Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok
The source cluster lacks some required control information:
  latest checkpoint oldestXID

Cannot continue without required control information, terminating
Failure, exiting

Sure enough, 8.4's pg_controldata doesn't print anything about
oldestXID, because that info wasn't there then.

Given the lack of field complaints, it's probably not worth trying
to do anything to restore that capability.  But we really ought to
update pg_upgrade's code and docs in pre-v15 branches to say that
the minimum supported source version is 9.0.

            regards, tom lane



On 2022-07-05 Tu 12:59, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
>> This patch has been applied back to 9.6 and will appear in the next
>> minor release.
> I have just discovered that this patch broke pg_upgrade's ability
> to upgrade from 8.4:
>
> $ pg_upgrade -b ~/version84/bin -d ...
> Performing Consistency Checks
> -----------------------------
> Checking cluster versions                                   ok
> The source cluster lacks some required control information:
>   latest checkpoint oldestXID
>
> Cannot continue without required control information, terminating
> Failure, exiting
>
> Sure enough, 8.4's pg_controldata doesn't print anything about
> oldestXID, because that info wasn't there then.
>
> Given the lack of field complaints, it's probably not worth trying
> to do anything to restore that capability.  But we really ought to
> update pg_upgrade's code and docs in pre-v15 branches to say that
> the minimum supported source version is 9.0.


So it's taken us a year to discover the issue :-( Perhaps if we're going
to say we support upgrades back to 9.0 we should have some testing to be
assured we don't break it without knowing like this. I'll see if I can
coax crake to do that - it already tests back to 9.2.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Andrew Dunstan <andrew@dunslane.net> writes:
> So it's taken us a year to discover the issue :-( Perhaps if we're going
> to say we support upgrades back to 9.0 we should have some testing to be
> assured we don't break it without knowing like this. I'll see if I can
> coax crake to do that - it already tests back to 9.2.

Hmm ... could you first look into why 09878cdd4 broke it?  I'd supposed
that that was just detecting situations we must already have dealt with
in order for the pg_upgrade test to work, but crake's not happy.

            regards, tom lane



On Tue, Jul 5, 2022 at 11:53 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> > Sure enough, 8.4's pg_controldata doesn't print anything about
> > oldestXID, because that info wasn't there then.
> >
> > Given the lack of field complaints, it's probably not worth trying
> > to do anything to restore that capability.  But we really ought to
> > update pg_upgrade's code and docs in pre-v15 branches to say that
> > the minimum supported source version is 9.0.
>
>
> So it's taken us a year to discover the issue :-(

I'm not surprised at all, given the history here. There were at least
a couple of bugs affecting how pg_upgrade carries forward information
about these cutoffs. See commits 74cf7d46 and a61daa14.

Actually, commit 74cf7d46 was where pg_resetxlog/pg_resetwal's -u
argument was first added, for use by pg_upgrade. That commit is only
about a year old, and was only backpatched to 9.6. Unfortunately the
previous approach to carrying forward oldestXID was an accident that
usually worked. So...yeah, things are bad here. At least we now have
the ability to detect any downstream problems that this might cause by
using pg_amcheck.


--
Peter Geoghegan



On Tue, Jul 5, 2022 at 12:41 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Actually, commit 74cf7d46 was where pg_resetxlog/pg_resetwal's -u
> argument was first added, for use by pg_upgrade. That commit is only
> about a year old, and was only backpatched to 9.6.

I just realized that this thread was where that work was first
discussed. That explains why it took a year to discover that we broke
8.4!

On further reflection I think that breaking pg_upgrade for 8.4 might
have been a good thing. The issue was fairly visible and obvious if
you actually ran into it, which is vastly preferable to what would
have happened before commit 74cf7d46.

-- 
Peter Geoghegan



On 2022-07-05 Tu 15:17, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> So it's taken us a year to discover the issue :-( Perhaps if we're going
>> to say we support upgrades back to 9.0 we should have some testing to be
>> assured we don't break it without knowing like this. I'll see if I can
>> coax crake to do that - it already tests back to 9.2.
> Hmm ... could you first look into why 09878cdd4 broke it?  I'd supposed
> that that was just detecting situations we must already have dealt with
> in order for the pg_upgrade test to work, but crake's not happy.


It's complaining about this:


andrew@emma:HEAD $ cat ./inst/REL9_6_STABLE-20220705T160820.039/incompatible_polymorphics.txt
In database: regression
  aggregate: public.first_el_agg_f8(double precision)

I can have TestUpgradeXVersion.pm search for and remove offending
functions, if that's the right fix.

I note too that drongo is failing similarly, but its pg_upgrade output
directory is missing, so 4fff78f009 seems possibly shy of a load w.r.t.
MSVC. I will investigate.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Andrew Dunstan <andrew@dunslane.net> writes:
> On 2022-07-05 Tu 15:17, Tom Lane wrote:
>> Hmm ... could you first look into why 09878cdd4 broke it?  I'd supposed
>> that that was just detecting situations we must already have dealt with
>> in order for the pg_upgrade test to work, but crake's not happy.

> It's complaining about this:

> andrew@emma:HEAD $ cat ./inst/REL9_6_STABLE-20220705T160820.039/incompatible_polymorphics.txt
> In database: regression
>   aggregate: public.first_el_agg_f8(double precision)

Thanks.

> I can have TestUpgradeXVersion.pm search for and remove offending
> functions, if that's the right fix.

I'm not sure.  It seems like the new check must be too strict,
because it was only meant to detect cases that would cause a subsequent
dump/reload failure, and evidently this did not.  I'll have to look
closer to figure out what to do.  Anyway, it's off topic for this
thread ...

            regards, tom lane