Thread: prion failed with ERROR: missing chunk number 0 for toast value 14334 in pg_toast_2619

From: Heikki Linnakangas
Hi,

After my commit c532d15ddd to split up copy.c, buildfarm animal "prion" 
failed in pg_upgrade 
(https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2020-11-23%2009%3A23%3A16):

> Upgrade Complete
> ----------------
> Optimizer statistics are not transferred by pg_upgrade so,
> once you start the new server, consider running:
>     /home/ec2-user/bf/root/HEAD/pgsql.build/tmp_install/home/ec2-user/bf/root/HEAD/inst/bin/vacuumdb --all --analyze-in-stages
> 
> Running this script will delete the old cluster's data files:
>     ./delete_old_cluster.sh
> waiting for server to start.... done
> server started
> pg_dump: error: query failed: ERROR:  missing chunk number 0 for toast value 14334 in pg_toast_2619
> pg_dump: error: query was: SELECT
> a.attnum,
> a.attname,
> a.atttypmod,
> a.attstattarget,
> a.attstorage,
> t.typstorage,
> a.attnotnull,
> a.atthasdef,
> a.attisdropped,
> a.attlen,
> a.attalign,
> a.attislocal,
> pg_catalog.format_type(t.oid, a.atttypmod) AS atttypname,
> array_to_string(a.attoptions, ', ') AS attoptions,
> CASE WHEN a.attcollation <> t.typcollation THEN a.attcollation ELSE 0 END AS attcollation,
> pg_catalog.array_to_string(ARRAY(SELECT pg_catalog.quote_ident(option_name) || ' ' || pg_catalog.quote_literal(option_value)FROM pg_catalog.pg_options_to_table(attfdwoptions) ORDER BY option_name), E',
>     ') AS attfdwoptions,
> a.attidentity,
> CASE WHEN a.atthasmissing AND NOT a.attisdropped THEN a.attmissingval ELSE null END AS attmissingval,
> a.attgenerated
> FROM pg_catalog.pg_attribute a LEFT JOIN pg_catalog.pg_type t ON a.atttypid = t.oid
> WHERE a.attrelid = '35221'::pg_catalog.oid AND a.attnum > 0::pg_catalog.int2
> ORDER BY a.attnum
> pg_dumpall: error: pg_dump failed on database "regression", exiting
> waiting for server to shut down.... done
> server stopped
> pg_dumpall of post-upgrade database cluster failed
> make: *** [check] Error 1

I don't think this is related to commit c532d15ddd, as there is no COPY
involved here. I have no clue what might be going on here, though.
Any ideas?

Googling around, I found one report that looks somewhat similar: 
https://www.postgresql.org/message-id/20190830142655.GA14011%40telsasoft.com. 
That was with CLUSTER/VACUUM FULL, while pg_upgrade performs ANALYZE.

- Heikki



Heikki Linnakangas <hlinnaka@iki.fi> writes:
> After my commit c532d15ddd to split up copy.c, buildfarm animal "prion"
> failed in pg_upgrade
> (https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2020-11-23%2009%3A23%3A16):

prion's continued to fail, rarely, in this same way:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2021-02-09%2009%3A13%3A16
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2021-02-15%2004%3A08%3A06
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2021-05-12%2018%3A43%3A14

The failures are remarkably identical, and they also look a lot like
field reports we've been seeing off and on for years.  I do not know
why it always seems to be pg_toast_2619 (i.e. pg_statistic) that's
affected, but the pattern is pretty undeniable by now.

What I do have that's new is that *I can reproduce it*, at long last.
For me, installing the attached patch and running pg_upgrade's
"make check" fails, pretty much every time, with symptoms identical
to prion's.

The patch consists of

(1) 100ms delay just before detoasting, when loading a pg_statistic
catcache entry that has toasted datums

(2) provisions to mark such catcache entries dead immediately
(effectively, CATCACHE_FORCE_RELEASE for only these tuples);
this is to force us through (1) as often as possible

(3) some quick-hack debugging aids added to HeapTupleSatisfiesToast,
plus convert the missing-chunk error to PANIC to get a stack
trace.

If it doesn't reproduce for you, try adjusting the delay.  100ms
was the first value I tried, though, so I think it's probably
not too sensitive.

The trace I'm getting shows pretty positively that autovacuum
has fired on pg_statistic, and removed the needed toast entries,
just before the failure.  So something is wrong with our logic
about when toast entries can be removed.

I do not have a lot of idea why, but I see something that is
probably a honkin' big clue:

2021-05-15 17:28:05.965 EDT [2469444] LOG:  HeapTupleSatisfiesToast: xmin 2 t_infomask 0x0b02

That is, the toast tuples in question are not just frozen, but
actually have xmin = FrozenTransactionId.

I do not think that is normal --- at least, it's not the state
immediately after initdb, and I can't make it happen with
"vacuum freeze pg_statistic".  A plausible theory is that pg_upgrade
caused this to happen (but how?) and then vacuuming of toast rows
goes off the rails because of it.

Anyway, I have no more time to poke at this right now, so I'm
posting the reproducer in case someone else wants to look at it.

            regards, tom lane

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index d3c57cd16a..5830df83c2 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -73,11 +73,14 @@
 #include "access/xlog.h"
 #include "storage/bufmgr.h"
 #include "storage/procarray.h"
+#include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/combocid.h"
 #include "utils/snapmgr.h"


+bool trace_toast_visibility = false;
+
 /*
  * SetHintBits()
  *
@@ -366,6 +369,13 @@ HeapTupleSatisfiesToast(HeapTuple htup, Snapshot snapshot,
     Assert(ItemPointerIsValid(&htup->t_self));
     Assert(htup->t_tableOid != InvalidOid);

+    if (trace_toast_visibility)
+        ereport(LOG,
+                errmsg("HeapTupleSatisfiesToast: xmin %u t_infomask 0x%04x",
+                       HeapTupleHeaderGetXmin(tuple),
+                       tuple->t_infomask),
+                debug_query_string ? 0 : errbacktrace());
+
     if (!HeapTupleHeaderXminCommitted(tuple))
     {
         if (HeapTupleHeaderXminInvalid(tuple))
@@ -420,6 +430,11 @@ HeapTupleSatisfiesToast(HeapTuple htup, Snapshot snapshot,
             return false;
     }

+    if (trace_toast_visibility)
+        elog(LOG, "HeapTupleSatisfiesToast: xmin %u t_infomask 0x%04x OK",
+             HeapTupleHeaderGetXmin(tuple),
+             tuple->t_infomask);
+
     /* otherwise assume the tuple is valid for TOAST. */
     return true;
 }
diff --git a/src/backend/access/heap/heaptoast.c b/src/backend/access/heap/heaptoast.c
index 55bbe1d584..03ea2dc80d 100644
--- a/src/backend/access/heap/heaptoast.c
+++ b/src/backend/access/heap/heaptoast.c
@@ -781,7 +781,7 @@ heap_fetch_toast_slice(Relation toastrel, Oid valueid, int32 attrsize,
      * Final checks that we successfully fetched the datum
      */
     if (expectedchunk != (endchunk + 1))
-        ereport(ERROR,
+        ereport(PANIC,
                 (errcode(ERRCODE_DATA_CORRUPTED),
                  errmsg_internal("missing chunk number %d for toast value %u in %s",
                                  expectedchunk, valueid,
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 4fbdc62d8c..a1f923e02c 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -23,6 +23,7 @@
 #include "access/xact.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_operator.h"
+#include "catalog/pg_statistic.h"
 #include "catalog/pg_type.h"
 #include "common/hashfn.h"
 #include "miscadmin.h"
@@ -39,6 +40,7 @@
 #include "utils/resowner_private.h"
 #include "utils/syscache.h"

+extern bool trace_toast_visibility;

  /* #define CACHEDEBUG */    /* turns DEBUG elogs on */

@@ -1810,6 +1812,7 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
 {
     CatCTup    *ct;
     HeapTuple    dtp;
+    bool makedead = false;
     MemoryContext oldcxt;

     /* negative entries have no tuple associated */
@@ -1827,7 +1830,16 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
          * something using a slightly stale catcache entry.
          */
         if (HeapTupleHasExternal(ntp))
+        {
+            if (cache->cc_reloid == StatisticRelationId)
+            {
+                pg_usleep(100000);
+                makedead = true;
+                trace_toast_visibility = true;
+            }
             dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc);
+            trace_toast_visibility = false;
+        }
         else
             dtp = ntp;

@@ -1887,7 +1899,7 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
     ct->my_cache = cache;
     ct->c_list = NULL;
     ct->refcount = 0;            /* for the moment */
-    ct->dead = false;
+    ct->dead = makedead;
     ct->negative = negative;
     ct->hash_value = hashValue;


I wrote:
> What I do have that's new is that *I can reproduce it*, at long last.

I see what's happening, but I'm not quite sure where we should fix it.

> I do not have a lot of idea why, but I see something that is
> probably a honkin' big clue:
> 2021-05-15 17:28:05.965 EDT [2469444] LOG:  HeapTupleSatisfiesToast: xmin 2 t_infomask 0x0b02
> That is, the toast tuples in question are not just frozen, but
> actually have xmin = FrozenTransactionId.

Nah, that was a mistake: I was printing xmin using HeapTupleHeaderGetXmin
where I should have used HeapTupleHeaderGetRawXmin.

Anyway, the detailed sequence of events is that a background autoanalyze
replaces a pg_statistic entry, including outdating a toasted field, and
commits.  The foreground pg_dump session has already fetched that entry,
and in the race condition in question, it doesn't try to read the toast
rows till after the commit.  The reason it fails to find them is that
as part of the normal indexscan sequence for reading the toast table,
we execute heap_page_prune, and *it decides those rows are DEAD and
prunes them away* before we can fetch them.
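
To spell out the interleaving (my reconstruction from the trace, not a
verbatim log):

    pg_dump backend                     autovacuum worker
    ---------------                     -----------------
    read pg_statistic tuple that
    carries a toast pointer
                                        autoanalyze pg_statistic:
                                        replace the row, making the old
                                        toast chunks dead; commit
    (the patch's 100ms sleep sits here)
    index scan on pg_toast_2619 to
    detoast; heap_page_prune fires on
    the toast page and removes the
    chunks
    ERROR:  missing chunk number 0 ...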

The reason heap_page_prune thinks they're dead is that
GlobalVisTestIsRemovableXid told it so.

The reason GlobalVisTestIsRemovableXid thinks that is that
GlobalVisCatalogRels.maybe_needed is insane:

(gdb) p GlobalVisCatalogRels
$2 = {definitely_needed = {value = 16614}, maybe_needed = {
    value = 18446744071709568230}}

The reason GlobalVisCatalogRels.maybe_needed is insane is that
(per pg_controldata) our latest checkpoint has

Latest checkpoint's NextXID:          0:16614
Latest checkpoint's oldestXID:        2294983910

so when GetSnapshotData computes

        oldestfxid = FullXidRelativeTo(latest_completed, oldestxid);

it gets an insane value:

oldestfxid 18446744071709568230, latest_completed 16613, oldestxid 2294983910

And the reason oldestXID contains that is that pg_upgrade applied
pg_resetwal, which does this:

        /*
         * For the moment, just set oldestXid to a value that will force
         * immediate autovacuum-for-wraparound.  It's not clear whether adding
         * user control of this is useful, so let's just do something that's
         * reasonably safe.  The magic constant here corresponds to the
         * maximum allowed value of autovacuum_freeze_max_age.
         */
        ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
        if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
            ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
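
To make the arithmetic concrete, here is a standalone sketch (simplified,
not the actual PostgreSQL functions) that reproduces both numbers: the
unsigned subtraction above turns NextXID 16614 into oldestXid 2294983910,
and FullXidRelativeTo's widen-within-2^31 logic then underflows 64-bit
space:

#include <stdint.h>
#include <stdio.h>

/*
 * Sketch of FullXidRelativeTo()'s core arithmetic (procarray.c): widen a
 * 32-bit xid by assuming it lies within 2^31 of the 64-bit reference.
 */
static uint64_t
full_xid_relative_to(uint64_t rel, uint32_t xid)
{
    return rel + (int32_t) (xid - (uint32_t) rel);
}

int
main(void)
{
    uint32_t next_xid = 16614;          /* old cluster's NextXID */
    uint64_t latest_completed = 16613;  /* from the debug output above */

    /* pg_resetwal: unsigned 16614 - 2000000000 wraps to 2294983910 */
    uint32_t oldest_xid = next_xid - 2000000000U;

    printf("oldestXID  = %u\n", oldest_xid);
    /* 2294983910 - 16613 = 2294967297, i.e. (int32) -1999999999, and
     * 16613 + (-1999999999) wraps to 18446744071709568230 */
    printf("oldestfxid = %llu\n",
           (unsigned long long) full_xid_relative_to(latest_completed,
                                                     oldest_xid));
    return 0;
}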

So it seems like we should do some combination of these things:

1. Fix FullXidRelativeTo to be a little less trusting.  It'd
probably be sane to make it return FirstNormalTransactionId
when it'd otherwise produce a wrapped-around FullXid, but is
there any situation where we'd want it to throw an error instead?

2. Change pg_resetwal to not do the above.  It's not entirely
apparent to me what business it has trying to force
autovacuum-for-wraparound anyway, but if it does need to do that,
can we devise a less klugy method?
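
For reference, a clamped variant per #1 might look like this (a sketch
only, using the same simplified widening as above; whether to clamp
silently or complain is exactly the open question):

#include <stdint.h>

/*
 * Sketch: widen xid relative to rel as FullXidRelativeTo does, but clamp
 * to FirstNormalTransactionId (3, per access/transam.h) instead of
 * letting the result wrap below zero.
 */
static uint64_t
full_xid_relative_to_clamped(uint64_t rel, uint32_t xid)
{
    int64_t diff = (int32_t) (xid - (uint32_t) rel);

    if (diff < 0 && (uint64_t) (-diff) > rel)
        return 3;               /* FirstNormalTransactionId */
    return (uint64_t) ((int64_t) rel + diff);
}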

It also seems like some assertions in procarray.c would be a
good idea.  With the attached patch, we get through core
regression just fine, but the pg_upgrade test fails immediately
after the "Resetting WAL archives" step.

BTW, it looks like the failing code all came in with dc7420c2c
(snapshot scalability), so this fails to explain the similar-looking
field reports we've seen.  But it's clearly a live bug in v14.
I shall go add it to the open items.

            regards, tom lane

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 42a89fc5dc..ffc2edaf4f 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2462,6 +2462,15 @@ GetSnapshotData(Snapshot snapshot)
                                    oldestfxid);
         /* accurate value known */
         GlobalVisTempRels.maybe_needed = GlobalVisTempRels.definitely_needed;
+
+        /* Do basic sanity check on these XIDs */
+        Assert(FullTransactionIdPrecedesOrEquals(GlobalVisSharedRels.maybe_needed,
+                                                 GlobalVisSharedRels.definitely_needed));
+        Assert(FullTransactionIdPrecedesOrEquals(GlobalVisCatalogRels.maybe_needed,
+                                                 GlobalVisCatalogRels.definitely_needed));
+        Assert(FullTransactionIdPrecedesOrEquals(GlobalVisDataRels.maybe_needed,
+                                                 GlobalVisDataRels.definitely_needed));
+        /* not much point in checking GlobalVisTempRels, given the above */
     }

     RecentXmin = xmin;
@@ -3996,6 +4005,8 @@ GlobalVisTestFor(Relation rel)

     Assert(FullTransactionIdIsValid(state->definitely_needed) &&
            FullTransactionIdIsValid(state->maybe_needed));
+    Assert(FullTransactionIdPrecedesOrEquals(state->maybe_needed,
+                                             state->definitely_needed));

     return state;
 }
@@ -4061,6 +4072,15 @@ GlobalVisUpdateApply(ComputeXidHorizonsResult *horizons)
                                GlobalVisDataRels.definitely_needed);
     GlobalVisTempRels.definitely_needed = GlobalVisTempRels.maybe_needed;

+    /* Do basic sanity check on these XIDs */
+    Assert(FullTransactionIdPrecedesOrEquals(GlobalVisSharedRels.maybe_needed,
+                                             GlobalVisSharedRels.definitely_needed));
+    Assert(FullTransactionIdPrecedesOrEquals(GlobalVisCatalogRels.maybe_needed,
+                                             GlobalVisCatalogRels.definitely_needed));
+    Assert(FullTransactionIdPrecedesOrEquals(GlobalVisDataRels.maybe_needed,
+                                             GlobalVisDataRels.definitely_needed));
+    /* not much point in checking GlobalVisTempRels, given the above */
+
     ComputeXidHorizonsResultLastXmin = RecentXmin;
 }


On Sun, May 16, 2021 at 1:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> it gets an insane value:
>
> oldestfxid 18446744071709568230, latest_completed 16613, oldestxid 2294983910
>
> And the reason oldestXID contains that is that pg_upgrade applied
> pg_resetwal, which does this:
>
>         /*
>          * For the moment, just set oldestXid to a value that will force
>          * immediate autovacuum-for-wraparound.  It's not clear whether adding
>          * user control of this is useful, so let's just do something that's
>          * reasonably safe.  The magic constant here corresponds to the
>          * maximum allowed value of autovacuum_freeze_max_age.
>          */
>         ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
>         if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
>             ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;

This same pg_resetwal code has probably caused quite a few problems on
pg_upgrade'd databases:

https://postgr.es/m/20210423234256.hwopuftipdmp3okf@alap3.anarazel.de

-- 
Peter Geoghegan



Peter Geoghegan <pg@bowt.ie> writes:
> On Sun, May 16, 2021 at 1:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> And the reason oldestXID contains that is that pg_upgrade applied
>> pg_resetwal, which does this:
>> * For the moment, just set oldestXid to a value that will force
>> * immediate autovacuum-for-wraparound.

> This same pg_resetwal code has probably caused quite a few problems on
> pg_upgrade'd databases:
> https://postgr.es/m/20210423234256.hwopuftipdmp3okf@alap3.anarazel.de

Hm, yeah.  I'm not sure if transferring the value forward from the
old cluster is entirely safe, but if it is, that seems like a
promising route to a fix.  (We should still have more sanity checking
around the GlobalVis code, though.)

            regards, tom lane



Hi,

On 2021-05-16 16:23:02 -0400, Tom Lane wrote:
> And the reason oldestXID contains that is that pg_upgrade applied
> pg_resetwal, which does this:
>          
>         /*
>          * For the moment, just set oldestXid to a value that will force
>          * immediate autovacuum-for-wraparound.  It's not clear whether adding
>          * user control of this is useful, so let's just do something that's
>          * reasonably safe.  The magic constant here corresponds to the
>          * maximum allowed value of autovacuum_freeze_max_age.
>          */
>         ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
>         if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
>             ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;

Yea - this is causing quite a few problems... See
https://www.postgresql.org/message-id/20210423234256.hwopuftipdmp3okf%40alap3.anarazel.de


> So it seems like we should do some combination of these things:
> 
> 1. Fix FullXidRelativeTo to be a little less trusting.  It'd
> probably be sane to make it return FirstNormalTransactionId
> when it'd otherwise produce a wrapped-around FullXid, but is
> there any situation where we'd want it to throw an error instead?

I'm wondering whether we should *always* make it an error, and fix the
places where that causes problems.


> 2. Change pg_resetwal to not do the above.  It's not entirely
> apparent to me what business it has trying to force
> autovacuum-for-wraparound anyway, but if it does need to do that,
> can we devise a less klugy method?

Yes, see the above email. I think we really need to transport an
accurate oldest xid + epoch for pg_upgrade.
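
For concreteness, that would mean pg_upgrade running something like

    pg_resetwal -f -u <old cluster's oldestXID> -x <old cluster's NextXID> <new datadir>

with a to-be-added -u switch, instead of letting pg_resetwal invent an
oldestXid on its own.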


> It also seems like some assertions in procarray.c would be a
> good idea.  With the attached patch, we get through core
> regression just fine, but the pg_upgrade test fails immediately
> after the "Resetting WAL archives" step.

Agreed.

Greetings,

Andres Freund



Hi,

On 2021-05-16 18:21:21 -0400, Tom Lane wrote:
> Peter Geoghegan <pg@bowt.ie> writes:
> > On Sun, May 16, 2021 at 1:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> And the reason oldestXID contains that is that pg_upgrade applied
> >> pg_resetwal, which does this:
> >> * For the moment, just set oldestXid to a value that will force
> >> * immediate autovacuum-for-wraparound.
>
> > This same pg_resetwal code has probably caused quite a few problems on
> > pg_upgrade'd databases:
> > https://postgr.es/m/20210423234256.hwopuftipdmp3okf@alap3.anarazel.de
>
> Hm, yeah.  I'm not sure if transferring the value forward from the
> old cluster is entirely safe, but if it is, that seems like a
> promising route to a fix.  (We should still have more sanity checking
> around the GlobalVis code, though.)

Why would it not be safe? Or at least safer than what we're doing right
now?  It definitely isn't safe to use a newer value than what the old
cluster used - tables might have older tuples. And an older value than
the old cluster's means that we can either accidentally wrap around
without being protected or that a cluster might shut down to prevent a
wraparound.  And after the pg_resetwal we can't assign xids that are
older than set_xid - so it won't become inaccurate?

I think we should remove the heuristic thing from pg_resetwal entirely,
and error out if next-xid is set to something too far away from oldest
xid, unless oldestxid is also specified.

Greetings,

Andres Freund



Andres Freund <andres@anarazel.de> writes:
> On 2021-05-16 18:21:21 -0400, Tom Lane wrote:
>> Hm, yeah.  I'm not sure if transferring the value forward from the
>> old cluster is entirely safe, but if it is, that seems like a
>> promising route to a fix.  (We should still have more sanity checking
>> around the GlobalVis code, though.)

> Why would it not be safe?

I'm just wondering about the catalog tuples set up by pg_upgrade
itself.  If they're all frozen then they probably don't matter to
this, but it might take some thought.

> I think we should remove the heuristic thing from pg_resetwal entirely,
> and error out if next-xid is set to something too far away from oldest
> xid, unless oldexid is also specified.

Seems plausible ...

            regards, tom lane



Hi,

On 2021-05-16 18:42:53 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Why would it not be safe?
> 
> I'm just wondering about the catalog tuples set up by pg_upgrade
> itself.  If they're all frozen then they probably don't matter to
> this, but it might take some thought.

There shouldn't be any catalog objects (vs tuples) set up by pg_upgrade
at the time of the resetwal, as far as I can see. copy_xact_xlog_xid(),
which includes the resetwal calls, is done before any new objects are
created/restored.

The only thing that happens before copy_xact_xlog_xid() is
prepare_new_cluster(), which analyzes/freezes the catalog of the new
cluster. Of course that does create new stats tuples for catalog tables,
but if the freezing of those doesn't work, we'd be in deep trouble
regardless of which concrete oldestXid value we choose - that happens
with xids as they are in a freshly initdb'd cluster, which might be in
the future in the old cluster, or might have aborted. Their pg_xact will
be overwritten in copy_xact_xlog_xid().

Greetings,

Andres Freund



On Sun, May 16, 2021 at 04:23:02PM -0400, Tom Lane wrote:
> And the reason oldestXID contains that is that pg_upgrade applied
> pg_resetwal, which does this:
>          
>         /*
>          * For the moment, just set oldestXid to a value that will force
>          * immediate autovacuum-for-wraparound.  It's not clear whether adding
>          * user control of this is useful, so let's just do something that's
>          * reasonably safe.  The magic constant here corresponds to the
>          * maximum allowed value of autovacuum_freeze_max_age.
>          */
>         ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
>         if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
>             ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
> 
> So it seems like we should do some combination of these things:
> 
> 1. Fix FullXidRelativeTo to be a little less trusting.  It'd
> probably be sane to make it return FirstNormalTransactionId
> when it'd otherwise produce a wrapped-around FullXid, but is
> there any situation where we'd want it to throw an error instead?
> 
> 2. Change pg_resetwal to not do the above.  It's not entirely
> apparent to me what business it has trying to force
> autovacuum-for-wraparound anyway, but if it does need to do that,
> can we devise a less klugy method?

Sorry, I wish I could help with this pg_upgrade problem, but I have no
idea why pg_resetwal is doing that.  Pg_upgrade is supposed to be
turning off autovacuum, though I think we have had some cases where it
could not be fully turned off and consumption of many xids caused
autovacuum to run and break pg_upgrade.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Hi,

On 5/17/21 1:28 AM, Andres Freund wrote:
> Hi,
>
> On 2021-05-16 18:42:53 -0400, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > Why would it not be safe?
> >
> > I'm just wondering about the catalog tuples set up by pg_upgrade
> > itself.  If they're all frozen then they probably don't matter to
> > this, but it might take some thought.
>
> There shouldn't be any catalog objects (vs tuples) set up by pg_upgrade
> at the time of the resetwal, as far as I can see. copy_xact_xlog_xid(),
> which includes the resetwal calls, is done before any new objects are
> created/restored.
>
> The only thing that happens before copy_xact_xlog_xid() is
> prepare_new_cluster(), which analyzes/freezes the catalog of the new
> cluster. Of course that does create new stats tuples for catalog tables,
> but if the freezing of those doesn't work, we'd be in deep trouble
> regardless of which concrete oldestXid value we choose - that happens
> with xids as they are in a freshly initdb'd cluster, which might be in
> the future in the old cluster, or might have aborted. Their pg_xact will
> be overwritten in copy_xact_xlog_xid().


FWIW a patch proposal to copy the oldest unfrozen XID during pg_upgrade
(it adds a new -u parameter to pg_resetwal) was submitted a couple of
weeks ago, see: https://commitfest.postgresql.org/33/3105/

I was also wondering if:

  • We should keep the old behavior in case pg_resetwal -x is being used
    without -u? (The proposed patch does not set an arbitrary oldestXID
    anymore in case -x is used)
  • We should ensure that the xid provided with -x or -u is >=
    FirstNormalTransactionId (Currently the only check is that it is != 0)?

Thanks

Bertrand

Hi,

On 2021-05-17 20:14:40 +0200, Drouvot, Bertrand wrote:
> FWIW a patch proposal to copy the oldest unfrozen XID during pg_upgrade (it
> adds a new (- u) parameter to pg_resetwal) has been submitted a couple of
> weeks ago, see: https://commitfest.postgresql.org/33/3105/

I'll try to look at it soon.


> I was also wondering if:
> 
>  * We should keep the old behavior in case pg_resetwal -x is being used
>    without -u? (The proposed patch does not set an arbitrary oldestXID
>    anymore in case -x is used)

I don't think we should. I don't see anything in the old behaviour worth
maintaining.


>  * We should ensure that the xid provided with -x or -u is >=
>    FirstNormalTransactionId (Currently the only check is that it is
>    != 0)?

Applying TransactionIdIsNormal() seems like a good idea. I think it's
important to verify that the xid provided with -x is within a reasonable
range of the oldest xid.
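
Something along these lines, perhaps (a self-contained sketch with local
stand-ins for the transam.h macros; names and messages are hypothetical,
not the actual patch):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/*
 * Validate pg_resetwal's -x/-u inputs: both must be normal xids, and -x
 * must lie within the 2^31 window ahead of -u that the xid comparison
 * logic assumes.
 */
static void
validate_xid_options(TransactionId set_xid, TransactionId set_oldest_xid)
{
    if (set_xid != 0 && !TransactionIdIsNormal(set_xid))
    {
        fprintf(stderr, "transaction ID (-x) must be >= %u\n",
                FirstNormalTransactionId);
        exit(1);
    }
    if (set_oldest_xid != 0 && !TransactionIdIsNormal(set_oldest_xid))
    {
        fprintf(stderr, "oldest transaction ID (-u) must be >= %u\n",
                FirstNormalTransactionId);
        exit(1);
    }
    if (set_xid != 0 && set_oldest_xid != 0 &&
        (uint32_t) (set_xid - set_oldest_xid) > (uint32_t) INT32_MAX)
    {
        fprintf(stderr, "transaction ID (-x) must be within 2^31 of "
                "the oldest transaction ID (-u)\n");
        exit(1);
    }
}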

Greetings,

Andres Freund



Hi,

On 5/17/21 8:56 PM, Andres Freund wrote:
> Hi,
>
> On 2021-05-17 20:14:40 +0200, Drouvot, Bertrand wrote:
>> FWIW a patch proposal to copy the oldest unfrozen XID during pg_upgrade (it
>> adds a new (- u) parameter to pg_resetwal) has been submitted a couple of
>> weeks ago, see: https://commitfest.postgresql.org/33/3105/
> I'll try to look at it soon.

Thanks!

>
>> I was also wondering if:
>>
>>   * We should keep the old behavior in case pg_resetwal -x is being used
>>     without -u? (The proposed patch does not set an arbitrary oldestXID
>>     anymore in case -x is used)
> I don't think we should. I don't see anything in the old behaviour worth
> maintaining.
>
>
>>   * We should ensure that the xid provided with -x or -u is >=
>>     FirstNormalTransactionId (Currently the only check is that it is
>>     != 0)?
> Applying TransactionIdIsNormal() seems like a good idea. I think it's
> important to verify that the xid provided with -x is within a reasonable
> range of the oldest xid.

I'll copy/paste this feedback (+ an updated patch to make use of 
TransactionIdIsNormal() checks) to the thread [1] that is linked to the 
commitfest entry.

Thanks

Bertrand

[1]: 
https://www.postgresql.org/message-id/fe006d56-85f1-5f1e-98e7-05b53dff4f51@amazon.com




On Tue, May 18, 2021 at 01:04:18PM +0200, Drouvot, Bertrand wrote:
> On 5/17/21 8:56 PM, Andres Freund wrote:
>> On 2021-05-17 20:14:40 +0200, Drouvot, Bertrand wrote:
>>> I was also wondering if:
>>>
> >>>   * We should keep the old behavior in case pg_resetwal -x is being used
> >>>     without -u? (The proposed patch does not set an arbitrary oldestXID
> >>>     anymore in case -x is used)
>> I don't think we should. I don't see anything in the old behaviour worth
>> maintaining.

So, pg_resetwal's logic for the oldest XID assignment is causing some
problems here.  This open item has been open for some time now and has
been idle for a couple of weeks.  It looks like we have a solution
drafted to move forward, with the following pieces (no
patches yet):
- More robust safety checks in procarray.c.
- A rework of oldestXid in pg_resetwal.

Is there somebody working on that?
--
Michael

On Sun, May 16, 2021 at 04:23:02PM -0400, Tom Lane wrote:
> 1. Fix FullXidRelativeTo to be a little less trusting.  It'd
> probably be sane to make it return FirstNormalTransactionId
> when it'd otherwise produce a wrapped-around FullXid, but is
> there any situation where we'd want it to throw an error instead?
> 
> 2. Change pg_resetwal to not do the above.  It's not entirely
> apparent to me what business it has trying to force
> autovacuum-for-wraparound anyway, but if it does need to do that,
> can we devise a less klugy method?
> 
> It also seems like some assertions in procarray.c would be a
> good idea.  With the attached patch, we get through core
> regression just fine, but the pg_upgrade test fails immediately
> after the "Resetting WAL archives" step.

#2 is done as of 74cf7d46a.

Is there a plan to include Tom's procarray assertions?

-- 
Justin



On Sun, Aug 15, 2021 at 09:44:55AM -0500, Justin Pryzby wrote:
> On Sun, May 16, 2021 at 04:23:02PM -0400, Tom Lane wrote:
> > 1. Fix FullXidRelativeTo to be a little less trusting.  It'd
> > probably be sane to make it return FirstNormalTransactionId
> > when it'd otherwise produce a wrapped-around FullXid, but is
> > there any situation where we'd want it to throw an error instead?
> > 
> > 2. Change pg_resetwal to not do the above.  It's not entirely
> > apparent to me what business it has trying to force
> > autovacuum-for-wraparound anyway, but if it does need to do that,
> > can we devise a less klugy method?
> > 
> > It also seems like some assertions in procarray.c would be a
> > good idea.  With the attached patch, we get through core
> > regression just fine, but the pg_upgrade test fails immediately
> > after the "Resetting WAL archives" step.
> 
> #2 is done as of 74cf7d46a.
> 
> Is there a plan to include Tom's procarray assertions?

I'm confused about the state of this patch/thread.

make check causes autovacuum crashes (but then the regression tests succeed
anyway).

I notice now that Tom was referring to failures in pg_upgrade, so maybe
he didn't notice this part:

$ grep -c BACKTRACE src/test/regress/log/postmaster.log
10

2021-10-17 16:30:42.623 CDT autovacuum worker[13490] BACKTRACE:  
        postgres: autovacuum worker regression(errbacktrace+0x4e) [0x5650e5ea08ee]
        postgres: autovacuum worker regression(HeapTupleSatisfiesVisibility+0xd93) [0x5650e5a85283]
        postgres: autovacuum worker regression(heap_hot_search_buffer+0x2a5) [0x5650e5a7ec05]
        postgres: autovacuum worker regression(+0x1b573c) [0x5650e5a8073c]
        postgres: autovacuum worker regression(index_fetch_heap+0x5d) [0x5650e5a9345d]
        postgres: autovacuum worker regression(index_getnext_slot+0x5b) [0x5650e5a934fb]
        postgres: autovacuum worker regression(systable_getnext_ordered+0x26) [0x5650e5a92716]
        postgres: autovacuum worker regression(heap_fetch_toast_slice+0x33d) [0x5650e5a866ed]
        postgres: autovacuum worker regression(+0x162f61) [0x5650e5a2df61]
        postgres: autovacuum worker regression(toast_flatten_tuple+0xef) [0x5650e5a85dbf]
        postgres: autovacuum worker regression(+0x5b770d) [0x5650e5e8270d]
        postgres: autovacuum worker regression(+0x5b7d85) [0x5650e5e82d85]
        postgres: autovacuum worker regression(SearchCatCache3+0x1a9) [0x5650e5e83dc9]
        postgres: autovacuum worker regression(+0x29e4f5) [0x5650e5b694f5]
        postgres: autovacuum worker regression(+0x2a012e) [0x5650e5b6b12e]
        postgres: autovacuum worker regression(analyze_rel+0x1d1) [0x5650e5b6c851]
        postgres: autovacuum worker regression(vacuum+0x5c0) [0x5650e5bd4480]
        postgres: autovacuum worker regression(+0x40ce91) [0x5650e5cd7e91]
        postgres: autovacuum worker regression(+0x40df16) [0x5650e5cd8f16]
        postgres: autovacuum worker regression(AutoVacuumUpdateDelay+0) [0x5650e5cd9020]
        postgres: autovacuum worker regression(+0x41cccb) [0x5650e5ce7ccb]
        /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890) [0x7f1dd1d26890]
        /lib/x86_64-linux-gnu/libc.so.6(__select+0x17) [0x7f1dd128fff7]
        postgres: autovacuum worker regression(+0x41d11e) [0x5650e5ce811e]
        postgres: autovacuum worker regression(PostmasterMain+0xd1c) [0x5650e5ce9c1c]
        postgres: autovacuum worker regression(main+0x220) [0x5650e5a1f4f0]
        /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f1dd119ab97]
        postgres: autovacuum worker regression(_start+0x2a) [0x5650e5a1f83a]

-- 
Justin



On Sun, Oct 17, 2021 at 04:43:15PM -0500, Justin Pryzby wrote:
> On Sun, Aug 15, 2021 at 09:44:55AM -0500, Justin Pryzby wrote:
> > On Sun, May 16, 2021 at 04:23:02PM -0400, Tom Lane wrote:
> > > 1. Fix FullXidRelativeTo to be a little less trusting.  It'd
> > > probably be sane to make it return FirstNormalTransactionId
> > > when it'd otherwise produce a wrapped-around FullXid, but is
> > > there any situation where we'd want it to throw an error instead?
> > > 
> > > 2. Change pg_resetwal to not do the above.  It's not entirely
> > > apparent to me what business it has trying to force
> > > autovacuum-for-wraparound anyway, but if it does need to do that,
> > > can we devise a less klugy method?
> > > 
> > > It also seems like some assertions in procarray.c would be a
> > > good idea.  With the attached patch, we get through core
> > > regression just fine, but the pg_upgrade test fails immediately
> > > after the "Resetting WAL archives" step.
> > 
> > #2 is done as of 74cf7d46a.
> > 
> > Is there a plan to include Tom's procarray assertions?
> 
> I'm confused about the state of this patch/thread.
> 
> make check causes autovacuum crashes (but then the regression tests succeed
> anyway).

Sorry, I was confused here.  autovacuum is not crashing as I said; the
BACKTRACE lines come from the LOG message added by Tom's debugging patch:

+       if (trace_toast_visibility)
+               ereport(LOG,
+                               errmsg("HeapTupleSatisfiesToast: xmin %u t_infomask 0x%04x",
+                                          HeapTupleHeaderGetXmin(tuple),
+                                          tuple->t_infomask),
+                               debug_query_string ? 0 : errbacktrace());


2021-10-17 22:56:57.066 CDT autovacuum worker[19601] LOG:  HeapTupleSatisfiesToast: xmin 2 t_infomask 0x0b02
2021-10-17 22:56:57.066 CDT autovacuum worker[19601] BACKTRACE:  
...

I see that the pg_statistic problem can still occur on v14.  I still don't have
a recipe to reproduce it, though, other than running VACUUM FULL in a loop.
Can I provide anything useful to debug it? xmin, infomask, core, and
log_autovacuum_min_duration=0?

-- 
Justin