Re: prion failed with ERROR: missing chunk number 0 for toast value 14334 in pg_toast_2619 - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: prion failed with ERROR: missing chunk number 0 for toast value 14334 in pg_toast_2619
Msg-id 20211018042128.GB4679@telsasoft.com
In response to Re: prion failed with ERROR: missing chunk number 0 for toast value 14334 in pg_toast_2619  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On Sun, Oct 17, 2021 at 04:43:15PM -0500, Justin Pryzby wrote:
> On Sun, Aug 15, 2021 at 09:44:55AM -0500, Justin Pryzby wrote:
> > On Sun, May 16, 2021 at 04:23:02PM -0400, Tom Lane wrote:
> > > 1. Fix FullXidRelativeTo to be a little less trusting.  It'd
> > > probably be sane to make it return FirstNormalTransactionId
> > > when it'd otherwise produce a wrapped-around FullXid, but is
> > > there any situation where we'd want it to throw an error instead?
> > > 
> > > 2. Change pg_resetwal to not do the above.  It's not entirely
> > > apparent to me what business it has trying to force
> > > autovacuum-for-wraparound anyway, but if it does need to do that,
> > > can we devise a less klugy method?
> > > 
> > > It also seems like some assertions in procarray.c would be a
> > > good idea.  With the attached patch, we get through core
> > > regression just fine, but the pg_upgrade test fails immediately
> > > after the "Resetting WAL archives" step.
> > 
> > #2 is done as of 74cf7d46a.
> > 
> > Is there a plan to include Tom's procarray assertions ?
> 
> I'm confused about the state of this patch/thread.
> 
> make check causes autovacuum crashes (but then the regression tests succeed
> anyway).

Sorry, I was confused here.  autovacuum is not crashing as I said; the
BACKTRACE lines come from the LOG message added by Tom's debugging patch:

+       if (trace_toast_visibility)
+               ereport(LOG,
+                               errmsg("HeapTupleSatisfiesToast: xmin %u t_infomask 0x%04x",
+                                          HeapTupleHeaderGetXmin(tuple),
+                                          tuple->t_infomask),
+                               debug_query_string ? 0 : errbacktrace());


2021-10-17 22:56:57.066 CDT autovacuum worker[19601] LOG:  HeapTupleSatisfiesToast: xmin 2 t_infomask 0x0b02
2021-10-17 22:56:57.066 CDT autovacuum worker[19601] BACKTRACE:  
...
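
To make the t_infomask value in a log line like the one above easier to read,
here is a minimal standalone decoder.  It is only a sketch: decode_infomask()
and append_flag() are hypothetical helpers, not PostgreSQL functions, and the
bit values are copied by hand from src/include/access/htup_details.h, so they
should be checked against the tree being debugged.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Bit values copied by hand from src/include/access/htup_details.h;
 * check them against your tree before trusting the output.
 */
struct infomask_flag
{
    uint16_t    mask;
    const char *name;
};

static const struct infomask_flag infomask_flags[] = {
    {0x0001, "HEAP_HASNULL"},
    {0x0002, "HEAP_HASVARWIDTH"},
    {0x0004, "HEAP_HASEXTERNAL"},
    {0x0008, "HEAP_HASOID_OLD"},
    {0x0010, "HEAP_XMAX_KEYSHR_LOCK"},
    {0x0020, "HEAP_COMBOCID"},
    {0x0040, "HEAP_XMAX_EXCL_LOCK"},
    {0x0080, "HEAP_XMAX_LOCK_ONLY"},
    {0x0400, "HEAP_XMAX_COMMITTED"},
    {0x0800, "HEAP_XMAX_INVALID"},
    {0x1000, "HEAP_XMAX_IS_MULTI"},
    {0x2000, "HEAP_UPDATED"},
    {0x4000, "HEAP_MOVED_OFF"},
    {0x8000, "HEAP_MOVED_IN"},
};

/* Append one flag name to buf, separated by '|'; truncates if buf is full. */
static void
append_flag(char *buf, size_t buflen, const char *name)
{
    size_t      used = strlen(buf);

    if (used + 1 < buflen)
        snprintf(buf + used, buflen - used, "%s%s", used ? "|" : "", name);
}

/* Hypothetical helper (not a PostgreSQL function): decode t_infomask into buf. */
static void
decode_infomask(uint16_t infomask, char *buf, size_t buflen)
{
    size_t      i;

    if (buflen == 0)
        return;
    buf[0] = '\0';

    /* The two xmin hint bits are read together: both set means frozen. */
    switch (infomask & 0x0300)
    {
        case 0x0300:
            append_flag(buf, buflen, "HEAP_XMIN_FROZEN");
            break;
        case 0x0100:
            append_flag(buf, buflen, "HEAP_XMIN_COMMITTED");
            break;
        case 0x0200:
            append_flag(buf, buflen, "HEAP_XMIN_INVALID");
            break;
    }

    for (i = 0; i < sizeof(infomask_flags) / sizeof(infomask_flags[0]); i++)
    {
        if (infomask & infomask_flags[i].mask)
            append_flag(buf, buflen, infomask_flags[i].name);
    }
}

#ifdef INFOMASK_DEMO            /* build with -DINFOMASK_DEMO to run standalone */
int
main(void)
{
    char        buf[256];

    decode_infomask(0x0b02, buf, sizeof(buf));
    printf("0x0b02 = %s\n", buf);
    return 0;
}
#endif
```

For 0x0b02 this gives HEAP_XMIN_FROZEN|HEAP_HASVARWIDTH|HEAP_XMAX_INVALID,
which is consistent with the xmin 2 (FrozenTransactionId) reported in the same
log line.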

I see that the pg_statistic problem can still occur on v14.  I still don't have
a recipe to reproduce it, though, other than running VACUUM FULL in a loop.
Can I provide anything useful to debug it?  xmin, infomask, a core dump, or logs
with log_autovacuum_min_duration=0?
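
For reference, item 1 in Tom's quoted list (make FullXidRelativeTo return
FirstNormalTransactionId rather than a wrapped-around FullXid) can be
illustrated outside the backend.  This is a sketch of one reading of that
suggestion, not the procarray.c code: full_xid_relative_to() is a hypothetical
stand-in that works on plain integers, and the clamp condition is my
assumption about what "wrapped-around" should mean.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define FIRST_NORMAL_XID 3u     /* FirstNormalTransactionId in transam.h */

/*
 * Hypothetical stand-in, not the procarray.c function: widen a 32-bit xid
 * to 64 bits, relative to a 64-bit reference value rel_full.
 */
static uint64_t
full_xid_relative_to(uint64_t rel_full, uint32_t xid)
{
    uint32_t    rel_xid = (uint32_t) rel_full;      /* low 32 bits */
    int32_t     delta = (int32_t) (xid - rel_xid);  /* signed mod-2^32 distance */

    /*
     * If xid is "older" than rel by more than rel_full itself, the sum would
     * go below zero and wrap to a huge 64-bit value.  Clamp instead
     * (assumption: this is the case the quoted suggestion is about).
     */
    if (delta < 0 && (uint64_t) (-(int64_t) delta) > rel_full)
        return FIRST_NORMAL_XID;

    return rel_full + (uint64_t) (int64_t) delta;
}

#ifdef FULLXID_DEMO             /* build with -DFULLXID_DEMO to run standalone */
int
main(void)
{
    /* rel is early in epoch 0; xid looks ~396 transactions older than rel */
    printf("%" PRIu64 "\n", full_xid_relative_to(100, 4294967000u));
    /* same xid, but rel is in epoch 1, so there is room to go back */
    printf("%" PRIu64 "\n",
           full_xid_relative_to((UINT64_C(1) << 32) + 100, 4294967000u));
    return 0;
}
#endif
```

With the clamp, the first call returns 3 rather than a value near 2^64; the
second call has a full epoch of headroom and returns 4294967000 unchanged.
Whether an error would be preferable to the clamp is the open question in
Tom's item 1.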

-- 
Justin
