Re: pg14b2: FailedAssertion("_bt_posting_valid(nposting)", File: "nbtdedup.c", ... - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: pg14b2: FailedAssertion("_bt_posting_valid(nposting)", File: "nbtdedup.c", ...
Date
Msg-id 20210628232656.GD21248@telsasoft.com
In response to Re: pg14b2: FailedAssertion("_bt_posting_valid(nposting)", File: "nbtdedup.c", ...  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Mon, Jun 28, 2021 at 01:42:25PM -0700, Peter Geoghegan wrote:
> On Sun, Jun 27, 2021 at 11:08 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > > That said, the relevant table is the active "alarms" table, and it would've
> > > gotten plenty of DML with no issue for months running v13.
> >
> > It might not have been visibly broken without assertions enabled,
> > though. I sprinkled nbtdedup.c with these _bt_posting_valid()
> > assertions just because it was easy. The assertions were bound to
> > catch some problem sooner or later, and had acceptable overhead.
> 
> Obviously nothing stops you from running amcheck on the original
> database that you're running in production. You won't need to have
> enabled assertions to catch the same problem that way. This seems like
> the best way to isolate the problem. I strongly suspect that it's the
> LVM issue for my own reasons: nothing changed during the Postgres 14
> cycle that seems truly related.

Sorry, but I didn't save the pre-upgrade cluster (just pg_dump).

For now, I moved the table out of the way and re-created it.
I could send you the whole relnode if you want to take a closer look.
It seemed like almost any insert on the table caused it to crash.

BTW, on a copy of the v14 cluster, both vacuum and reindex also resolved the
issue (at least enough to avoid the crash).
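
For readers following the thread, the invariant behind the failed assertion can be modelled roughly as below. This is a simplified, standalone sketch, not the code in nbtdedup.c. The names SketchTid, sketch_tid_compare and sketch_posting_valid are made up for illustration, and the sketch assumes the check amounts to "at least two heap TIDs, each with a nonzero offset, in strictly ascending order".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a heap TID (block number plus offset). */
typedef struct SketchTid
{
    uint32_t block;   /* heap block number */
    uint16_t offset;  /* offset within the block; 0 is treated as invalid */
} SketchTid;

/* Compare by block, then by offset; returns <0, 0 or >0. */
static int
sketch_tid_compare(const SketchTid *a, const SketchTid *b)
{
    if (a->block != b->block)
        return (a->block < b->block) ? -1 : 1;
    if (a->offset != b->offset)
        return (a->offset < b->offset) ? -1 : 1;
    return 0;
}

/*
 * Return true when the array looks like a well-formed posting list under
 * the assumptions above: two or more TIDs, none with offset 0, and each
 * TID strictly greater than the one before it.
 */
static bool
sketch_posting_valid(const SketchTid *tids, size_t ntids)
{
    if (tids == NULL || ntids < 2)
        return false;
    if (tids[0].offset == 0)
        return false;

    for (size_t i = 1; i < ntids; i++)
    {
        if (tids[i].offset == 0)
            return false;
        /* Duplicates and out-of-order TIDs both fail the check. */
        if (sketch_tid_compare(&tids[i], &tids[i - 1]) <= 0)
            return false;
    }
    return true;
}
```

A list containing a duplicated or out-of-order TID fails this check. That would be consistent with pre-existing on-disk damage only showing up once an assert-enabled build touches the affected page.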

-- 
Justin


