Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae - Mailing list pgsql-bugs

From Peter Geoghegan
Subject Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae
Date
Msg-id CAH2-WzmMTvhPmRDg_xdv6VpAumamOU5WMysA2Zf0BxB0y+pZJg@mail.gmail.com
In response to Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
On Thu, May 16, 2024 at 4:29 PM Andres Freund <andres@anarazel.de> wrote:
> On 2024-05-16 16:13:35 -0400, Peter Geoghegan wrote:
> > On Thu, May 16, 2024 at 3:39 PM Andres Freund <andres@anarazel.de> wrote:
> > > Melanie's reproducer works because there are catalog accesses that can trigger
> > > a recomputation of the fuzzy horizon. For testing, the "easy" window for that is
> > > the vac_open_indexes() call in versions < 16, because it happens after determining
> > > the horizon, but before actually vacuuming.
> >
> > What about the call to GetOldestNonRemovableTransactionId() that takes
> > place in _bt_pendingfsm_finalize()?
>
> Ah, good catch! That'd do it.

That was definitely what happened in the problem cases I saw -- there were plenty
of B-Tree page deletions. Plus, the heap page that lazy_scan_prune
actually got stuck on happened to be the first one after the first
round of bulk deletes of index tuples -- *exactly* the first heap page
scanned after that round. This wasn't a one-off, either: there were several
affected instances that I had access to at various points, and they all
looked like this (though all of them ran the same workload). These were
14 and 15 instances (there were no 16 instances, likely just because 16
wasn't really available yet at the time).

It seems just about impossible that these details were all coincidental.

--
Peter Geoghegan


