Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae - Mailing list pgsql-bugs

From Robert Haas
Subject Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae
Msg-id CA+TgmoYSM234TDJCyjAHch9igHP2tahXXENc8hBT+BHwcMkT8w@mail.gmail.com
In response to Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-bugs

On Fri, Mar 29, 2024 at 1:17 PM Peter Geoghegan <pg@bowt.ie> wrote:
> FWIW I never thought that the order that we called
> vacuum_get_cutoffs() relative to when we call GlobalVisTestFor() was
> directly significant (though I did think that about the order that we
> attain VACUUM's rel_pages and the vacuum_get_cutoffs() call). I can't
> have thought that, because clearly GlobalVisTestFor() just returns a
> pointer, and so cannot directly affect backend local state.

Hmm, OK.

> It was clear that this is an important issue, from an early stage.
> Pre-release 14 had 2 or 3 bugs that all had the same symptom:
> lazy_scan_prune would loop forever. This was true even though each of
> the bugs had fairly different underlying causes (all tied to
> dc7420c2c). I figured that there might well be more bugs like that in
> the future.

Looks like you were right.

> I have every reason to believe that the remaining problems in this
> area are extremely rare. I wonder if it would make sense to focus on
> making the infinite loop behavior in lazy_scan_prune just throw an
> error.
>
> I now fear that that'll be harder than one might think. At the time
> that I added the looping behavior (in commit 8523492d), I believed
> that the only "legitimate" reason that it could ever be needed was the
> same reason why we needed the old tupgone behavior (to deal with
> concurrently-inserted tuples from transactions that abort in flight).
> But now I worry that it's actually protective, in some way that isn't
> generally understood. And so it might be that converting the retry
> into a hard error (e.g., erroring-out after MaxHeapTuplesPerPage
> retries) will create new problems.

It also sounds like it would boil down to "ERROR: our code sucks", so
count me as not a fan of that approach. As much as I don't like the
idea of significant changes to the back-branches, I think I like that
idea even less.

On the other hand, I also don't have an idea that I do like right now,
so it's probably too early to decide anything just yet. I'll try to
find more time to study this (and I hope others do the same).

--
Robert Haas
EDB: http://www.enterprisedb.com
