On Fri, Mar 29, 2024 at 1:17 PM Peter Geoghegan <pg@bowt.ie> wrote:
> FWIW I never thought that the order that we called
> vacuum_get_cutoffs() relative to when we call GlobalVisTestFor() was
> directly significant (though I did think that about the order that we
> attain VACUUM's rel_pages and the vacuum_get_cutoffs() call). I can't
> have thought that, because clearly GlobalVisTestFor() just returns a
> pointer, and so cannot directly affect backend local state.

Hmm, OK.

> It was clear that this is an important issue, from an early stage.
> Pre-release 14 had 2 or 3 bugs that all had the same symptom:
> lazy_scan_prune would loop forever. This was true even though each of
> the bugs had fairly different underlying causes (all tied to
> dc7420c2c). I figured that there might well be more bugs like that in
> the future.

Looks like you were right.

> I have every reason to believe that the remaining problems in this
> area are extremely rare. I wonder if it would make sense to focus on
> making the infinite loop behavior in lazy_scan_prune just throw an
> error.
>
> I now fear that that'll be harder than one might think. At the time
> that I added the looping behavior (in commit 8523492d), I believed
> that the only "legitimate" reason that it could ever be needed was the
> same reason why we needed the old tupgone behavior (to deal with
> concurrently-inserted tuples from transactions that abort in flight).
> But now I worry that it's actually protective, in some way that isn't
> generally understood. And so it might be that converting the retry
> into a hard error (e.g., erroring-out after MaxHeapTuplesPerPage
> retries) will create new problems.

It also sounds like it would boil down to "ERROR: our code sucks", so
count me as not a fan of that approach. As much as I don't like the
idea of significant changes to the back-branches, I think I like that
idea even less.

On the other hand, I also don't have an idea that I do like right now,
so it's probably too early to decide anything just yet. I'll try to
find more time to study this (and I hope others do the same).

--
Robert Haas
EDB: http://www.enterprisedb.com