From Peter Geoghegan
Subject Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae
Msg-id CAH2-WznQ2ZXCCnLRL0o-qR7Xe8mcbYo2kMjK7FMG-MxiyvFD1Q@mail.gmail.com
In response to Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-bugs
On Fri, Mar 29, 2024 at 12:05 PM Robert Haas <robertmhaas@gmail.com> wrote:
> That means that heap_vacuum_rel() isn't guaranteeing anything by
> calling vacuum_get_cutoffs() before it calls GlobalVisTestFor(). I
> started wondering when this bug got introduced, and concluded that it
> probably dates all the way back to
> dc7420c2c9274a283779ec19718d2d16323640c0, when Andres first introduced
> the GlobalVisTest stuff. The code was substantially refactored
> afterwards by Peter Geoghegan, in
> b4af70cb210393c9c8f41643acf6b213e21178e7 and
> 73f6ec3d3c8d5786c54373e71a096e5acf78e7ca. However, as far as I can
> see, neither of those commits created the problem; they just moved
> code around, and in fact I'd say that the problem is easier to see now
> than it was before all of that refactoring because the two problematic
> initializations are now closer together than they were originally. I
> think that Peter may have been under the impression at the time of the
> latter commit that the order of initializing vacrel->cutoffs vs.
> vacrel->vistest was significant, because the commit message says:

FWIW I never thought that the order in which we call
vacuum_get_cutoffs() relative to GlobalVisTestFor() was directly
significant (though I did think that about the order in which we
establish VACUUM's rel_pages relative to the vacuum_get_cutoffs()
call). I can't have thought that, because GlobalVisTestFor() clearly
just returns a pointer, and so cannot directly affect backend-local
state.
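
To make that concrete: here's roughly what the relevant part of
heap_vacuum_rel looks like (a simplified paraphrase, not the verbatim
source; "xmax" below stands in for some tuple's raw xmax):

/*
 * The two initializations whose relative order is at issue.
 * GlobalVisTestFor() computes no horizon here -- it just hands back
 * a pointer to backend-local state.
 */
vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
vacrel->vistest = GlobalVisTestFor(rel);

/*
 * The state behind vistest is only consulted (and possibly
 * refreshed) much later, at prune time, via calls like this one:
 */
if (GlobalVisTestIsRemovableXid(vacrel->vistest, xmax))
{
    /* deleter is sufficiently old; the tuple can be pruned away */
}

Since the pointer captures nothing about the horizon at the time of
the call, swapping the order of those two initializations couldn't,
by itself, guarantee anything.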

>     Explain the relationship between vacuumlazy.c's vistest and OldestXmin
>     cutoffs.  These closely related cutoffs are different in subtle but
>     important ways.  Also document a closely related rule: we must establish
>     rel_pages _after_ OldestXmin to ensure that no XID < OldestXmin can be
>     missed by lazy_scan_heap().
>
> And a comment in that commit says:
>
> +     * We expect vistest will always make heap_page_prune remove any deleted
> +     * tuple whose xmax is < OldestXmin.  lazy_scan_prune must never become
>
> Neither of these is perhaps 100% explicit about the ordering being
> relevant, but if that wasn't thought to make this code safe, then I
> don't know what was thought to make it safe.

I intended to highlight the general nature of the dependency --
nothing more. I don't claim to be an authoritative source on this,
though I was confident that the code was *supposed* to work like this
(how else could it possibly work?).

It was clear from an early stage that this was an important issue.
Pre-release 14 had two or three bugs that all had the same symptom:
lazy_scan_prune would loop forever. That was true even though each of
the bugs had a fairly different underlying cause (all tied to
dc7420c2c). I figured that there might well be more bugs like that in
the future.

> At any rate, unless I'm missing something, which is definitely
> possible, the actual bug dates to dc7420c2c, which means whatever fix
> we come up with here will need to be back-patched back to v14.

I agree.

> I kind of feel like the code is just going at this in a way that's
> completely backwards. It feels like what we should be doing is looking
> at the actual states of the pages post-pruning, and specifically what
> the oldest (M)XIDs that are really present there are. But changing
> that seems like a lot to do in a fix that needs to be back-patched.

I said the same thing to Noah recently. This is still something that
ought to be made very explicit -- pruneheap.c should be directly aware
of its responsibility not to leave behind any XID < VACUUM's
OldestXmin, except for any XIDs that only freezing can deal with.
(We're moving in the direction of making pruneheap.c responsible for
freezing, so perhaps my caveat about freezing shouldn't apply on
HEAD.)
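
In rough terms, I'm imagining an invariant check along these lines
(purely hypothetical -- nothing like it exists in pruneheap.c today;
"page", "maxoff", "blkno", "buffer", and "rel" stand in for the usual
pruning-time locals, and OldestXmin is VACUUM's cutoff):

for (offnum = FirstOffsetNumber;
     offnum <= maxoff;
     offnum = OffsetNumberNext(offnum))
{
    ItemId      itemid = PageGetItemId(page, offnum);
    HeapTupleData tuple;

    if (!ItemIdIsNormal(itemid))
        continue;

    tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
    tuple.t_len = ItemIdGetLength(itemid);
    tuple.t_tableOid = RelationGetRelid(rel);
    ItemPointerSet(&tuple.t_self, blkno, offnum);

    /*
     * Any tuple still DEAD relative to OldestXmin should have been
     * pruned away already.  One left behind is exactly what makes
     * lazy_scan_prune retry (or, with the dc7420c2c-era bugs, loop
     * forever).
     */
    Assert(HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buffer) !=
           HEAPTUPLE_DEAD);
}

With something like that in place, a disagreement between vistest and
OldestXmin would fail an assertion at prune time, instead of
surfacing as an infinite loop (or as relfrozenxid disagreeing with
the row XIDs actually left behind).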

> It's tempting to look for a narrower fix, where we just find out the
> oldest horizon that was ever actually used. But I don't feel very good
> about doing that by tampering with ComputeXidHorizons(), either.

I have every reason to believe that the remaining problems in this
area are extremely rare. I wonder if it would make sense to focus on
making the infinite loop behavior in lazy_scan_prune just throw an
error.

I now fear that that'll be harder than one might think. At the time
that I added the looping behavior (in commit 8523492d), I believed
that the only "legitimate" reason that it could ever be needed was the
same reason why we needed the old tupgone behavior (to deal with
concurrently-inserted tuples from transactions that abort in flight).
But now I worry that it's actually protective, in some way that isn't
generally understood. And so it might be that converting the retry
into a hard error (e.g., erroring out after MaxHeapTuplesPerPage
retries) will create new problems.
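
For illustration, the conversion I mean might look something like
this (a sketch only -- "num_retries" and the error message are made
up, the goto-retry structure just mirrors what 8523492d added, and
"res", "tuple", "buf", "blkno", and "offnum" are lazy_scan_prune's
usual locals):

int     num_retries = 0;

retry:

/* ... heap_page_prune() call and per-tuple loop, as today ... */

res = HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin, buf);

if (unlikely(res == HEAPTUPLE_DEAD))
{
    /*
     * An unremoved DEAD tuple means that vistest and OldestXmin
     * disagreed.  Rather than retrying forever, give up after a
     * bounded number of attempts -- though, per the above, this may
     * just trade a hang that we understand for breakage that we
     * don't.
     */
    if (++num_retries > MaxHeapTuplesPerPage)
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg_internal("could not prune dead tuple (%u,%u) after %d attempts",
                                 blkno, offnum, num_retries - 1)));
    goto retry;
}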

--
Peter Geoghegan


