Re: HOT chain validation in verify_heapam() - Mailing list pgsql-hackers

From Andres Freund
Subject Re: HOT chain validation in verify_heapam()
Msg-id 20221114205806.c54z6e742scdgf33@awork3.anarazel.de
In response to Re: HOT chain validation in verify_heapam()  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
Hi,

On 2022-11-14 09:50:48 -0800, Peter Geoghegan wrote:
> On Mon, Nov 14, 2022 at 9:38 AM Andres Freund <andres@anarazel.de> wrote:
> > Anyway, I played a bit around with this. It's hard to hit, not because we
> > somehow won't choose such a horizon, but because we'll commonly prune the
> > earlier tuple version away due to xmax being old enough.
>
> That must be a bug, then. Since, as I said, I can't see how it could
> possibly be okay to freeze an xmin of tuple in a HOT chain without
> also making sure that it has no earlier versions left behind.

Hard to imagine us having bugs in this code. Ahem.

I really wish I knew of a reasonably practical way to apply coverage-guided
fuzzing to heap pruning / vacuuming.


I wonder if we ought to add an error check to heap_prepare_freeze_tuple()
against this scenario. We're working towards being more aggressive around
freezing, which will make it more likely to hit corner cases around this.
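A toy model of the invariant such an error check could enforce (this is not PostgreSQL code; all names here are invented for illustration): a HOT chain member's xmin may only be frozen once every earlier version in the chain has been pruned away.

```python
# Toy model of the freeze-safety invariant discussed above. A "chain"
# is a root-first list of dicts with an 'xmin' and a 'pruned' flag;
# FrozenTransactionId and real tuple headers are deliberately omitted.

def check_freeze_safe(chain, freeze_horizon):
    """Raise if any chain member whose xmin would be frozen still has
    an earlier, not-yet-pruned version in front of it."""
    for i, tup in enumerate(chain):
        will_freeze = tup["xmin"] < freeze_horizon
        has_live_predecessor = any(not prev["pruned"] for prev in chain[:i])
        if will_freeze and has_live_predecessor:
            raise ValueError(
                f"cannot freeze chain member {i}: earlier version still present")
```

For example, freezing the second member while the first is still present would trip the check, whereas the same chain with the first member pruned passes.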



> If there are earlier versions that we have to go through to get to the
> frozen-xmin tuple (not just an LP_REDIRECT), we're going to break the HOT
> chain traversal logic in code like heap_hot_search_buffer in a rather
> obvious way.
>
> HOT chain traversal logic code will interpret the frozen xmin from the
> tuple as FrozenTransactionId (not as its raw xmin). So traversal is
> just broken in this scenario.
>

Which'd still be fine if the whole chain were already "fully dead". One way I
think this can happen is via the <= PG 13 HEAPTUPLE_DEAD handling in
lazy_scan_heap().

I now suspect that the seemingly-odd "We will advance past RECENTLY_DEAD
tuples just in case there's a DEAD one after them;" logic in
heap_prune_chain() might be required for correctness.  Which IIRC we'd been
talking about getting rid of elsewhere?

<tinkers>

At least insofar as correctness requires not ending up in endless loops -
indeed it is. We end up with lazy_scan_prune() endlessly retrying, without a
chance to interrupt. Shouldn't there at least be a CFI
(CHECK_FOR_INTERRUPTS()) somewhere?  The attached isolationtester spec has a
commented-out test for this.
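A sketch of why a retry loop like this needs an interrupt check (again a toy model with invented names, not the actual lazy_scan_prune() code): without a CFI-style check inside the loop, a retry that never converges cannot be cancelled.

```python
# Toy model of a retry loop with an interrupt check. 'prune_once'
# returns None when visibility decisions changed underneath us and the
# pass must be retried; 'interrupt_pending' stands in for the effect of
# CHECK_FOR_INTERRUPTS() honoring a pending cancel request.

class QueryCancelled(Exception):
    pass

def prune_with_retry(prune_once, interrupt_pending, max_retries=1000):
    for _attempt in range(max_retries):
        if interrupt_pending():       # the "CFI": bail out on cancel
            raise QueryCancelled
        result = prune_once()
        if result is not None:        # this pass converged
            return result
    raise RuntimeError("pruning did not converge")
```

With the interrupt check removed, a prune that keeps returning None would spin until max_retries with no way to cancel it, which is the behavior observed above.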


I think part of the problem is that the proposed verify_heapam() code is too
"aggressive" in considering things to be part of the same HOT chain - which
then means it has to be very careful about erroring out.

The attached isolationtester test triggers:
"unfrozen tuple was updated to produce a tuple at offset %u which is frozen"
"updated version at offset 3 is also the updated version of tuple at offset %u"

Despite there, afaict, not being any corruption. Worth noting that this
happens regardless of whether HOT or non-HOT updates are used (uncomment s3ci
to see).


> > It *is* possible to
> > hit, if the horizon increases between the two tuple version checks (e.g. if
> > there's another tuple inbetween that we check the visibility of).
>
> I suppose that that's the detail that "protects" us, then -- that
> would explain the apparent lack of problems in the field. Your
> sequence requires 3 sessions, not just 2.

One important protection right now is that vacuumlazy.c uses a more
pessimistic horizon than pruneheap.c. Even if visibility determinations within
pruning recompute the horizon, vacuumlazy.c won't freeze based on the advanced
horizon.  I don't quite know where we'd best put a comment warning
about this fact.
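The protection can be illustrated with another toy sketch (invented names, not the real vacuumlazy.c/pruneheap.c interfaces): pruning's visibility checks may see a recomputed, advanced horizon, but freeze decisions deliberately stick with the older horizon computed up front.

```python
# Toy illustration of the two-horizon protection described above.

def prune_considers_dead(xmax, recomputed_prune_horizon):
    # Pruning's visibility checks may use a horizon recomputed mid-scan,
    # which can have advanced past the one vacuum started with.
    return xmax < recomputed_prune_horizon

def may_freeze(xmin, vacuum_horizon):
    # Freezing only consults the pessimistic horizon vacuumlazy.c
    # computed up front, never the recomputed one.
    return xmin < vacuum_horizon
```

So a tuple whose xmin falls between the two horizons might be deemed dead by pruning's recomputed view, yet still won't be frozen, which is what keeps the scenario above from biting in practice.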

Greetings,

Andres Freund

Attachment
