Re: HOT chain validation in verify_heapam() - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: HOT chain validation in verify_heapam() |
Date | |
Msg-id | 20221114205806.c54z6e742scdgf33@awork3.anarazel.de Whole thread Raw |
In response to | Re: HOT chain validation in verify_heapam() (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: HOT chain validation in verify_heapam()
|
List | pgsql-hackers |
Hi, On 2022-11-14 09:50:48 -0800, Peter Geoghegan wrote: > On Mon, Nov 14, 2022 at 9:38 AM Andres Freund <andres@anarazel.de> wrote: > > Anyway, I played a bit around with this. It's hard to hit, not because we > > somehow won't choose such a horizon, but because we'll commonly prune the > > earlier tuple version away due to xmax being old enough. > > That must be a bug, then. Since, as I said, I can't see how it could > possibly be okay to freeze an xmin of tuple in a HOT chain without > also making sure that it has no earlier versions left behind. Hard to imagine us having bugs in this code. Ahem. I really wish I knew of a reasonably complex way to utilize coverage guided fuzzing on heap pruning / vacuuming. I wonder if we ought to add an error check to heap_prepare_freeze_tuple() against this scenario. We're working towards being more aggressive around freezing, which will make it more likely to hit corner cases around this. > If there are earlier versions that we have to go through to get to the > frozen-xmin tuple (not just an LP_REDIRECT), we're going to break the HOT > chain traversal logic in code like heap_hot_search_buffer in a rather > obvious way. > > HOT chain traversal logic code will interpret the frozen xmin from the > tuple as FrozenTransactionId (not as its raw xmin). So traversal is > just broken in this scenario. > Which'd still be fine if the whole chain were already "fully dead". One way I think this can happen is <= PG 13's HEAPTUPLE_DEAD handling in lazy_scan_heap(). I now suspect that the seemingly-odd "We will advance past RECENTLY_DEAD tuples just in case there's a DEAD one after them;" logic in heap_prune_chain() might be required for correctness. Which IIRC we'd been talking about getting rid elsewhere? <tinkers> At least as long as correctness requires not ending up in endless loops - indeed. We end up with lazy_scan_prune() endlessly retrying. Without a chance to interrupt. Shouldn't there at least be a CFI somewhere? The attached isolationtester spec has a commented out test for this. I think the problem partially is that the proposed verify_heapam() code is too "aggressive" considering things to be part of the same hot chain - which then means we have to be very careful about erroring out. The attached isolationtester test triggers: "unfrozen tuple was updated to produce a tuple at offset %u which is frozen" "updated version at offset 3 is also the updated version of tuple at offset %u" Despite there afaict not being any corruption. Worth noting that this happens regardless of hot/non-hot updates being used (uncomment s3ci to see). > > It *is* possible to > > hit, if the horizon increases between the two tuple version checks (e.g. if > > there's another tuple inbetween that we check the visibility of). > > I suppose that that's the detail that "protects" us, then -- that > would explain the apparent lack of problems in the field. Your > sequence requires 3 sessions, not just 2. One important protection right now is that vacuumlazy.c uses a more pessimistic horizon than pruneheap.c. Even if visibility determinations within pruning recompute the horizon, vacuumlazy.c won't freeze based on the advanced horizon. I don't quite know where we we'd best put a comment with a warning about this fact. Greetings, Andres Freund
Attachment
pgsql-hackers by date: