Re: HOT chain validation in verify_heapam() - Mailing list pgsql-hackers

From Andres Freund
Subject Re: HOT chain validation in verify_heapam()
Date
Msg-id 20221114223307.e6vz2hbzshbry5rg@awork3.anarazel.de
In response to Re: HOT chain validation in verify_heapam()  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: HOT chain validation in verify_heapam()
List pgsql-hackers
Hi,

On 2022-11-14 14:13:10 -0800, Peter Geoghegan wrote:
> > I think part of the problem is that the proposed verify_heapam() code is too
> > "aggressive" in considering things to be part of the same HOT chain, which in
> > turn means we have to be very careful about erroring out.
> >
> > The attached isolationtester test triggers:
> > "unfrozen tuple was updated to produce a tuple at offset %u which is frozen"
> > "updated version at offset 3 is also the updated version of tuple at offset %u"
> >
> > These complaints appear even though, as far as I can tell, there isn't any
> > corruption. It's worth noting that this happens regardless of whether HOT or
> > non-HOT updates are used (uncomment s3ci to see).
> 
> Why don't you think that there is corruption?

I looked at the state after the test and the complaint is bogus. It's caused
by the patch ignoring the cur->xmax == next->xmin condition if next->xmin is
FrozenTransactionId. The isolationtester test creates a situation where that
leads to verify_heapam() considering tuples to be part of the same chain even
though they aren't.
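To make the failure mode concrete, here is a minimal Python sketch of the chain-linking test, under loudly stated assumptions. SimpleTuple, linked_lenient() and linked_strict() are hypothetical names, not the patch's code. The tuple model is heavily simplified (it is not HeapTupleHeaderData) and treats a frozen xmin as literally equal to FrozenTransactionId's value, 2. The lenient predicate mirrors the behavior described above, where a frozen next->xmin short-circuits the cur->xmax == next->xmin comparison.

```python
from dataclasses import dataclass

INVALID_XID = 0
FROZEN_XID = 2  # same value as PostgreSQL's FrozenTransactionId


@dataclass(frozen=True)
class SimpleTuple:
    """Hypothetical, simplified stand-in for a heap tuple header."""
    xmin: int      # inserting transaction, or FROZEN_XID once frozen
    xmax: int      # updating transaction, or INVALID_XID if never updated
    ctid_off: int  # offset that t_ctid points to on the same page


def linked_lenient(cur: SimpleTuple, nxt: SimpleTuple) -> bool:
    """Flawed rule: a frozen xmin on the target is treated as a match."""
    if cur.xmax == INVALID_XID:
        return False
    return nxt.xmin == FROZEN_XID or nxt.xmin == cur.xmax


def linked_strict(cur: SimpleTuple, nxt: SimpleTuple) -> bool:
    """Only an exact xmax == xmin match links two tuples.

    In this model a frozen xmin says nothing about which transaction
    inserted the tuple, so it can never confirm membership in the chain.
    """
    if cur.xmax == INVALID_XID:
        return False
    return nxt.xmin == cur.xmax


# cur was updated by xact 1000 and its t_ctid points at offset 3, but the
# tuple now at offset 3 is an unrelated tuple whose xmin has been frozen.
cur = SimpleTuple(xmin=900, xmax=1000, ctid_off=3)
nxt = SimpleTuple(xmin=FROZEN_XID, xmax=INVALID_XID, ctid_off=3)

print(linked_lenient(cur, nxt))  # True: a bogus link
print(linked_strict(cur, nxt))   # False
```

The point of the strict version is that a frozen xmin can only ever fail to confirm a link; it never substitutes for the xmax/xmin match.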


> I feel like I'm repeating myself more than I should, but: why isn't
> it as simple as "HOT chain traversal logic is broken by frozen xmin in the
> obvious way, therefore all bets are off"?

Because that's irrelevant to the test case and to a good number of my concerns.


> Maybe you're right about the proposed new functionality getting things wrong
> with your adversarial isolation test, but I seem to have missed the
> underlying argument. Are you just talking about regular update chains here,
> not HOT chains? Something else?

As I noted, it happens regardless of whether HOT is used. The tuples aren't
part of the same chain, but the patch treats them as if they were. The reason
the patch considers them to be part of the same chain is precisely the
FrozenTransactionId condition I was worried about. The fact that t_ctid points
to a tuple on the same page, and that the next tuple has xmin ==
FrozenTransactionId, doesn't mean they're part of the same chain. Once you
encounter a tuple with a frozen xmin, you simply cannot assume it's part of the
chain you've been following.
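A second hypothetical sketch, in the same simplified model, shows how a lenient link can turn into a bogus "is also the updated version" report like the one quoted earlier. If a frozen xmin counts as a match, two unrelated tuples whose t_ctid both happen to point at the same offset are each treated as that tuple's predecessor. The page layout, find_duplicate_successors() and both predicates are illustrative assumptions. They are not the isolation test's actual state and not verify_heapam()'s code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

INVALID_XID = 0
FROZEN_XID = 2  # same value as PostgreSQL's FrozenTransactionId


@dataclass(frozen=True)
class SimpleTuple:
    """Hypothetical, simplified tuple: not HeapTupleHeaderData."""
    xmin: int
    xmax: int
    ctid_off: int  # offset t_ctid points to; equals its own offset if not updated


Linker = Callable[[SimpleTuple, SimpleTuple], bool]


def lenient(cur: SimpleTuple, nxt: SimpleTuple) -> bool:
    # Flawed: a frozen xmin on the target counts as a match.
    return cur.xmax != INVALID_XID and nxt.xmin in (FROZEN_XID, cur.xmax)


def strict(cur: SimpleTuple, nxt: SimpleTuple) -> bool:
    # Only an exact xmax == xmin match links the two tuples.
    return cur.xmax != INVALID_XID and nxt.xmin == cur.xmax


def find_duplicate_successors(page: Dict[int, SimpleTuple],
                              linked: Linker) -> List[str]:
    """Map each successor to its predecessor; report any successor claimed twice."""
    predecessor: Dict[int, int] = {}
    complaints: List[str] = []
    for off in sorted(page):
        cur = page[off]
        nxt_off = cur.ctid_off
        if nxt_off == off or nxt_off not in page:
            continue  # not updated, or successor not modeled on this page
        if not linked(cur, page[nxt_off]):
            continue  # xmax/xmin mismatch: not the same chain
        if nxt_off in predecessor:
            complaints.append(
                f"updated version at offset {nxt_off} is also the updated "
                f"version of tuple at offset {predecessor[nxt_off]}")
        else:
            predecessor[nxt_off] = off
    return complaints


# Hypothetical page: offsets 1 and 2 were updated by different transactions
# and both t_ctids point at offset 3, which holds an unrelated frozen tuple.
page = {
    1: SimpleTuple(xmin=900, xmax=1000, ctid_off=3),
    2: SimpleTuple(xmin=901, xmax=1001, ctid_off=3),
    3: SimpleTuple(xmin=FROZEN_XID, xmax=INVALID_XID, ctid_off=3),
}

print(find_duplicate_successors(page, lenient))
# ['updated version at offset 3 is also the updated version of tuple at offset 1']
print(find_duplicate_successors(page, strict))
# []
```

With the strict rule, neither tuple is linked to offset 3, so there is nothing to complain about.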

Greetings,

Andres Freund


