Re: HOT chain validation in verify_heapam() - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: HOT chain validation in verify_heapam()
Date
Msg-id CAH2-WznBBTm10chJN-OjPJHLEYOsNXUdh8kW0uV4vfHcR3K2TQ@mail.gmail.com
Whole thread Raw
In response to Re: HOT chain validation in verify_heapam()  (Andres Freund <andres@anarazel.de>)
Responses Re: HOT chain validation in verify_heapam()
List pgsql-hackers
On Mon, Nov 14, 2022 at 2:33 PM Andres Freund <andres@anarazel.de> wrote:
> > Why don't you think that there is corruption?
>
> I looked at the state after the test and the complaint is bogus. It's caused
> by the patch ignoring the cur->xmax == next->xmin condition if next->xmin is
> FrozenTransactionId. The isolationtester test creates a situation where that
> leads to verify_heapam() considering tuples to be part of the same chain even
> though they aren't.

Having looked at your isolation test in more detail, it seems like you
were complaining about a fairly specific and uncontroversial
shortcoming in the patch itself: it complains about a newly inserted
tuple that gets frozen. It thinks that the inserted tuple is part of
the same HOT chain (or at least the same update chain) as other tuples
on the same heap page, when in fact it's just some wholly unrelated
tuple/logical row. It seems as if the new amcheck code doesn't get all
the details of validating HOT chains right, and so jumps the gun here,
reporting corruption based on a faulty assumption that the frozen-xmin
tuple is in any way related to the chain.

I was confused about whether we were talking about this patch, bugs in
HEAD, or both.

> > Maybe you're right about the proposed new functionality getting things wrong
> > with your adversarial isolation test, but I seem to have missed the
> > underlying argument. Are you just talking about regular update chains here,
> > not HOT chains? Something else?
>
> As I noted, it happens regardless of HOT being used or not. The tuples aren't
> part of the same chain, but the patch treats them as if they were.  The reason
> the patch considers them to be part of the same chain is precisely the
> FrozenTransactionId condition I was worried about. Just because t_ctid points
> to a tuple on the same page and the next tuple has xmin ==
> FrozenTransactionId, doesn't mean they're part of the same chain. Once you
> encounter a tuple with a frozen xmin you simply cannot assume it's part of the
> chain you've been following.

Got it.

That seems relatively straightforward and uncontroversial to me,
because it's just how code like heap_hot_search_buffer (HOT chain
traversal code) works already. The patch got some of those details
wrong, and should be revised.

What does this have to tell us, if anything, about the implications
for code on HEAD? I don't see any connection between this problem and
the possibility of a live bug on HEAD involving freezing later tuple
versions in a HOT chain, leaving earlier non-frozen versions behind to
break HOT chain traversal code. Should I have noticed such a
connection?

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: meson oddities
Next
From: Andres Freund
Date:
Subject: Re: HOT chain validation in verify_heapam()