Re: HOT chain validation in verify_heapam() - Mailing list pgsql-hackers

From Andres Freund
Subject Re: HOT chain validation in verify_heapam()
Date
Msg-id 20221114220252.wahp6lp5hid5ty5k@awork3.anarazel.de
Whole thread Raw
In response to Re: HOT chain validation in verify_heapam()  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: HOT chain validation in verify_heapam()
Re: HOT chain validation in verify_heapam()
List pgsql-hackers
Hi,

On 2022-11-14 14:27:54 -0500, Robert Haas wrote:
> On Wed, Nov 9, 2022 at 5:08 PM Andres Freund <andres@anarazel.de> wrote:
> > I don't really understand this logic - why can't we populate the predecessor
> > array, if we can construct a successor entry?
> 
> This whole thing was my idea, so let me try to explain. I think the
> naming and comments need work, but I believe the fundamental idea may
> be sound.
> 
> successor[x] = y means that when we looked at line pointer x, we saw
> that it was either a redirect to line pointer y, or else it had
> storage and the associated tuple's CTID pointed to line pointer y.

> At this point, we do not have any idea whether y is at all sane, nor we do
> we know anything about which of x and y is larger.

What do you mean with "larger" here?


> Furthermore, it is
> possible that successor[x] = successor[x'] since the page might be corrupted
> and we haven't checked otherwise.
> 
> predecessor[y] = x means that successor[x] = y but in addition we've
> checked that y is sane, and that x.xmax=y.xmin. If there are multiple
> tuples for which these conditions hold, we've issued complaints about
> all but one and entered the last into the predecessor array.

As shown by the isolationtester test I just posted, this doesn't quite work
right now. Probably fixable.

I don't think we can follow non-HOT ctid chains if they're older than the xmin
horizon, including all cases of xmin being frozen. There's just nothing
guaranteeing that the tuples are actually "related".


It seems like we should do a bit more validation within a chain of
tuples. E.g. that no live tuple can follow an !DidCommit xmin?



> > I'm doubtful it's a good idea to try to validate the 9.4 case. The likelihood
> > of getting that right seems low and I don't see us gaining much by even trying.
> 
> I agree with Peter. We have to try to get that case right. If we can
> eventually eliminate it as a valid case by some mechanism, hooray. But
> in the meantime we have to deal with it as best we can.

I now think that the 9.4 specific reasoning is bogus in the first place. The
patch says:

             * Add a line pointer offset to the predecessor array if xmax is
             * matching with xmin of next tuple (reaching via its t_ctid).
             * Prior to PostgreSQL 9.4, we actually changed the xmin to
             * FrozenTransactionId so we must add offset to predecessor
             * array(irrespective of xmax-xmin matching) if updated tuple xmin
             * is frozen, so that we can later do validation related to frozen
             * xmin. Raise corruption if we have two tuples having the same
             * predecessor.

but it's simply not correct to iterate through xmin=FrozenTransactionId - as
shown in the isolationtester test. And that's unrelated to 9.4, because we
couldn't rely on the raw xmin value either, because even if they match, they
could be from different epochs.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: HOT chain validation in verify_heapam()
Next
From: Andres Freund
Date:
Subject: Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment