Re: HOT chain validation in verify_heapam() - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: HOT chain validation in verify_heapam()
Date
Msg-id CAH2-Wz=_wfUq=Nwd9Y0g4kz1_o8_2Lus7EG-VqOAsUJt_fGBxg@mail.gmail.com
Whole thread Raw
In response to Re: HOT chain validation in verify_heapam()  (Andres Freund <andres@anarazel.de>)
Responses Re: HOT chain validation in verify_heapam()
List pgsql-hackers
On Wed, Nov 9, 2022 at 2:08 PM Andres Freund <andres@anarazel.de> wrote:
> To start with: I think this is an extremely helpful and important
> feature. Both for checking production systems and for finding problems during
> development.

+1.

It's painful to get this in, in part because we now have to actually
decide what the rules really are with total precision, including for
all of the tricky edge cases. The exercise of writing this code should
"keep us honest" about whether or not we really know what the
invariants are, which is more than half the battle.

> > +                     /*
> > +                      * Add a line pointer offset to the predecessor array if xmax is
> > +                      * matching with xmin of next tuple (reaching via its t_ctid).
> > +                      * Prior to PostgreSQL 9.4, we actually changed the xmin to
> > +                      * FrozenTransactionId
>
> I'm doubtful it's a good idea to try to validate the 9.4 case. The likelihood
> of getting that right seems low and I don't see us gaining much by even trying.

This is the kind of comment that I'd usually agree with, but I
disagree in this instance because of special considerations that apply
to amcheck (or should IMV apply, at least). We're living in a world
where we have to assume that the pre-9.4 format can occur in the
field. If we can't get it right in amcheck, what chance do we have
with other new code that tickles the same areas? I think that we need
to support obsolescent heapam representations (both
FrozenTransactionId and xvac) here on general principle.

Besides, why not accept some small chance of getting this wrong? The
worst that can happen is that we'll have a relatively benign bug. If
we get it wrong then it's a short term problem, but also an
opportunity to be less wrong in the future -- including in places
where the consequences of being wrong are much more serious.

> I doubt it is correct to enter this path with next_xmin ==
> FrozenTransactionId. This is following a ctid chain that we normally wouldn't
> follow, because it doesn't satisfy the t_self->xmax == t_ctid->xmin condition.

We should never see FrozenTransactionId in an xmax field (nor should
it be returned by HeapTupleHeaderGetUpdateXid() under any
circumstances). We can "freeze xmax" during VACUUM, but that actually
means setting xmax to InvalidTransactionId (in rare cases it might
mean replacing a Multi with a new Multi). The terminology in this area
is a bit tricky.

Anyway, it follows that we cannot expect "next_xmin ==
FrozenTransactionId", because that would mean that we'd called
HeapTupleHeaderGetUpdateXid() which returned FrozenTransactionId -- an
impossibility. (Maybe we should be checking that it really is an
impossibility by checking the HeapTupleHeaderGetUpdateXid() return
value, but that should be enough.)

> I don't immediately see what prevents the frozen tuple being from an entirely
> different HOT chain than the two tuples pointing to it.

In my view it simply isn't possible for a valid HOT chain to be in
this state in the first place. So by definition it wouldn't be a HOT
chain. That would be a form of corruption, which is something that
would probably be detected by noticing orphaned heap-only tuples
(heap-only tuples not reachable from some root item on the same page,
or some other intermediary heap-only tuple reachable from a root
item).

> Can't we have a an update chain that is e.g.
> xmin 10, xmax 5 -> xmin 5, xmax invalid
>
> and a vacuum cutoff of 7? That'd preent the first tuple from being removed,
> but would allow 5 to be frozen.

I don't see how that can be possible. That is contradictory, and
cannot possibly work, since it supposes a situation where every
possible MVCC snapshot sees the update that generated the
second/successor tuple as committed, while at the same time also
somehow needing the original tuple to stay in place. Surely both
things can never be true at the same time.

I believe you're right that an update chain that looks like this one
is possible. However, I don't think it's possible for
OldestXmin/FreezeLimit to take on a value like that (i.e. a value that
"skewers" the update chain like this, the value 7 from your example).
We ought to be able to rely on an OldestXmin value that can never let
such a situation emerge. Right?

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: HOT chain validation in verify_heapam()
Next
From: "David G. Johnston"
Date:
Subject: Re: [DOCS] Stats views and functions not in order?