Re: HOT chain validation in verify_heapam() - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: HOT chain validation in verify_heapam() |
Date | |
Msg-id | CAH2-Wz=_wfUq=Nwd9Y0g4kz1_o8_2Lus7EG-VqOAsUJt_fGBxg@mail.gmail.com Whole thread Raw |
In response to | Re: HOT chain validation in verify_heapam() (Andres Freund <andres@anarazel.de>) |
Responses |
Re: HOT chain validation in verify_heapam()
|
List | pgsql-hackers |
On Wed, Nov 9, 2022 at 2:08 PM Andres Freund <andres@anarazel.de> wrote: > To start with: I think this is an extremely helpful and important > feature. Both for checking production systems and for finding problems during > development. +1. It's painful to get this in, in part because we now have to actually decide what the rules really are with total precision, including for all of the tricky edge cases. The exercise of writing this code should "keep us honest" about whether or not we really know what the invariants are, which is more than half the battle. > > + /* > > + * Add a line pointer offset to the predecessor array if xmax is > > + * matching with xmin of next tuple (reaching via its t_ctid). > > + * Prior to PostgreSQL 9.4, we actually changed the xmin to > > + * FrozenTransactionId > > I'm doubtful it's a good idea to try to validate the 9.4 case. The likelihood > of getting that right seems low and I don't see us gaining much by even trying. This is the kind of comment that I'd usually agree with, but I disagree in this instance because of special considerations that apply to amcheck (or should IMV apply, at least). We're living in a world where we have to assume that the pre-9.4 format can occur in the field. If we can't get it right in amcheck, what chance do we have with other new code that tickles the same areas? I think that we need to support obsolescent heapam representations (both FrozenTransactionId and xvac) here on general principle. Besides, why not accept some small chance of getting this wrong? The worst that can happen is that we'll have a relatively benign bug. If we get it wrong then it's a short term problem, but also an opportunity to be less wrong in the future -- including in places where the consequences of being wrong are much more serious. > I doubt it is correct to enter this path with next_xmin == > FrozenTransactionId. This is following a ctid chain that we normally wouldn't > follow, because it doesn't satisfy the t_self->xmax == t_ctid->xmin condition. We should never see FrozenTransactionId in an xmax field (nor should it be returned by HeapTupleHeaderGetUpdateXid() under any circumstances). We can "freeze xmax" during VACUUM, but that actually means setting xmax to InvalidTransactionId (in rare cases it might mean replacing a Multi with a new Multi). The terminology in this area is a bit tricky. Anyway, it follows that we cannot expect "next_xmin == FrozenTransactionId", because that would mean that we'd called HeapTupleHeaderGetUpdateXid() which returned FrozenTransactionId -- an impossibility. (Maybe we should be checking that it really is an impossibility by checking the HeapTupleHeaderGetUpdateXid() return value, but that should be enough.) > I don't immediately see what prevents the frozen tuple being from an entirely > different HOT chain than the two tuples pointing to it. In my view it simply isn't possible for a valid HOT chain to be in this state in the first place. So by definition it wouldn't be a HOT chain. That would be a form of corruption, which is something that would probably be detected by noticing orphaned heap-only tuples (heap-only tuples not reachable from some root item on the same page, or some other intermediary heap-only tuple reachable from a root item). > Can't we have a an update chain that is e.g. > xmin 10, xmax 5 -> xmin 5, xmax invalid > > and a vacuum cutoff of 7? That'd preent the first tuple from being removed, > but would allow 5 to be frozen. I don't see how that can be possible. That is contradictory, and cannot possibly work, since it supposes a situation where every possible MVCC snapshot sees the update that generated the second/successor tuple as committed, while at the same time also somehow needing the original tuple to stay in place. Surely both things can never be true at the same time. I believe you're right that an update chain that looks like this one is possible. However, I don't think it's possible for OldestXmin/FreezeLimit to take on a value like that (i.e. a value that "skewers" the update chain like this, the value 7 from your example). We ought to be able to rely on an OldestXmin value that can never let such a situation emerge. Right? -- Peter Geoghegan
pgsql-hackers by date: