Re: HOT chain validation in verify_heapam() - Mailing list pgsql-hackers

From Andres Freund
Subject Re: HOT chain validation in verify_heapam()
Date
Msg-id 20221110014607.dlqzedl33csz36x2@awork3.anarazel.de
Whole thread Raw
In response to Re: HOT chain validation in verify_heapam()  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: HOT chain validation in verify_heapam()
Re: HOT chain validation in verify_heapam()
List pgsql-hackers
Hi,

On 2022-11-09 17:32:46 -0800, Peter Geoghegan wrote:
> > The xmin horizon is very coarse grained. Just because it is 7 doesn't mean
> > that xid 10 is still running. All it means that one backend or slot has an
> > xmin or xid of 7.
> 
> Of course that's true. But I wasn't talking about the general case --
> I was talking about your "xmin 10, xmax 5 -> xmin 5, xmax invalid"
> update chain case specifically, with its "skewered" OldestXmin of 7.

The sequence below produces such an OldestXmin:

> > s1: acquire xid 5
> > s2: acquire xid 7
> > s3: acquire xid 10
> >
> > s3: insert
> > s3: commit
> > s1: update
> > s1: commit
> >
> > s2: get a new snapshot, xmin 7 (or just hold no snapshot)
> >
> > At this point the xmin horizon is 7. The first tuple's xmin can't be
> > frozen. The second tuple's xmin can be.
> 
> Basically what I'm saying about OldestXmin is that it ought to "work
> transitively", from the updater to the inserter that inserted the
> now-updated tuple. That is, the OldestXmin should either count both
> XIDs that appear in the update chain, or neither XID.

It doesn't work that way. The above sequence shows one case where it doesn't.


> > > I believe you're right that an update chain that looks like this one
> > > is possible. However, I don't think it's possible for
> > > OldestXmin/FreezeLimit to take on a value like that (i.e. a value that
> > > "skewers" the update chain like this, the value 7 from your example).
> > > We ought to be able to rely on an OldestXmin value that can never let
> > > such a situation emerge. Right?
> >
> > I don't see anything that'd guarantee that currently, nor do immediately see a
> > possible way to get there.
> >
> > What do you think prevents such an OldestXmin?
> 
> ComputeXidHorizons() computes VACUUM's OldestXmin (actually it
> computes h->data_oldest_nonremovable values) by scanning the proc
> array. And counts PGPROC.xmin from each running xact. So ultimately
> the inserter and updater are tied together by that. It's either an
> OldestXmin that includes both, or one that includes neither.

> Here are some facts that I think we both agree on already:
> 
> 1. It is definitely possible to have an update chain like your "xmin
> 10, xmax 5 -> xmin 5, xmax invalid" example.
> 
> 2. It is definitely not possible to "freeze xmax" by setting its value
> to FrozenTransactionId or something similar -- there is simply no code
> path that can do that, and never has been. (The term "freeze xmax" is
> a bit ambiguous, though it usually means set xmax to
> InvalidTransactionId.)
> 
> 3. There is no specific reason to believe that there is a live bug here.

I don't think there's a live bug here. I think the patch isn't dealing
correctly with that issue though.


> Putting all 3 together: doesn't it seem quite likely that the way that
> we compute OldestXmin is the factor that prevents "skewering" of an
> update chain? What else could possibly be preventing corruption here?
> (Theoretically it might never have been discovered, but that seems
> pretty hard to believe.)

I don't see how that follows. The existing code is just ok with that. In fact
we have explicit code trying to exploit this:

        /*
         * If the DEAD tuple is at the end of the chain, the entire chain is
         * dead and the root line pointer can be marked dead.  Otherwise just
         * redirect the root to the correct chain member.
         */
        if (i >= nchain)
            heap_prune_record_dead(prstate, rootoffnum);
        else
            heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);


Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: HOT chain validation in verify_heapam()
Next
From: Andres Freund
Date:
Subject: Re: HOT chain validation in verify_heapam()