Re: HOT chain validation in verify_heapam() - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: HOT chain validation in verify_heapam()
Date
Msg-id CAH2-Wzk8OYkca7af+dGNGdMATn5d_dqZEsk=e45BndvjQGcDuQ@mail.gmail.com
Whole thread Raw
In response to Re: HOT chain validation in verify_heapam()  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: HOT chain validation in verify_heapam()  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wed, Mar 22, 2023 at 2:14 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > Currently the new verify_heapam() follows ctid chains when XMAX_INVALID is set
> > and expects to find an item it can dereference - but I don't think that's
> > something we can rely on: Afaics HOT pruning can break chains, but doesn't
> > reset xmax.
>
> I think that we need two passes to be completely thorough. An initial
> pass, that works pretty much as-is, plus a second pass that locates
> any orphaned heap-only tuples -- heap-only tuples that were not deemed
> part of a valid HOT chain during the first pass. These remaining
> orphaned heap-only tuples should be verified as having come from
> now-aborted transactions (they should definitely be fully DEAD) --
> otherwise we have corruption.

I see that there is a second pass over the heap page in
verify_heapam(), in fact. Kind of. I'm referring to this loop:

        /*
         * An update chain can start either with a non-heap-only tuple or with
         * a redirect line pointer, but not with a heap-only tuple.
         *
         * (This check is in a separate loop because we need the predecessor
         * array to be fully populated before we can perform it.)
         */
        for (ctx.offnum = FirstOffsetNumber;
             ctx.offnum <= maxoff;
             ctx.offnum = OffsetNumberNext(ctx.offnum))
        {
            if (xmin_commit_status_ok[ctx.offnum] &&
                (xmin_commit_status[ctx.offnum] == XID_COMMITTED ||
                 xmin_commit_status[ctx.offnum] == XID_IN_PROGRESS) &&
                predecessor[ctx.offnum] == InvalidOffsetNumber)
            {
                ItemId      curr_lp;

                curr_lp = PageGetItemId(ctx.page, ctx.offnum);
                if (!ItemIdIsRedirected(curr_lp))
                {
                    HeapTupleHeader curr_htup;

                    curr_htup = (HeapTupleHeader)
                        PageGetItem(ctx.page, curr_lp);
                    if (HeapTupleHeaderIsHeapOnly(curr_htup))
                        report_corruption(&ctx,
                                          psprintf("tuple is root of
chain but is marked as heap-only tuple"));
                }
            }
        }


However, this "second pass over page" loop has roughly the same
problem as the nearby HeapTupleHeaderIsHotUpdated() coding pattern: it
doesn't account for the fact that a tuple whose xmin was
XID_IN_PROGRESS a little earlier on may not be in that state once we
reach the second pass loop. Concurrent transaction abort needs to be
accounted for. The loop needs to recheck xmin status, at least in the
initially-XID_IN_PROGRESS-xmin case.

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: "Imseih (AWS), Sami"
Date:
Subject: Re: [BUG] pg_stat_statements and extended query protocol
Next
From: "David G. Johnston"
Date:
Subject: Re: Options to rowwise persist result of stable/immutable function with RECORD result