Re: HOT chain validation in verify_heapam() - Mailing list pgsql-hackers

From Andres Freund
Subject Re: HOT chain validation in verify_heapam()
Date
Msg-id 20230323172607.y3lejpntjnuis5vv@awork3.anarazel.de
Whole thread Raw
In response to Re: HOT chain validation in verify_heapam()  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: HOT chain validation in verify_heapam()  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2023-03-23 11:41:52 -0400, Robert Haas wrote:
> On Wed, Mar 22, 2023 at 5:56 PM Andres Freund <andres@anarazel.de> wrote:
> > I also think it's not quite right that some of checks inside if
> > (ItemIdIsRedirected()) continue in case of corruption, others don't. While
> > there's a later continue, that means the corrupt tuples get added to the
> > predecessor array.  Similarly, in the non-redirect portion, the successor
> > array gets filled with corrupt tuples, which doesn't seem quite right to me.
> 
> I'm not entirely sure I agree with this. I mean, we should skip
> further checks of the data is so corrupt that further checks are not
> sensible e.g. if the line pointer is just gibberish we can't validate
> anything about the tuple, because we can't even find the tuple.
> However, we don't want to skip further checks as soon as we see any
> kind of a problem whatsoever. For instance, even if the tuple data is
> utter gibberish, that should not and does not keep us from checking
> whether the update chain looks sane. If the tuple header is garbage
> (e.g. xmin and xmax are outside the bounds of clog) then at least some
> of the update-chain checks are not possible, because we can't know the
> commit status of the tuples, but garbage in the tuple data isn't
> really a problem for update chain validation per se.

The cases I was complaining about where metadata that's important. We
shouldn't enter a tuple into lp_valid[] or successor[] if it failed validity
checks - the subsequent error reports that that generates won't be helpful -
or we'll crash.

E.g. continuing after:

                rditem = PageGetItemId(ctx.page, rdoffnum);
                if (!ItemIdIsUsed(rditem))
                    report_corruption(&ctx,
                                      psprintf("line pointer redirection to unused item at offset %u",
                                               (unsigned) rdoffnum));

means we'll look into the tuple in the "update chain validation" loop for
unused items. Where it afaict will lead to a crash or bogus results, because:
                /* Can only redirect to a HOT tuple. */
                next_htup = (HeapTupleHeader) PageGetItem(ctx.page, next_lp);
                if (!HeapTupleHeaderIsHeapOnly(next_htup))
                {
                    report_corruption(&ctx,
                                      psprintf("redirected line pointer points to a non-heap-only tuple at offset %u",
                                               (unsigned) nextoffnum));
                }

will just dereference the unused item.



> -                 * Redirects are created by updates, so successor should be
> -                 * the result of an update.
> +                 * Redirects are created by HOT updates, so successor should
> +                 * be the result of an HOT update.
> +                 *
> +                 * XXX: HeapTupleHeaderIsHeapOnly() should always imply
> +                 * HEAP_UPDATED. This should be checked even when the tuple
> +                 * isn't a target of a redirect.
> 
> Hmm, OK. So the question is where to put this check. Maybe inside
> check_tuple_header(), making it independent of the update chain
> validation stuff?

Yes, check_tuple_header sounds sensible to me.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: Add pg_walinspect function with block info columns
Next
From: Bharath Rupireddy
Date:
Subject: Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)