Re: HOT chain validation in verify_heapam() - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: HOT chain validation in verify_heapam()
Date
Msg-id CAH2-WzkysLDKF_R3qA=nhOXL4JuCqTxHCFw-7oqASuQ56vHtAg@mail.gmail.com
Whole thread Raw
In response to Re: HOT chain validation in verify_heapam()  (Andres Freund <andres@anarazel.de>)
Responses Re: HOT chain validation in verify_heapam()  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Wed, Mar 22, 2023 at 1:45 PM Andres Freund <andres@anarazel.de> wrote:
> At the very least there's missing verification that tids actually exists in the
> "Update chain validation" loop, leading to:
> TRAP: failed Assert("ItemIdHasStorage(itemId)"), File:
"../../../../home/andres/src/postgresql/src/include/storage/bufpage.h",Line: 355, PID: 645093 
>
> Which makes sense - the earlier loop adds t_ctid to the successor array, which
> we then query without checking if there still is such a tid on the page.
>
> I suspect we don't just need a !ItemIdIsUsed(), but also a check gainst the
> max offset on the page.

We definitely need to do it that way, since a heap-only tuple's t_ctid
is allowed to point to almost anything. I guess it can't point to some
completely different heap block, but that's about the only
restriction. In particular, it can point to an item that's past the
end of the page following line pointer array truncation (truncation
can happen during pruning or when the second heap pass takes place in
VACUUM).

OTOH the rules for LP_REDIRECT items *are* very strict. They need to
be, since it's the root item of the HOT chain, referenced by TIDs in
indexes, and have no heap tuple header metadata to use in cross-checks
that take place during HOT chain traversal (traversal by code in
places such as heap_hot_search_buffer).

> WRT these failures:
> non-heap-only update produced a heap-only tuple at offset 20
>
> I think that's largely a consequence of HeapTupleHeaderIsHotUpdated()'s
> definition:

That has to be a problem for verify_heapam.

> Currently the new verify_heapam() follows ctid chains when XMAX_INVALID is set
> and expects to find an item it can dereference - but I don't think that's
> something we can rely on: Afaics HOT pruning can break chains, but doesn't
> reset xmax.

I think that we need two passes to be completely thorough. An initial
pass, that works pretty much as-is, plus a second pass that locates
any orphaned heap-only tuples -- heap-only tuples that were not deemed
part of a valid HOT chain during the first pass. These remaining
orphaned heap-only tuples should be verified as having come from
now-aborted transactions (they should definitely be fully DEAD) --
otherwise we have corruption.

That's what my abandoned patch to make heap pruning more robust did,
you'll recall.

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: HOT chain validation in verify_heapam()
Next
From: "Euler Taveira"
Date:
Subject: Re: Initial Schema Sync for Logical Replication