Re: heap/SLRU verification, relfrozenxid cut-off, and freeze-the-dead bug (Was: amcheck (B-Tree integrity checking tool)) - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: heap/SLRU verification, relfrozenxid cut-off, and freeze-the-dead bug (Was: amcheck (B-Tree integrity checking tool))
Msg-id CAH2-Wz=3h22nWOx4OZRngVjbjgEP8P7QEgytaoXo9zoQ_=z=fA@mail.gmail.com
In response to [HACKERS] Re: heap/SLRU verification, relfrozenxid cut-off, and freeze-the-dead bug (Was: amcheck (B-Tree integrity checking tool)) (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Wed, Oct 18, 2017 at 12:45 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> Bringing it back to the concrete freeze-the-dead issue, and the
> question of an XID-cutoff for safely interrogating CLOG: I guess it
> will be possible to assess a HOT chain as a whole. We can probably do
> this safely while holding a super-exclusive lock on the buffer. I can
> probably find a way to ensure this only needs to happen in a rare slow
> path, when it looks like the invariant might be violated but we need
> to make sure (I'm already following this pattern in a couple of
> places). Realistically, there will be some amount of "try it and see"
> here.

I would like to point out for the record/archives that I now believe
that Andres' pending do-over fix for the "Freeze the dead" bug [1]
will leave things in *much* better shape when it comes to
verification. Andres' patch neatly addresses *all* of the concerns
that I raised on this thread. The high-level idea of relfrozenxid as an
unambiguous cut-off point at which it must be safe to interrogate the
CLOG is restored.

Offhand, I'd say that the only interlock amcheck verification now
needs when verifying heap pages against the CLOG is a VACUUM-style
SHARE UPDATE EXCLUSIVE lock on the heap relation being verified. Every
heap tuple must either be observed to be frozen, or must only have
hint bits that are observably in agreement with CLOG. The only
complicated part is the comment that explains why this is
comprehensive and correct (i.e. does not risk false positives or false
negatives). We end up with something that is a bit like a "correct by
construction" design.
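
To make the invariant above concrete, here is a rough sketch of the
per-tuple check in C. It is a toy model only, not amcheck or backend
code: every name in it (ToyXid, ToyClog, toy_tuple_ok, the TOY_* flags)
is hypothetical, the "CLOG" is a small in-memory array, only xmin is
considered, and XID wraparound is ignored so that plain integer
comparison against relfrozenxid is enough. It also assumes the caller
already holds whatever relation-level lock is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are PostgreSQL definitions. */
typedef uint32_t ToyXid;

typedef enum { TOY_IN_PROGRESS, TOY_COMMITTED, TOY_ABORTED } ToyXactStatus;

#define TOY_HINT_XMIN_COMMITTED 0x1  /* hint: xmin is known committed */
#define TOY_HINT_XMIN_ABORTED   0x2  /* hint: xmin is known aborted */
#define TOY_XMIN_FROZEN         0x4  /* tuple has been frozen */

typedef struct {
    ToyXid   xmin;   /* inserting transaction */
    unsigned flags;  /* TOY_* bits above */
} ToyTuple;

#define TOY_CLOG_SIZE 16
typedef struct {
    ToyXactStatus status[TOY_CLOG_SIZE];  /* indexed directly by XID */
} ToyClog;

/* Look up a transaction's status in the toy commit log. */
static ToyXactStatus
toy_clog_lookup(const ToyClog *clog, ToyXid xid)
{
    assert(xid < TOY_CLOG_SIZE);
    return clog->status[xid];
}

/*
 * Return true if the tuple satisfies the invariant: it is either
 * frozen, or its xmin is at or after relfrozenxid (so the commit log
 * may be consulted) and any hint bit it carries agrees with the log.
 */
static bool
toy_tuple_ok(const ToyTuple *tup, ToyXid relfrozenxid, const ToyClog *clog)
{
    ToyXactStatus status;

    if (tup->flags & TOY_XMIN_FROZEN)
        return true;

    /* An unfrozen XID older than the cut-off must never be visible. */
    if (tup->xmin < relfrozenxid)
        return false;

    status = toy_clog_lookup(clog, tup->xmin);

    if ((tup->flags & TOY_HINT_XMIN_COMMITTED) && status != TOY_COMMITTED)
        return false;
    if ((tup->flags & TOY_HINT_XMIN_ABORTED) && status != TOY_ABORTED)
        return false;

    return true;
}
```

The point of the ordering is that the commit log is only consulted
after the relfrozenxid comparison has passed, which is the sense in
which relfrozenxid acts as the cut-off. A real check would also need to
cover xmax, multixacts, and wraparound-aware XID comparison.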

The fact that Andres also proposes to add a bunch of new defensive
"can't happen" hard elog()s (mostly by promoting assertions) should
validate the design of tuple + multixact freezing, in the same way
that I hope amcheck can.

[1] https://postgr.es/m/20171114030341.movhteyakqeqx5pm@alap3.anarazel.de
-- 
Peter Geoghegan

