[HACKERS] Re: heap/SLRU verification, relfrozenxid cut-off, and freeze-the-dead bug (Was: amcheck (B-Tree integrity checking tool)) - Mailing list pgsql-hackers

From Peter Geoghegan
Subject [HACKERS] Re: heap/SLRU verification, relfrozenxid cut-off, and freeze-the-dead bug (Was: amcheck (B-Tree integrity checking tool))
Msg-id CAH2-Wz=4C2_m=EKZxuJRwh_hTVgLzaaussNNxeh_Oi_QxS9Spw@mail.gmail.com
In response to [HACKERS] Re: heap/SLRU verification, relfrozenxid cut-off, and freeze-the-dead bug (Was: amcheck (B-Tree integrity checking tool))  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On Fri, Oct 13, 2017 at 7:09 PM, Noah Misch <noah@leadboat.com> wrote:
> All good questions; I don't know offhand.  Discovering those answers is
> perhaps the chief labor required of such a project.

ISTM that by far the hardest part of the project is arriving at a
consensus around what a good set of invariants for CLOG and MultiXact
looks like.

I think that it's fair to say that this business with relfrozenxid now
appears to be more complicated than many of us would have thought. Or,
at least, more complicated than I thought when I first started
thinking about it. Once we're measuring this complexity (by having
checks), we should be in a better position to keep it under control,
and to avoid ambiguity.

> The checker should
> consider circumstances potentially carried from past versions via pg_upgrade.

Right. False positives are simply unacceptable.

> Fortunately, if you get some details wrong, it's cheap to recover from checker
> bugs.

Ideally, amcheck will become a formal statement of the contracts
provided by major subsystems, such as the heapam, the various SLRUs,
and so on. While I agree that having bugs there is much less severe
than having bugs in backend code, I would like the tool to reach a
point where it actually *defines* correctness (by community
consensus). If a bug in amcheck reflects a bug in our high level
thinking about correctness, then that actually is a serious problem.
Arguably, it's the most costly variety of bug that Postgres can have.

I may never be able to get general buy-in here; building broad
consensus like that is a lot harder than writing some code for a
contrib module. Making the checking code the *authoritative* record of
how invariants are *expected* to work is a major goal of the project,
though.

-- 
Peter Geoghegan


