Re: pgsql: amcheck: Fix verify_heapam for tuples where xmin or xmax is 0. - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: pgsql: amcheck: Fix verify_heapam for tuples where xmin or xmax is 0.
Date
Msg-id CAH2-Wzn+1cBF+KsJOd-H4WnaNWuBgckSqNPbMeg1Q8eS9zxrXQ@mail.gmail.com
Whole thread Raw
In response to Re: pgsql: amcheck: Fix verify_heapam for tuples where xmin or xmax is 0.  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: pgsql: amcheck: Fix verify_heapam for tuples where xmin or xmax is 0.  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Mon, Mar 27, 2023 at 1:17 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Patch attached.

This is fine, as far as it goes. Obviously it fixes the immediate problem.

> > I don't think that it's particularly likely that having refined
> > aborted speculative insertion amcheck coverage will make a critical
> > difference to any user, at any time. But "amcheck as documentation of
> > the on-disk format" is reason enough to have it.
>
> Sure, if someone feels like writing the code. I have to admit that I
> have mixed feelings about this whole direction. In concept, I agree
> with you entirely: a fringe benefit of having checks that tell us
> whether or not a page is valid is that it helps to make clear what
> page states we think are valid.

I don't think that it's a fringe benefit; it's just not necessarily of
direct benefit to amcheck users.

Before the HOT chain validation patch went in, it was unclear whether
certain conceivable on-disk states should constitute corruption. In
particular, it wasn't clear to anybody whether or not it was okay for
an LP_REDIRECT to point to an LP_DEAD until recently (and probably
other things besides that). I don't think that we should assume that
the easy part is abstractly defining corruption, while the hard part
is writing the tool to check for the corruption. Sometimes it is, but
I think that it's often the other way around.

> In practice, however, the point you
> raise in your first sentence weighs awfully heavily with me. Spending
> a lot of energy on checks that are unlikely to catch practical
> problems feels like it may not be the best use of time.

That definitely could be true, but I don't think that it's terribly
much extra effort in most cases.

> I'm not sure
> exactly where to draw the line, but it seems highly likely to be that
> there are things we could deduce about the page that wouldn't be worth
> the effort. For example, would we bother checking that a tuple with an
> in-progress xmin does not have a smaller natts value than a tuple with
> a committed xmin? Or that natts values are non-decreasing across a HOT
> chain? I suspect there are even more obscure examples of things that
> should be true but might not really be worth worrying about in the
> code.

A related way of looking at it (that I also find appealing) is that
it's often easier (far easier) to just have the check, and be done
with it. Of course there is bound to be uncertainty about how useful
any given check might be; we're looking for something that is
theoretically never supposed to happen. Why not just assume that it
might matter if it's not costing very much to check for it?

This is quite a different mentality than the one we bring to core
heapam code, where it's quite natural to just avoid strange corner
cases in the on-disk format like the plague. The risk profile is
totally different for amcheck code. Within amcheck, I'd rather go too
far than not go far enough.

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: running logical replication as the subscription owner
Next
From: Jehan-Guillaume de Rorthais
Date:
Subject: Re: Memory leak from ExecutorState context?