Re: HOT chain validation in verify_heapam() - Mailing list pgsql-hackers

From Robert Haas
Subject Re: HOT chain validation in verify_heapam()
Date
Msg-id CA+TgmoYuOHg8ijjYZk7VtvxyYr0Sp0ZeJRguoDOsFhhWnxuHdA@mail.gmail.com
Whole thread Raw
In response to Re: HOT chain validation in verify_heapam()  (Aleksander Alekseev <aleksander@timescale.com>)
Responses Re: HOT chain validation in verify_heapam()
List pgsql-hackers
On Thu, Jan 19, 2023 at 8:55 AM Aleksander Alekseev
<aleksander@timescale.com> wrote:
> I noticed that this patch stuck a little and decided to take another look.
>
> It seems to be well written, covered with tests and my understanding
> is that all the previous feedback was accounted for. To your knowledge
> is there anything that prevents us from moving it to "Ready for
> Committer"?

Thanks for taking a look, and for pinging the thread.

I think that the handling of lp_valid[] in the loop that begins with
"Loop over offset and populate predecessor array from all entries that
are present in successor array" is very confusing. I think that
lp_valid[] should be answering the question "is the line pointer
basically sane?". That is, if it's a redirect, it needs to point to
something within the line pointer array (and we also check that it
must be an entry in the line pointer array that is used, which seems
fine). If it's not a redirect, it needs to point to space that's
entirely within the block, properly aligned, and big enough to contain
a tuple. We determine the answers to all of these questions in the
first loop, the one that starts with /* Perform tuple checks */.

Nothing that happens in the second loop, where we populate the
predecessor array, can reverse our previous conclusion that the line
pointer is valid, so this loop shouldn't be resetting entries in
lp_valid[] to false. The reason that it's doing so seems to be that it
wants to use lp_valid[] to control the behavior of the third loop,
where we perform checks against things that have entries in the
predecessor array. As written, the code ensures that we always set
lp_valid[nextoffnum] to false unless we set predecessor[nextoffnum] to
a value other than InvalidOffsetNumber. But that is needlessly
complex: the third loop doesn't need to look at lp_valid[] at all. It
can just check whether predecessor[currentoffnum] is valid. If it is,
perform checks. Otherwise, skip it. It seems to me that this would be
significantly simpler.

To put the above complaint another way, a variable shouldn't mean two
different things depending on where you are in the function. Right
now, at the end of the first loop, lp_valid[x] answers the question
"is line pointer x basically valid?". But by the end of the second
loop, it answers the question "is line pointer x valid and does it
also have a valid predecessor?". That kind of definitional change is
something to be avoided.

The test if (pred_in_progress || TransactionIdDidCommit(curr_xmin))
seems wrong to me. Shouldn't it be &&? Has this code been tested at
all?  It doesn't seem to have a test case. Some of these other errors
don't, either. Maybe there's some that we can't easily test in an
automated way, but we should test what we can. I guess maybe casual
testing wouldn't reveal the problem here because of the recheck, but
it's worrying to find logic that doesn't look right with no
corresponding comments or test cases.

Some error message kibitizing:

 psprintf("redirected tuple at line pointer offset %u is not heap only tuple",

It seems to me that this should say "redirected line pointer pointing
to a non-heap-only tuple at offset %u". There is no such thing as a
redirected tuple -- and even if there were, what we have here is
clearly a redirected line pointer.

psprintf("redirected tuple at line pointer offset %u is not heap only tuple",

And I think for the same reasons this one should say something like
"redirected line pointer pointing to a non-heap-only tuple at offset
%u".

 psprintf("redirected tuple at line pointer offset %u is not heap
updated tuple",

Possibly all of these would sound better with "points" rather than
"pointing" -- if so, we'd need to change an existing message in the
same way.

And this one should say something like "redirected line pointer
pointing to a tuple not produced by an update at offset %u".

 psprintf("tuple is root of chain but it is marked as heap-only tuple"));

I think this would sound better if you deleted the word "it".

I don't know whether it's worth arguing about -- it feels like we've
argued too much already about this sort of thing -- but I am not very
convinced by initializers like OffsetNumber
predecessor[MaxOffsetNumber] = {InvalidOffsetNumber}. That style is
only correct because InvalidOffsetNumber happens to be zero. If it
were up to me, I'd use memset to clear the predecessor array. I would
not bulk initialize sucessor and lp_valid but make sure that the first
loop always sets them, possibly by having the top of the loop set them
to InvalidOffsetNumber and false initially and then letting code later
in the loop change the value, or possibly in some other way.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: add PROCESS_MAIN to VACUUM
Next
From: Robert Haas
Date:
Subject: Re: Non-superuser subscription owners