Re: HOT chain validation in verify_heapam() - Mailing list pgsql-hackers
From | Himanshu Upadhyaya |
---|---|
Subject | Re: HOT chain validation in verify_heapam() |
Date | |
Msg-id | CAPF61jCNEZCmM51YRL+N2MiYSnE+Sj11DPjELkeO57awU8PNaw@mail.gmail.com Whole thread Raw |
In response to | Re: HOT chain validation in verify_heapam() (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: HOT chain validation in verify_heapam()
Re: HOT chain validation in verify_heapam() |
List | pgsql-hackers |
On Fri, Jan 20, 2023 at 12:38 AM Robert Haas <robertmhaas@gmail.com> wrote:
I think that the handling of lp_valid[] in the loop that begins with
"Loop over offset and populate predecessor array from all entries that
are present in successor array" is very confusing. I think that
lp_valid[] should be answering the question "is the line pointer
basically sane?". That is, if it's a redirect, it needs to point to
something within the line pointer array (and we also check that it
must be an entry in the line pointer array that is used, which seems
fine). If it's not a redirect, it needs to point to space that's
entirely within the block, properly aligned, and big enough to contain
a tuple. We determine the answers to all of these questions in the
first loop, the one that starts with /* Perform tuple checks */.
Nothing that happens in the second loop, where we populate the
predecessor array, can reverse our previous conclusion that the line
pointer is valid, so this loop shouldn't be resetting entries in
lp_valid[] to false. The reason that it's doing so seems to be that it
wants to use lp_valid[] to control the behavior of the third loop,
where we perform checks against things that have entries in the
predecessor array. As written, the code ensures that we always set
lp_valid[nextoffnum] to false unless we set predecessor[nextoffnum] to
a value other than InvalidOffsetNumber. But that is needlessly
complex: the third loop doesn't need to look at lp_valid[] at all. It
can just check whether predecessor[currentoffnum] is valid. If it is,
perform checks. Otherwise, skip it. It seems to me that this would be
significantly simpler.
I was trying to use lp_valid as I need to identify the root of the HOT chain and we are doing validation on the root of the HOT chain when we loop over the predecessor array.
if (nextoffnum == InvalidOffsetNumber || !lp_valid[ctx.offnum] || !lp_valid[nextoffnum])
{
/*
* Set lp_valid of nextoffnum to false if current tuple's
* lp_valid is true. We don't add this to predecessor array as
* it's of no use to validate tuple if its predecessor is
* already corrupted but we need to identify all those tuple's
* so that we can differentiate between all the cases of
* missing offset in predecessor array, this will help in
* validating the root of chain when we loop over predecessor
* array.
*/
{
/*
* Set lp_valid of nextoffnum to false if current tuple's
* lp_valid is true. We don't add this to predecessor array as
* it's of no use to validate tuple if its predecessor is
* already corrupted but we need to identify all those tuple's
* so that we can differentiate between all the cases of
* missing offset in predecessor array, this will help in
* validating the root of chain when we loop over predecessor
* array.
*/
if (!lp_valid[ctx.offnum] && lp_valid[nextoffnum])
lp_valid[nextoffnum] = false;
lp_valid[nextoffnum] = false;
Was resetting lp_valid in the last patch because we don't add data to predecessor[] and while looping over the predecessor array we need to isolate (and identify) all cases of missing data in the predecessor array to exactly identify the root of HOT chain.
One solution is to always add data to predecessor array while looping over successor array and then while looping over predecessor array we can continue for other validation "if (lp_valid [predecessor[currentoffnum]] && lp_valid[currentoffnum]" is true but in this case also our third loop will also look at lp_valid[].
To put the above complaint another way, a variable shouldn't mean two
different things depending on where you are in the function. Right
now, at the end of the first loop, lp_valid[x] answers the question
"is line pointer x basically valid?". But by the end of the second
loop, it answers the question "is line pointer x valid and does it
also have a valid predecessor?". That kind of definitional change is
something to be avoided.
agree.
The test if (pred_in_progress || TransactionIdDidCommit(curr_xmin))
seems wrong to me. Shouldn't it be &&? Has this code been tested at
all? It doesn't seem to have a test case. Some of these other errors
don't, either. Maybe there's some that we can't easily test in an
automated way, but we should test what we can. I guess maybe casual
testing wouldn't reveal the problem here because of the recheck, but
it's worrying to find logic that doesn't look right with no
corresponding comments or test cases.
This is totally my Mistake, apologies for that. I will fix this in my next patch. Regarding the missing test cases, I need one in-progress transaction for these test cases to be included in 004_verify_heapam.pl but I don't find a clear way to have an in-progress transaction(as per the design of 004_verify_heapam.pl ) that I can use in the test cases. I will be doing more research on a solution to add these missing test cases.
Some error message kibitizing:
psprintf("redirected tuple at line pointer offset %u is not heap only tuple",
It seems to me that this should say "redirected line pointer pointing
to a non-heap-only tuple at offset %u". There is no such thing as a
redirected tuple -- and even if there were, what we have here is
clearly a redirected line pointer.
psprintf("redirected tuple at line pointer offset %u is not heap only tuple",
And I think for the same reasons this one should say something like
"redirected line pointer pointing to a non-heap-only tuple at offset
%u".
psprintf("redirected tuple at line pointer offset %u is not heap
updated tuple",
Possibly all of these would sound better with "points" rather than
"pointing" -- if so, we'd need to change an existing message in the
same way.
And this one should say something like "redirected line pointer
pointing to a tuple not produced by an update at offset %u".
psprintf("tuple is root of chain but it is marked as heap-only tuple"));
I think this would sound better if you deleted the word "it".
Will change accordingly in my next patch.
I don't know whether it's worth arguing about -- it feels like we've
argued too much already about this sort of thing -- but I am not very
convinced by initializers like OffsetNumber
predecessor[MaxOffsetNumber] = {InvalidOffsetNumber}. That style is
only correct because InvalidOffsetNumber happens to be zero. If it
were up to me, I'd use memset to clear the predecessor array. I would
not bulk initialize sucessor and lp_valid but make sure that the first
loop always sets them, possibly by having the top of the loop set them
to InvalidOffsetNumber and false initially and then letting code later
in the loop change the value, or possibly in some other way.
agree, will fix in my next patch
pgsql-hackers by date: