Re: HOT chain validation in verify_heapam() - Mailing list pgsql-hackers

From Robert Haas
Subject Re: HOT chain validation in verify_heapam()
Date
Msg-id CA+TgmoZ7NB_AoU3h0KwpcaWcAx0JpgSx4LGf_HOGo9M6Ko1McA@mail.gmail.com
Whole thread Raw
In response to HOT chain validation in verify_heapam()  (Himanshu Upadhyaya <upadhyaya.himanshu@gmail.com>)
Responses Re: HOT chain validation in verify_heapam()
List pgsql-hackers
Hi,

Thanks for working on this.

+                htup = (HeapTupleHeader) PageGetItem(ctx.page, rditem);
+                if (!(HeapTupleHeaderIsHeapOnly(htup) &&
htup->t_infomask & HEAP_UPDATED))
+                    report_corruption(&ctx,
+                                      psprintf("Redirected Tuple at
line pointer offset %u is not HEAP_ONLY_TUPLE or HEAP_UPDATED tuple",
+                                               (unsigned) rdoffnum));

This isn't safe because the line pointer referenced by rditem may not
have been sanity-checked yet. Refer to the comment just below where it
says "Sanity-check the line pointer's offset and length values".

There are multiple problems with this error message. First, if you
take a look at the existing messages - which is always a good thing to
do when adding new ones - you will see that they are capitalized
differently. Try to match the existing style. Second, we made a real
effort with the existing messages to avoid referring to the names of
identifiers that only exist at the C level. For example, just above
you will see a message which says "line pointer redirection to item at
offset %u precedes minimum offset %u". It deliberately does not say
"line pointer redirection to item at offset %u is less than
FirstOffsetNumber" even though that would be an equally correct
statement of the problem. The intent here is to make the messages at
least somewhat accessible to users who are somewhat familiar with how
PostgreSQL storage works but may not read the C code. These comments
apply to every message you add in the patch.

The message also does not match the code. The code tests for
HEAP_UPDATED, but the message claims that the code is testing for
either HEAP_ONLY_TUPLE or HEAP_UPDATED. As a general rule, it is best
not to club related tests together in cases like this, because it
enables better and more specific error messages.

It would be clearer to make an explicit comparison to 0, like
(htup->t_infomask & HEAP_UPDATED) != 0, rather than relying on 0 being
false and non-zero being true. It doesn't matter to the compiler, but
it may help human readers.

+            /*
+             * Add line pointer offset to predecessor array.
+             * 1) If xmax is matching with xmin of next
Tuple(reaching via its t_ctid).
+             * 2) If next tuple is in the same page.
+             * Raise corruption if:
+             * We have two tuples having same predecessor.
+             *
+             * We add offset to predecessor irrespective of
transaction(t_xmin) status. We will
+             * do validation related to transaction status(and also
all other validations)
+             * when we loop over predecessor array.
+             */

The formatting of this comment will, I think, be mangled if pgindent
is run against the file. You can use ----- markers to prevent that, I
believe, or (better) write this as a paragraph without relying on the
lines ending up uneven in length.

+                if (predecessor[nextTupOffnum] != 0)
+                {
+                    report_corruption(&ctx,
+                                psprintf("Tuple at offset %u is
reachable from two or more updated tuple",
+                                    (unsigned) nextTupOffnum));
+                    continue;
+                }

You need to do this test after xmin/xmax matching. Otherwise you might
get false positives. Also, the message should be more specific and
match the style of the existing messages. ctx.offnum is already going
to be reported in another column, but we can report both nextTupOffnum
and predecessor[nextTupOffnum] here e.g. "updated version at offset %u
is also the updated version of tuple at offset %u".

+                currTupXmax = HeapTupleHeaderGetUpdateXid(ctx.tuphdr);
+                lp   = PageGetItemId(ctx.page, nextTupOffnum);
+
+                htup = (HeapTupleHeader) PageGetItem(ctx.page, lp);

This has the same problem I mentioned in my first comment above,
namely, we haven't necessarily sanity-checked the length and offset
values for nextTupOffnum yet. Saying that another way, if the contents
of lp are corrupt and point off the page, we want that to be reported
as corruption (which the current code will already do) and we want
this check to be skipped so that we don't crash or access random
memory addresses. You need to think about how to rearrange the code so
that we only perform checks that are known to be safe.

+        /* Now loop over offset and validate data in predecessor array.*/
+        for ( ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
+             ctx.offnum = OffsetNumberNext(ctx.offnum))

Please take the time to format your code according to the PostgeSQL
standard practice. If you don't know what that looks like, use
pgindent.

+        {
+            HeapTupleHeader pred_htup, curr_htup;
+            TransactionId   pred_xmin, curr_xmin, curr_xmax;
+            ItemId      pred_lp, curr_lp;

Same here.

Within this loop, you need to think about what to include in the
columns of the output other than 'msg' and what to include in the
message itself. There's no reason to include ctx.offnum in the message
text because it's already included in the 'offnum' column of the
output.

I think it would actually be a good idea to set ctx.offnum to the
predecessor's offset number, and use a separate variable for the
current offset number. The reason why I think this is that I believe
it will make it easier to phrase the messages appropriately. For
example, if ctx.offnum is the predecessor tuple, then we can issue
complaints like this:

tuple with uncommitted xmin %u was updated to produce a tuple at
offset %u with differing xmin %u
unfrozen tuple was updated to produce a tuple at offset %u which is not frozen
tuple with uncommitted xmin %u has cmin %u, but was updated to produce
a tuple with cmin %u
non-heap-only update produced a heap-only tuple at offset %u
heap-only update produced a non-heap only tuple at offset %u

+            if (!TransactionIdIsValid(curr_xmax) &&
+                HeapTupleHeaderIsHotUpdated(curr_htup))
+            {
+                report_corruption(&ctx,
+                            psprintf("Current tuple at offset %u is
HOT but is last tuple in the HOT chain.",
+                            (unsigned) ctx.offnum));
+            }

This check has nothing to do with the predecessor[] array, so it seems
like it belongs in check_tuple() rather than here. Also, the message
is rather confused, because the test is checking whether the tuple has
been HOT-updated, while the message is talking about whether the tuple
was *itself* created by a HOT update. Also, when we're dealing with
corruption, statements like "is last tuple in the HOT chain" are
pretty ambiguous. Also, isn't this an issue for both HOT-updated
tuples and also just regular updated tuples? i.e. maybe what we should
be complaining about here is something like "tuple has been updated,
but xmax is 0" and then make the test check exactly that.

+            if (!HeapTupleHeaderIsHotUpdated(pred_htup) &&
+                HeapTupleHeaderIsHeapOnly(pred_htup) &&
+                HeapTupleHeaderIsHeapOnly(curr_htup))
+            {
+                report_corruption(&ctx,
+                            psprintf("Current tuple at offset %u is
HOT but it is next updated tuple of last Tuple in HOT chain.",
+                            (unsigned) ctx.offnum));
+            }

Three if-statements up, you tested two out of these three conditions
and complained if they were met. So any time this fires, that will
have also fired.

...Robert



pgsql-hackers by date:

Previous
From: Nikita Glukhov
Date:
Subject: Re: SQL/JSON features for v15
Next
From: Andres Freund
Date:
Subject: Re: Strip -mmacosx-version-min options from plperl build