Re: new heapcheck contrib module - Mailing list pgsql-hackers

From Robert Haas
Subject Re: new heapcheck contrib module
Date
Msg-id CA+TgmoZT3kCOaKq6=m5wHGKh_9XQDbCC1LZXL6w_+NMB-n1+mw@mail.gmail.com
Whole thread Raw
In response to Re: new heapcheck contrib module  (Mark Dilger <mark.dilger@enterprisedb.com>)
Responses Re: new heapcheck contrib module  (Mark Dilger <mark.dilger@enterprisedb.com>)
Re: new heapcheck contrib module  (Peter Geoghegan <pg@bowt.ie>)
Re: new heapcheck contrib module  (Amul Sul <sulamul@gmail.com>)
List pgsql-hackers
On Mon, Jul 27, 2020 at 1:02 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> Not at all!  I appreciate all the reviews.

Reviewing 0002, reading through verify_heapam.c:

+typedef enum SkipPages
+{
+ SKIP_ALL_FROZEN_PAGES,
+ SKIP_ALL_VISIBLE_PAGES,
+ SKIP_PAGES_NONE
+} SkipPages;

This looks inconsistent. Maybe just start them all with SKIP_PAGES_.

+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("missing required parameter for 'rel'")));

This doesn't look much like other error messages in the code. Do
something like git grep -A4 PG_ARGISNULL | grep -A3 ereport and study
the comparables.

+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unrecognized parameter for 'skip': %s", skip),
+ errhint("please choose from 'all-visible', 'all-frozen', or 'none'")));

Same problem. Check pg_prewarm's handling of the prewarm type, or
EXPLAIN's handling of the FORMAT option, or similar examples. Read the
message style guidelines concerning punctuation of hint and detail
messages.

+ * Bugs in pg_upgrade are reported (see commands/vacuum.c circa line 1572)
+ * to have sometimes rendered the oldest xid value for a database invalid.
+ * It seems unwise to report rows as corrupt for failing to be newer than
+ * a value which itself may be corrupt.  We instead use the oldest xid for
+ * the entire cluster, which must be at least as old as the oldest xid for
+ * our database.

This kind of reference to another comment will not age well; line
numbers and files change a lot. But I think the right thing to do here
is just rely on relfrozenxid and relminmxid. If the table is
inconsistent with those, then something needs fixing. datfrozenxid and
the cluster-wide value can look out for themselves. The corruption
detector shouldn't be trying to work around any bugs in setting
relfrozenxid itself; such problems are arguably precisely what we're
here to find.

+/*
+ * confess
+ *
+ *   Return a message about corruption, including information
+ *   about where in the relation the corruption was found.
+ *
+ *   The msg argument is pfree'd by this function.
+ */
+static void
+confess(HeapCheckContext *ctx, char *msg)

Contrary to what the comments say, the function doesn't return a
message about corruption or anything else. It returns void.

I don't really like the name, either. I get that it's probably
inspired by Perl, but I think it should be given a less-clever name
like report_corruption() or something.

+ * corrupted table from using workmem worth of memory building up the

This kind of thing destroys grep-ability. If you're going to refer to
work_mem, you gotta spell it the same way we do everywhere else.

+ * Helper function to construct the TupleDesc needed by verify_heapam.

Instead of saying it's the TupleDesc somebody needs, how about saying
that it's the TupleDesc that we'll use to report problems that we find
while scanning the heap, or something like that?

+ * Given a TransactionId, attempt to interpret it as a valid
+ * FullTransactionId, neither in the future nor overlong in
+ * the past.  Stores the inferred FullTransactionId in *fxid.

It really doesn't, because there's no such thing as 'fxid' referenced
anywhere here. You should really make the effort to proofread your
patches before posting, and adjust comments and so on as you go.
Otherwise reviewing takes longer, and if you keep introducing new
stuff like this as you fix other stuff, you can fail to ever produce a
committable patch.

+ * Determine whether tuples are visible for verification.  Similar to
+ *  HeapTupleSatisfiesVacuum, but with critical differences.

Not accurate, because it also reports problems, which is not mentioned
anywhere in the function header comment that purports to be a detailed
description of what the function does.

+ else if (TransactionIdIsCurrentTransactionId(raw_xmin))
+ return true; /* insert or delete in progress */
+ else if (TransactionIdIsInProgress(raw_xmin))
+ return true; /* HEAPTUPLE_INSERT_IN_PROGRESS */
+ else if (!TransactionIdDidCommit(raw_xmin))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }

One of these cases is not punctuated like the others.

+ pstrdup("heap tuple with XMAX_IS_MULTI is neither LOCKED_ONLY nor
has a valid xmax"));

1. I don't think that's very grammatical.

2. Why abbreviate HEAP_XMAX_IS_MULTI to XMAX_IS_MULTI and
HEAP_XMAX_IS_LOCKED_ONLY to LOCKED_ONLY? I don't even think you should
be referencing C constant names here at all, and if you are I don't
think you should abbreviate, and if you do abbreviate I don't think
you should omit different numbers of words depending on which constant
it is.

I wonder what the intended division of responsibility is here,
exactly. It seems like you've ended up with some sanity checks in
check_tuple() before tuple_is_visible() is called, and others in
tuple_is_visible() proper. As far as I can see the comments don't
really discuss the logic behind the split, but there's clearly a close
relationship between the two sets of checks, even to the point where
you have "heap tuple with XMAX_IS_MULTI is neither LOCKED_ONLY nor has
a valid xmax" in tuple_is_visible() and "tuple xmax marked
incompatibly as keys updated and locked only" in check_tuple(). Now,
those are not the same check, but they seem like closely related
things, so it's not ideal that they happen in different functions with
differently-formatted messages to report problems and no explanation
of why it's different.

I think it might make sense here to see whether you could either move
more stuff out of tuple_is_visible(), so that it really just checks
whether the tuple is visible, or move more stuff into it, so that it
has the job not only of checking whether we should continue with
checks on the tuple contents but also complaining about any other
visibility problems. Or if neither of those make sense then there
should be a stronger attempt to rationalize in the comments what
checks are going where and for what reason, and also a stronger
attempt to rationalize the message wording.

+ curchunk = DatumGetInt32(fastgetattr(toasttup, 2,
+ ctx->toast_rel->rd_att, &isnull));

Should we be worrying about the possibility of fastgetattr crapping
out if the TOAST tuple is corrupted?

+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ confess(ctx,
+ psprintf("tuple attribute should start at offset %u, but tuple
length is only %u",
+ ctx->tuphdr->t_hoff + ctx->offset, ctx->lp_len));
+ return false;
+ }
+
+ /* Skip null values */
+ if (infomask & HEAP_HASNULL && att_isnull(ctx->attnum, ctx->tuphdr->t_bits))
+ return true;
+
+ /* Skip non-varlena values, but update offset first */
+ if (thisatt->attlen != -1)
+ {
+ ctx->offset = att_align_nominal(ctx->offset, thisatt->attalign);
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+ return true;
+ }

This looks like it's not going to complain about a fixed-length
attribute that overruns the tuple length. There's code further down
that handles that case for a varlena attribute, but there's nothing
comparable for the fixed-length case.

+ confess(ctx,
+ psprintf("%s toast at offset %u is unexpected",
+ va_tag == VARTAG_INDIRECT ? "indirect" :
+ va_tag == VARTAG_EXPANDED_RO ? "expanded" :
+ va_tag == VARTAG_EXPANDED_RW ? "expanded" :
+ "unexpected",
+ ctx->tuphdr->t_hoff + ctx->offset));

I suggest "unexpected TOAST tag %d", without trying to convert to a
string. Such a conversion will likely fail in the case of genuine
corruption, and isn't meaningful even if it works.

Again, let's try to standardize terminology here: most of the messages
in this function are now of the form "tuple attribute %d has some
problem" or "attribute %d has some problem", but some have neither.
Since we're separately returning attnum I don't see why it should be
in the message, and if we weren't separately returning attnum then it
ought to be in the message the same way all the time, rather than
sometimes writing "attribute" and other times "tuple attribute".

+ /* Check relminmxid against mxid, if any */
+ xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+ if (infomask & HEAP_XMAX_IS_MULTI &&
+ MultiXactIdPrecedes(xmax, ctx->relminmxid))
+ {
+ confess(ctx,
+ psprintf("tuple xmax %u precedes relminmxid %u",
+ xmax, ctx->relminmxid));
+ fatal = true;
+ }

There are checks that an XID is neither too old nor too new, and
presumably something similar could be done for MultiXactIds, but here
you only check one end of the range. Seems like you should check both.

+ /* Check xmin against relfrozenxid */
+ xmin = HeapTupleHeaderGetXmin(ctx->tuphdr);
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmin))
+ {
+ if (TransactionIdPrecedes(xmin, ctx->relfrozenxid))
+ {
+ confess(ctx,
+ psprintf("tuple xmin %u precedes relfrozenxid %u",
+ xmin, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!xid_valid_in_rel(xmin, ctx))
+ {
+ confess(ctx,
+ psprintf("tuple xmin %u follows last assigned xid %u",
+ xmin, ctx->next_valid_xid));
+ fatal = true;
+ }
+ }

Here you do check both ends of the range, but the comment claims
otherwise. Again, please proof-read for this kind of stuff.

+ /* Check xmax against relfrozenxid */

Ditto here.

+ psprintf("tuple's header size is %u bytes which is less than the %u
byte minimum valid header size",

I suggest: tuple data begins at byte %u, but the tuple header must be
at least %u bytes

+ psprintf("tuple's %u byte header size exceeds the %u byte length of
the entire tuple",

I suggest: tuple data begins at byte %u, but the entire tuple length
is only %u bytes

+ psprintf("tuple's user data offset %u not maximally aligned to %u",

I suggest: tuple data begins at byte %u, but that is not maximally aligned
Or: tuple data begins at byte %u, which is not a multiple of %u

That makes the messages look much more similar to each other
grammatically and is more consistent about calling things by the same
names.

+ psprintf("tuple with null values has user data offset %u rather than
the expected offset %u",
+ psprintf("tuple without null values has user data offset %u rather
than the expected offset %u",

I suggest merging these: tuple data offset %u, but expected offset %u
(%u attributes, %s)
where %s is either "has nulls" or "no nulls"

In fact, aren't several of the above checks redundant with this one?
Like, why check for a value less than SizeofHeapTupleHeader or that's
not properly aligned first? Just check this straightaway and call it
good.

+ * If we get this far, the tuple is visible to us, so it must not be
+ * incompatible with our relDesc.  The natts field could be legitimately
+ * shorter than rel's natts, but it cannot be longer than rel's natts.

This is yet another case where you didn't update the comments.
tuple_is_visible() now checks whether the tuple is visible to anyone,
not whether it's visible to us, but the comment doesn't agree. In some
sense I think this comment is redundant with the previous one anyway,
because that one already talks about the tuple being visible. Maybe
just write: The tuple is visible, so it must be compatible with the
current version of the relation descriptor. It might have fewer
columns than are present in the relation descriptor, but it cannot
have more.

+ psprintf("tuple has %u attributes in relation with only %u attributes",
+ ctx->natts,
+ RelationGetDescr(ctx->rel)->natts));

I suggest: tuple has %u attributes, but relation has only %u attributes

+ /*
+ * Iterate over the attributes looking for broken toast values. This
+ * roughly follows the logic of heap_deform_tuple, except that it doesn't
+ * bother building up isnull[] and values[] arrays, since nobody wants
+ * them, and it unrolls anything that might trip over an Assert when
+ * processing corrupt data.
+ */
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ {
+ if (!check_tuple_attribute(ctx))
+ break;
+ }

I think this comment is too wordy. This text belongs in the header
comment of check_tuple_attribute(), not at the place where it gets
called. Otherwise, as you update what check_tuple_attribute() does,
you have to remember to come find this comment and fix it to match,
and you might forget to do that. In fact... looks like that already
happened, because check_tuple_attribute() now checks more than broken
TOAST attributes. Seems like you could just simplify this down to
something like "Now check each attribute." Also, you could lose the
extra braces.

- bt_index_check |             relname             | relpages
+ bt_index_check |             relname             | relpages

Don't include unrelated changes in the patch.

I'm not really sure that the list of fields you're displaying for each
reported problem really makes sense. I think the theory here should be
that we want to report the information that the user needs to localize
the problem but not everything that they could find out from
inspecting the page, and not things that are too specific to
particular classes of errors. So I would vote for keeping blkno,
offnum, and attnum, but I would lose lp_flags, lp_len, and chunk.
lp_off feels like it's a more arguable case: technically, it's a
locator for the problem, because it gives you the byte offset within
the page, but normally we reference tuples by TID, i.e. (blkno,
offset), not byte offset. On balance I'd be inclined to omit it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: PG 13 release notes, first draft
Next
From: Jeff Davis
Date:
Subject: Re: HyperLogLog.c and pg_leftmost_one_pos32()