Re: new heapcheck contrib module - Mailing list pgsql-hackers
From | Amul Sul |
---|---|
Subject | Re: new heapcheck contrib module |
Date | |
Msg-id | CAAJ_b94hsrF4x7UKQj03416fAEXJorUpZ9WhsZZjqgv-4+Wg9A@mail.gmail.com Whole thread Raw |
In response to | Re: new heapcheck contrib module (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: new heapcheck contrib module
|
List | pgsql-hackers |
On Thu, Jul 30, 2020 at 11:29 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Jul 27, 2020 at 1:02 PM Mark Dilger > <mark.dilger@enterprisedb.com> wrote: > > Not at all! I appreciate all the reviews. > > Reviewing 0002, reading through verify_heapam.c: > > +typedef enum SkipPages > +{ > + SKIP_ALL_FROZEN_PAGES, > + SKIP_ALL_VISIBLE_PAGES, > + SKIP_PAGES_NONE > +} SkipPages; > > This looks inconsistent. Maybe just start them all with SKIP_PAGES_. > > + if (PG_ARGISNULL(0)) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("missing required parameter for 'rel'"))); > > This doesn't look much like other error messages in the code. Do > something like git grep -A4 PG_ARGISNULL | grep -A3 ereport and study > the comparables. > > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("unrecognized parameter for 'skip': %s", skip), > + errhint("please choose from 'all-visible', 'all-frozen', or 'none'"))); > > Same problem. Check pg_prewarm's handling of the prewarm type, or > EXPLAIN's handling of the FORMAT option, or similar examples. Read the > message style guidelines concerning punctuation of hint and detail > messages. > > + * Bugs in pg_upgrade are reported (see commands/vacuum.c circa line 1572) > + * to have sometimes rendered the oldest xid value for a database invalid. > + * It seems unwise to report rows as corrupt for failing to be newer than > + * a value which itself may be corrupt. We instead use the oldest xid for > + * the entire cluster, which must be at least as old as the oldest xid for > + * our database. > > This kind of reference to another comment will not age well; line > numbers and files change a lot. But I think the right thing to do here > is just rely on relfrozenxid and relminmxid. If the table is > inconsistent with those, then something needs fixing. datfrozenxid and > the cluster-wide value can look out for themselves. The corruption > detector shouldn't be trying to work around any bugs in setting > relfrozenxid itself; such problems are arguably precisely what we're > here to find. > > +/* > + * confess > + * > + * Return a message about corruption, including information > + * about where in the relation the corruption was found. > + * > + * The msg argument is pfree'd by this function. > + */ > +static void > +confess(HeapCheckContext *ctx, char *msg) > > Contrary to what the comments say, the function doesn't return a > message about corruption or anything else. It returns void. > > I don't really like the name, either. I get that it's probably > inspired by Perl, but I think it should be given a less-clever name > like report_corruption() or something. > > + * corrupted table from using workmem worth of memory building up the > > This kind of thing destroys grep-ability. If you're going to refer to > work_mem, you gotta spell it the same way we do everywhere else. > > + * Helper function to construct the TupleDesc needed by verify_heapam. > > Instead of saying it's the TupleDesc somebody needs, how about saying > that it's the TupleDesc that we'll use to report problems that we find > while scanning the heap, or something like that? > > + * Given a TransactionId, attempt to interpret it as a valid > + * FullTransactionId, neither in the future nor overlong in > + * the past. Stores the inferred FullTransactionId in *fxid. > > It really doesn't, because there's no such thing as 'fxid' referenced > anywhere here. You should really make the effort to proofread your > patches before posting, and adjust comments and so on as you go. > Otherwise reviewing takes longer, and if you keep introducing new > stuff like this as you fix other stuff, you can fail to ever produce a > committable patch. > > + * Determine whether tuples are visible for verification. Similar to > + * HeapTupleSatisfiesVacuum, but with critical differences. > > Not accurate, because it also reports problems, which is not mentioned > anywhere in the function header comment that purports to be a detailed > description of what the function does. > > + else if (TransactionIdIsCurrentTransactionId(raw_xmin)) > + return true; /* insert or delete in progress */ > + else if (TransactionIdIsInProgress(raw_xmin)) > + return true; /* HEAPTUPLE_INSERT_IN_PROGRESS */ > + else if (!TransactionIdDidCommit(raw_xmin)) > + { > + return false; /* HEAPTUPLE_DEAD */ > + } > > One of these cases is not punctuated like the others. > > + pstrdup("heap tuple with XMAX_IS_MULTI is neither LOCKED_ONLY nor > has a valid xmax")); > > 1. I don't think that's very grammatical. > > 2. Why abbreviate HEAP_XMAX_IS_MULTI to XMAX_IS_MULTI and > HEAP_XMAX_IS_LOCKED_ONLY to LOCKED_ONLY? I don't even think you should > be referencing C constant names here at all, and if you are I don't > think you should abbreviate, and if you do abbreviate I don't think > you should omit different numbers of words depending on which constant > it is. > > I wonder what the intended division of responsibility is here, > exactly. It seems like you've ended up with some sanity checks in > check_tuple() before tuple_is_visible() is called, and others in > tuple_is_visible() proper. As far as I can see the comments don't > really discuss the logic behind the split, but there's clearly a close > relationship between the two sets of checks, even to the point where > you have "heap tuple with XMAX_IS_MULTI is neither LOCKED_ONLY nor has > a valid xmax" in tuple_is_visible() and "tuple xmax marked > incompatibly as keys updated and locked only" in check_tuple(). Now, > those are not the same check, but they seem like closely related > things, so it's not ideal that they happen in different functions with > differently-formatted messages to report problems and no explanation > of why it's different. > > I think it might make sense here to see whether you could either move > more stuff out of tuple_is_visible(), so that it really just checks > whether the tuple is visible, or move more stuff into it, so that it > has the job not only of checking whether we should continue with > checks on the tuple contents but also complaining about any other > visibility problems. Or if neither of those make sense then there > should be a stronger attempt to rationalize in the comments what > checks are going where and for what reason, and also a stronger > attempt to rationalize the message wording. > > + curchunk = DatumGetInt32(fastgetattr(toasttup, 2, > + ctx->toast_rel->rd_att, &isnull)); > > Should we be worrying about the possibility of fastgetattr crapping > out if the TOAST tuple is corrupted? > > + if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len) > + { > + confess(ctx, > + psprintf("tuple attribute should start at offset %u, but tuple > length is only %u", > + ctx->tuphdr->t_hoff + ctx->offset, ctx->lp_len)); > + return false; > + } > + > + /* Skip null values */ > + if (infomask & HEAP_HASNULL && att_isnull(ctx->attnum, ctx->tuphdr->t_bits)) > + return true; > + > + /* Skip non-varlena values, but update offset first */ > + if (thisatt->attlen != -1) > + { > + ctx->offset = att_align_nominal(ctx->offset, thisatt->attalign); > + ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen, > + tp + ctx->offset); > + return true; > + } > > This looks like it's not going to complain about a fixed-length > attribute that overruns the tuple length. There's code further down > that handles that case for a varlena attribute, but there's nothing > comparable for the fixed-length case. > > + confess(ctx, > + psprintf("%s toast at offset %u is unexpected", > + va_tag == VARTAG_INDIRECT ? "indirect" : > + va_tag == VARTAG_EXPANDED_RO ? "expanded" : > + va_tag == VARTAG_EXPANDED_RW ? "expanded" : > + "unexpected", > + ctx->tuphdr->t_hoff + ctx->offset)); > > I suggest "unexpected TOAST tag %d", without trying to convert to a > string. Such a conversion will likely fail in the case of genuine > corruption, and isn't meaningful even if it works. > > Again, let's try to standardize terminology here: most of the messages > in this function are now of the form "tuple attribute %d has some > problem" or "attribute %d has some problem", but some have neither. > Since we're separately returning attnum I don't see why it should be > in the message, and if we weren't separately returning attnum then it > ought to be in the message the same way all the time, rather than > sometimes writing "attribute" and other times "tuple attribute". > > + /* Check relminmxid against mxid, if any */ > + xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr); > + if (infomask & HEAP_XMAX_IS_MULTI && > + MultiXactIdPrecedes(xmax, ctx->relminmxid)) > + { > + confess(ctx, > + psprintf("tuple xmax %u precedes relminmxid %u", > + xmax, ctx->relminmxid)); > + fatal = true; > + } > > There are checks that an XID is neither too old nor too new, and > presumably something similar could be done for MultiXactIds, but here > you only check one end of the range. Seems like you should check both. > > + /* Check xmin against relfrozenxid */ > + xmin = HeapTupleHeaderGetXmin(ctx->tuphdr); > + if (TransactionIdIsNormal(ctx->relfrozenxid) && > + TransactionIdIsNormal(xmin)) > + { > + if (TransactionIdPrecedes(xmin, ctx->relfrozenxid)) > + { > + confess(ctx, > + psprintf("tuple xmin %u precedes relfrozenxid %u", > + xmin, ctx->relfrozenxid)); > + fatal = true; > + } > + else if (!xid_valid_in_rel(xmin, ctx)) > + { > + confess(ctx, > + psprintf("tuple xmin %u follows last assigned xid %u", > + xmin, ctx->next_valid_xid)); > + fatal = true; > + } > + } > > Here you do check both ends of the range, but the comment claims > otherwise. Again, please proof-read for this kind of stuff. > > + /* Check xmax against relfrozenxid */ > > Ditto here. > > + psprintf("tuple's header size is %u bytes which is less than the %u > byte minimum valid header size", > > I suggest: tuple data begins at byte %u, but the tuple header must be > at least %u bytes > > + psprintf("tuple's %u byte header size exceeds the %u byte length of > the entire tuple", > > I suggest: tuple data begins at byte %u, but the entire tuple length > is only %u bytes > > + psprintf("tuple's user data offset %u not maximally aligned to %u", > > I suggest: tuple data begins at byte %u, but that is not maximally aligned > Or: tuple data begins at byte %u, which is not a multiple of %u > > That makes the messages look much more similar to each other > grammatically and is more consistent about calling things by the same > names. > > + psprintf("tuple with null values has user data offset %u rather than > the expected offset %u", > + psprintf("tuple without null values has user data offset %u rather > than the expected offset %u", > > I suggest merging these: tuple data offset %u, but expected offset %u > (%u attributes, %s) > where %s is either "has nulls" or "no nulls" > > In fact, aren't several of the above checks redundant with this one? > Like, why check for a value less than SizeofHeapTupleHeader or that's > not properly aligned first? Just check this straightaway and call it > good. > > + * If we get this far, the tuple is visible to us, so it must not be > + * incompatible with our relDesc. The natts field could be legitimately > + * shorter than rel's natts, but it cannot be longer than rel's natts. > > This is yet another case where you didn't update the comments. > tuple_is_visible() now checks whether the tuple is visible to anyone, > not whether it's visible to us, but the comment doesn't agree. In some > sense I think this comment is redundant with the previous one anyway, > because that one already talks about the tuple being visible. Maybe > just write: The tuple is visible, so it must be compatible with the > current version of the relation descriptor. It might have fewer > columns than are present in the relation descriptor, but it cannot > have more. > > + psprintf("tuple has %u attributes in relation with only %u attributes", > + ctx->natts, > + RelationGetDescr(ctx->rel)->natts)); > > I suggest: tuple has %u attributes, but relation has only %u attributes > > + /* > + * Iterate over the attributes looking for broken toast values. This > + * roughly follows the logic of heap_deform_tuple, except that it doesn't > + * bother building up isnull[] and values[] arrays, since nobody wants > + * them, and it unrolls anything that might trip over an Assert when > + * processing corrupt data. > + */ > + ctx->offset = 0; > + for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++) > + { > + if (!check_tuple_attribute(ctx)) > + break; > + } > > I think this comment is too wordy. This text belongs in the header > comment of check_tuple_attribute(), not at the place where it gets > called. Otherwise, as you update what check_tuple_attribute() does, > you have to remember to come find this comment and fix it to match, > and you might forget to do that. In fact... looks like that already > happened, because check_tuple_attribute() now checks more than broken > TOAST attributes. Seems like you could just simplify this down to > something like "Now check each attribute." Also, you could lose the > extra braces. > > - bt_index_check | relname | relpages > + bt_index_check | relname | relpages > > Don't include unrelated changes in the patch. > > I'm not really sure that the list of fields you're displaying for each > reported problem really makes sense. I think the theory here should be > that we want to report the information that the user needs to localize > the problem but not everything that they could find out from > inspecting the page, and not things that are too specific to > particular classes of errors. So I would vote for keeping blkno, > offnum, and attnum, but I would lose lp_flags, lp_len, and chunk. > lp_off feels like it's a more arguable case: technically, it's a > locator for the problem, because it gives you the byte offset within > the page, but normally we reference tuples by TID, i.e. (blkno, > offset), not byte offset. On balance I'd be inclined to omit it. > > -- In addition to this, I found a few more things while reading v13 patch are as below: Patch v13-0001: - +#include "amcheck.h" Not in correct order. +typedef struct BtreeCheckContext +{ + TupleDesc tupdesc; + Tuplestorestate *tupstore; + bool is_corrupt; + bool on_error_stop; +} BtreeCheckContext; Unnecessary spaces/tabs between } and BtreeCheckContext. static void bt_index_check_internal(Oid indrelid, bool parentcheck, - bool heapallindexed, bool rootdescend); + bool heapallindexed, bool rootdescend, + BtreeCheckContext * ctx); Unnecessary space between * and ctx. The same changes needed for other places as well. --- Patch v13-0002: +-- partitioned tables (the parent ones) don't have visibility maps +create table test_partitioned (a int, b text default repeat('x', 5000)) + partition by list (a); +-- these should all fail +select * from verify_heapam('test_partitioned', + on_error_stop := false, + skip := NULL, + startblock := NULL, + endblock := NULL); +ERROR: "test_partitioned" is not a table, materialized view, or TOAST table +create table test_partition partition of test_partitioned for values in (1); +create index test_index on test_partition (a); Can't we make it work? If the input is partitioned, I think we could collect all its leaf partitions and process them one by one. Thoughts? + ctx->chunkno++; Instead of incrementing in check_toast_tuple(), I think incrementing should happen at the caller -- just after check_toast_tuple() call. --- Patch v13-0003: + resetPQExpBuffer(query); + destroyPQExpBuffer(query); resetPQExpBuffer() will be unnecessary if the next call is destroyPQExpBuffer(). + appendPQExpBuffer(query, + "SELECT c.relname, v.blkno, v.offnum, v.lp_off, " + "v.lp_flags, v.lp_len, v.attnum, v.chunk, v.msg" + "\nFROM verify_heapam(rel := %u, on_error_stop := %s, " + "skip := %s, startblock := %s, endblock := %s) v, " + "pg_class c" + "\nWHERE c.oid = %u", + tbloid, stop, skip, settings.startblock, + settings.endblock, tbloid pg_class should be schema-qualified like elsewhere. IIUC, pg_class is meant to get relname only, instead, we could use '%u'::pg_catalog.regclass in the target list for the relname. Thoughts? Also I think we should skip '\n' from the query string (see appendPQExpBuffer() in pg_dump.c) + appendPQExpBuffer(query, + "SELECT i.indexrelid" + "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class c" + "\nWHERE i.indexrelid = c.oid" + "\n AND c.relam = %u" + "\n AND i.indrelid = %u", + BTREE_AM_OID, tbloid); + + ExecuteSqlStatement("RESET search_path"); + res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK); + PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL)); I don't think we need the search_path query. The main query doesn't have any dependencies on it. Same is in check_indexes(), check_index (), expand_table_name_patterns() & get_table_check_list(). Correct me if I am missing something. + output = PageOutput(lines + 2, NULL); + for (lineno = 0; usage_text[lineno]; lineno++) + fprintf(output, "%s\n", usage_text[lineno]); + fprintf(output, "Report bugs to <%s>.\n", PACKAGE_BUGREPORT); + fprintf(output, "%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL); I am not sure why we want PageOutput() if the second argument is always going to be NULL? Can't we directly use printf() instead of PageOutput() + fprintf() ? e.g. usage() function in pg_basebackup.c. Regards, Amul
pgsql-hackers by date: