Re: new heapcheck contrib module - Mailing list pgsql-hackers

From Robert Haas
Subject Re: new heapcheck contrib module
Date
Msg-id CA+Tgmoa7sNcw15=soG1pLBgByMnn-w70YdfKZmhmYOb43KzzRw@mail.gmail.com
Whole thread Raw
In response to Re: new heapcheck contrib module  (Mark Dilger <mark.dilger@enterprisedb.com>)
Responses Re: new heapcheck contrib module  (Peter Geoghegan <pg@bowt.ie>)
Re: new heapcheck contrib module  (Robert Haas <robertmhaas@gmail.com>)
Re: new heapcheck contrib module  (Mark Dilger <mark.dilger@enterprisedb.com>)
List pgsql-hackers
On Mon, Oct 26, 2020 at 12:12 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> The v20 patches 0002, 0003, and 0005 still apply cleanly, but 0004 required a rebase.  (0001 was already committed
lastweek.)
 
>
> Here is a rebased set of 4 patches, numbered 0002..0005 to be consistent with the previous naming.  There are no
substantialchanges.
 

Here's a review of 0002. I basically like the direction this is going
but I guess nobody will be surprised that there are some things in
here that I think could be improved.

+const char *usage_text[] = {
+       "pg_amcheck is the PostgreSQL command line frontend for the
amcheck database corruption checker.",
+       "",

This looks like a novel approach to the problem of printing out the
usage() information, and I think that it's inferior to the technique
used elsewhere of just having a bunch of printf() statements, because
unless I misunderstand, it doesn't permit localization.

+       "  -b, --startblock             begin checking table(s) at the
given starting block number",
+       "  -e, --endblock               check table(s) only up to the
given ending block number",
+       "  -B, --toast-startblock       begin checking toast table(s)
at the given starting block",
+       "  -E, --toast-endblock         check toast table(s) only up
to the given ending block",

I am not very convinced by this. What's the use case? If you're just
checking a single table, you might want to specify a start and end
block, but then you don't need separate options for the TOAST and
non-TOAST cases, do you? If I want to check pg_statistic, I'll say
pg_amcheck -t pg_catalog.pg_statistic. If I want to check the TOAST
table for pg_statistic, I'll say pg_amcheck -t pg_toast.pg_toast_2619.
In either case, if I want to check just the first three blocks, I can
add -b 0 -e 2.

+       "  -f, --skip-all-frozen        do NOT check blocks marked as
all frozen",
+       "  -v, --skip-all-visible       do NOT check blocks marked as
all visible",

I think this is using up too many one character option names for too
little benefit on things that are too closely related. How about, -s,
--skip=all-frozen|all-visible|none? And then -v could mean verbose,
which could trigger things like printing all the queries sent to the
server, setting PQERRORS_VERBOSE, etc.

+       "  -x, --check-indexes          check btree indexes associated
with tables being checked",
+       "  -X, --skip-indexes           do NOT check any btree indexes",
+       "  -i, --index=PATTERN          check the specified index(es) only",
+       "  -I, --exclude-index=PATTERN  do NOT check the specified index(es)",

This is a lotta controls for something that has gotta have some
default. Either the default is everything, in which case I don't see
why I need -x, or it's nothing, in which case I don't see why I need
-X.

+       "  -c, --check-corrupt          check indexes even if their
associated table is corrupt",
+       "  -C, --skip-corrupt           do NOT check indexes if their
associated table is corrupt",

Ditto. (I think the default be to check corrupt, and there can be an
option to skip it.)

+       "  -a, --heapallindexed         check index tuples against the
table tuples",
+       "  -A, --no-heapallindexed      do NOT check index tuples
against the table tuples",

Ditto. (Not sure what the default should be, though.)

+       "  -r, --rootdescend            search from the root page for
each index tuple",
+       "  -R, --no-rootdescend         do NOT search from the root
page for each index tuple",

Ditto. (Again, not sure about the default.)

I'm also not sure if these descriptions are clear enough, but it may
also be hard to do a good job in a brief space. Still, comparing this
to the documentation of heapallindexed makes me rather nervous. This
is only trying to verify that the index contains all the tuples in the
heap, not that the values in the heap and index tuples actually match.

+typedef struct
+AmCheckSettings
+{
+       char       *dbname;
+       char       *host;
+       char       *port;
+       char       *username;
+} ConnectOptions;

Making the struct name different from the type name seems not good,
and the struct name also shouldn't be on a separate line.

+typedef enum trivalue
+{
+       TRI_DEFAULT,
+       TRI_NO,
+       TRI_YES
+} trivalue;

Ugh. It's not this patch's fault, but we really oughta move this to
someplace more centralized.

+typedef struct
...
+} AmCheckSettings;

I'm not sure I consider all of these things settings, "db" in
particular. But maybe that's nitpicking.

+static void expand_schema_name_patterns(const SimpleStringList *patterns,
+
         const SimpleOidList *exclude_oids,
+
         SimpleOidList *oids
+
         bool strict_names);

This is copied from pg_dump, along with I think at least one other
function from nearby. Unlike the trivalue case above, this would be
the first duplication of this logic. Can we push this stuff into
pgcommon, perhaps?

+       /*
+        * Default behaviors for user settable options.  Note that these default
+        * to doing all the safe checks and none of the unsafe ones,
on the theory
+        * that if a user says "pg_amcheck mydb" without specifying
any additional
+        * options, we should check everything we know how to check without
+        * risking any backend aborts.
+        */

This to me seems too conservative. The result is that by default we
check only tables, not indexes. I don't think that's going to be what
users want. I don't know whether they want the heapallindexed or
rootdescend behaviors for index checks, but I think they want their
indexes checked. Happy to hear opinions from actual users on what they
want; this is just me guessing that you've guessed wrong. :-)

+               if (settings.db == NULL)
+               {
+                       pg_log_error("no connection to server after
initial attempt");
+                       exit(EXIT_BADCONN);
+               }

I think this is documented as meaning out of memory, and reported that
way elsewhere. Anyway I am going to keep complaining until there are
no cases where we tell the user it broke without telling them what
broke. Which means this bit is a problem too:

+       if (!settings.db)
+       {
+               pg_log_error("no connection to server");
+               exit(EXIT_BADCONN);
+       }

Something went wrong, good luck figuring out what it was!

+       /*
+        * All information about corrupt indexes are returned via
ereport, not as
+        * tuples.  We want all the details to report if corruption exists.
+        */
+       PQsetErrorVerbosity(settings.db, PQERRORS_VERBOSE);

Really? Why? If I need the source code file name, function name, and
line number to figure out what went wrong, that is not a great sign
for the quality of the error reports it produces.

+                       /*
+                        * The btree checking logic which optionally
checks the contents
+                        * of an index against the corresponding table
has not yet been
+                        * sufficiently hardened against corrupt
tables.  In particular,
+                        * when called with heapallindexed true, it
segfaults if the file
+                        * backing the table relation has been
erroneously unlinked.  In
+                        * any event, it seems unwise to reconcile an
index against its
+                        * table when we already know the table is corrupt.
+                        */
+                       old_heapallindexed = settings.heapallindexed;
+                       if (corruptions)
+                               settings.heapallindexed = false;

This seems pretty lame to me. Even if the btree checker can't tolerate
corruption to the extent that the heap checker does, seg faulting
because of a missing file seems like a bug that we should just fix
(and probably back-patch). I'm not very convinced by the decision to
override the user's decision about heapallindexed either. Maybe I lack
imagination, but that seems pretty arbitrary. Suppose there's a giant
index which is missing entries for 5 million heap tuples and also
there's 1 entry in the table which has an xmin that is less than the
pg_clas.relfrozenxid value by 1. You are proposing that because I have
the latter problem I don't want you to check for the former one. But
I, John Q. Smartuser, do not want you to second-guess what I told you
on the command line that I wanted. :-)

I think in general you're worrying too much about the possibility of
this tool causing backend crashes. I think it's good that you wrote
the heapcheck code in a way that's hardened against that, and I think
we should try to harden other things as time permits. But I don't
think that the remote possibility of a crash due to the lack of such
hardening should dictate the design behavior of this tool. If the
crash possibilities are not remote, then I think the solution is to
fix them, rather than cutting out important checks.

It doesn't seem like great design to me that get_table_check_list()
gets just the OID of the table itself, and then later if we decide to
check the TOAST table we've got to run a separate query for each table
we want to check to fetch the TOAST OID, when we could've just fetched
both in get_table_check_list() by including two columns in the query
rather than one and it would've been basically free. Imagine if some
user wrote a query that fetched the primary key value for all their
rows and then had their application run a separate query to fetch the
entire contents of each of those rows, said contents consisting of one
more integer. And then suppose they complained about performance. We'd
tell them they were doing it wrong, and so here.

+       if (settings.db == NULL)
+               fatal("no connection on entry to check_table");

Uninformative. Is this basically an Assert? If so maybe just make it
one. If not maybe fail somewhere else with a better message?

+       if (startblock == NULL)
+               startblock = "NULL";
+       if (endblock == NULL)
+               endblock = "NULL";

It seems like it would be more elegant to initialize
settings.startblock and settings.endblock to "NULL." However, there's
also a related problem, which is that the startblock and endblock
values can be anything, and are interpolated with quoting. I don't
think that it's good to ship a tool with SQL injection hazards built
into it. I think that you should (a) check that these values are
integers during argument parsing and error out if they are not and
then (b) use either a prepared query or PQescapeLiteral() anyway.

+       stop = (on_error_stop) ? "true" : "false";
+       toast = (check_toast) ? "true" : "false";

The parens aren't really needed here.

+
printf("(relname=%s,blkno=%s,offnum=%s,attnum=%s)\n%s\n",
+                                  PQgetvalue(res, i, 0),       /* relname */
+                                  PQgetvalue(res, i, 1),       /* blkno */
+                                  PQgetvalue(res, i, 2),       /* offnum */
+                                  PQgetvalue(res, i, 3),       /* attnum */
+                                  PQgetvalue(res, i, 4));      /* msg */

I am not quite sure how to format the output, but this looks like
something designed by an engineer who knows too much about the topic.
I suspect users won't find the use of things like "relname" and
"blkno" too easy to understand. At least I think we should say
"relation, block, offset, attribute" instead of "relname, blkno,
offnum, attnum". I would probably drop the parenthesis and add spaces,
so that you end up with something like:

relation "%s", block "%s", offset "%s", attribute "%s":

I would also define variant strings so that we entirely omit things
that are NULL. e.g. have four strings:

relation "%s":
relation "%s", block "%s":(
relation "%s", block "%s", offset "%s":
relation "%s", block "%s", offset "%s", attribute "%s":

Would it make it more readable if we indented the continuation line by
four spaces or something?

+               corruption_cnt++;
+               printf("%s\n", error);
+               pfree(error);

Seems like we could still print the relation name in this case, and
that it would be a good idea to do so, in case it's not in the message
that the server returns.

The general logic in this part of the code looks a bit strange to me.
If ExecuteSqlQuery() returns PGRES_TUPLES_OK, we print out the details
for each returned row. Otherwise, if error = true, we print the error.
But, what if neither of those things are the case? Then we'd just
print nothing despite having gotten back some weird response from the
server. That actually can't happen, because ExecuteSqlQuery() always
sets *error when the return code is not PGRES_TUPLES_OK, but you
wouldn't know that from looking at this code.

Honestly, as written, ExecSqlQuery() seems like kind of a waste. The
OrDie() version is useful as a notational shorthand, but this version
seems to add more confusion than clarity. It has only three callers:
the ones in check_table() and check_indexes() have the problem
described above, and the one in get_toast_oid() could just as well be
using the OrDie() version. And also we should probably get rid of it
entirely by fetching the toast OIDs the first time around, as
mentioned above.

check_indexes() lacks a function comment. It seems to have more or
less the same problem as get_toast_oid() -- an extra query per table
to get the list of indexes. I guess it has a better excuse: there
could be lots of indexes per table, and we're fetching multiple
columns of data for each one, whereas in the TOAST case we are issuing
an extra query per table to fetch a single integer. But, couldn't we
fetch information about all the indexes we want to check in one go,
rather than fetching them separately for each table being checked? I'm
not sure if that would create too much other complexity, but it seems
like it would be quicker.

+       if (settings.db == NULL)
+               fatal("no connection on entry to check_index");
+       if (idxname == NULL)
+               fatal("no index name on entry to check_index");
+       if (tblname == NULL)
+               fatal("no table name on entry to check_index");

Again, probably these should be asserts, or if they're not, the error
should be reported better and maybe elsewhere.

Similarly in some other places, like expand_schema_name_patterns().

+        * The loop below runs multiple SELECTs might sometimes result in
+        * duplicate entries in the Oid list, but we don't care.

This is missing a which, like the place you copied it from, but the
version in pg_dumpall.c is better.

expand_table_name_patterns() should be reformatted to not gratuitously
exceed 80 columns.  Ditto for expand_index_name_patterns().

I sort of expected that this patch might use threads to allow parallel
checking - seems like it would be a useful feature.

I originally intended to review the docs and regression tests in the
same email as the patch itself, but this email has gotten rather long
and taken rather longer to get together than I had hoped, so I'm going
to stop here for now and come back to that stuff.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: Skip ExecCheckRTPerms in CTAS with no data
Next
From: Tom Lane
Date:
Subject: Re: Should we document IS [NOT] OF?