Re: [HACKERS] WAL consistency check facility - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] WAL consistency check facility
Date
Msg-id CA+TgmoYVRmKySQLiv-+M4ho_JcN_de6P72CLUhFFnAJOVkVG5w@mail.gmail.com
Whole thread Raw
In response to Re: WAL consistency check facility  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: [HACKERS] WAL consistency check facility
Re: [HACKERS] WAL consistency check facility
List pgsql-hackers
On Mon, Nov 28, 2016 at 11:31 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Moved to CF 2017-01, as no committers have showed up yet :(

Seeing no other volunteers, here I am.

On a first read-through of this patch -- I have not studied it in
detail yet -- this looks pretty good to me.  One concern is that this
patch adds a bit of code to XLogInsert(), which is a very hot piece of
code.  Conceivably, that might produce a regression even when this is
disabled; if so, we'd probably need to make it a build-time option.  I
hope that's not necessary, because I think it would be great to
compile this into the server by default, but we better make sure it's
not a problem.  A bulk load into an existing table might be a good
test case.

Aside from that, I think the biggest issue here is that the masking
functions are virtually free of comments, whereas I think they should
have extensive and detailed comments.  For example, in heap_mask, you
have this:

+            page_htup->t_infomask =
+                HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID |
+                HEAP_XMAX_COMMITTED | HEAP_XMAX_INVALID;

For something like this, you could write "We want to ignore
differences in hint bits, since they can be set by SetHintBits without
emitting WAL.  Force them all to be set so that we don't notice
discrepancies."  Actually, though, I think that you could be a bit
more nuanced here: HEAP_XMIN_COMMITTED + HEAP_XMIN_INVALID =
HEAP_XMIN_FROZEN, so maybe what you should do is clear
HEAP_XMAX_COMMITTED and HEAP_XMAX_INVALID but only clear the others if
one is set but not both.

Anyway, leaving that aside, I think every single change that gets
masked in every single masking routine needs a similar comment,
explaining why that change can happen on the master without also
happening on the standby and hopefully referring to the code that
makes that unlogged change.

I think wal_consistency_checking, as proposed by Peter, is better than
wal_consistency_check, as implemented.

Having StartupXLOG() call pfree() on the masking buffers is a waste of
code.  The process is going to exit anyway.

+                 "Inconsistent page found, rel %u/%u/%u, forknum %u, blkno %u",

Primary error messages aren't capitalized.

+        if (!XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blkno))
+        {
+            /* Caller specified a bogus block_id. Do nothing. */
+            continue;
+        }

Why would the caller do something so dastardly?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Joe Conway
Date:
Subject: Re: [HACKERS] Getting rid of "unknown error" in dblink andpostgres_fdw
Next
From: Tom Lane
Date:
Subject: Re: [HACKERS] Getting rid of "unknown error" in dblink and postgres_fdw