Re: [PATCH] Verify Checksums during Basebackups - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: [PATCH] Verify Checksums during Basebackups
Date
Msg-id CABUevEyfwOg97PZXvw-wOGLSXK5ZHxx=Nx4CqozoDdUj7Rd-aw@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Verify Checksums during Basebackups  (Michael Banck <michael.banck@credativ.de>)
Responses Re: [PATCH] Verify Checksums during Basebackups  (Michael Banck <michael.banck@credativ.de>)
List pgsql-hackers
On Sat, Mar 17, 2018 at 10:34 PM, Michael Banck <michael.banck@credativ.de> wrote:
Hi,

On Fri, Mar 09, 2018 at 10:35:33PM +0100, Michael Banck wrote:
> Possibly open questions:
>
> 1. I have not so far changed the replication protocol to make verifying
> checksums optional. I can go about that next if the consensus is that we
> need such an option (and cannot just check it everytime)?

I think most people (including those I had off-list discussions about
this with) were of the opinion that such an option should be there, so I
added an additional option NOVERIFY_CHECKSUMS to the BASE_BACKUP
replication command and also an option -k / --no-verify-checksums to
pg_basebackup to trigger this.

Updated patch attached.


Notes:

+   if (checksum_failure == true)

Really, just if(checksum_failure)

+           errmsg("checksum mismatch during basebackup")));

Should be "base backup" in messages.

+static const char *skip[] = {

I think that needs a much better name than just "skip". Skip for what? In particular since we are just skipping it for checksums, and not for the actual basebackup, that name is actively misinforming.

+   filename = basename(pstrdup(readfilename));
+   if (!noverify_checksums && DataChecksumsEnabled() &&
+       !skipfile(filename) &&
+       (strncmp(readfilename, "./global/", 9) == 0 ||
+       strncmp(readfilename, "./base/", 7) == 0 ||
+       strncmp(readfilename, "/", 1) == 0))
+       verify_checksum = true;

I would include the checks for global, base etc into the skipfile() function as well (also renamed).

+                * Only check pages which have not been modified since the
+                * start of the base backup.

I think this needs a description of why, as well (without having read this thread, this is a pretty subtle case).


+system_or_bail 'dd', 'conv=notrunc', 'oflag=seek_bytes', 'seek=4000', 'bs=9', 'count=1', 'if=/dev/zero', "of=$pgdata/$pg_class";

This part of the test will surely fail on Windows, not having a /dev/zero. Can we easily implement this part natively in perl perhaps? Somebody who knows more about which functionality is OK to use within this system can perhaps comment?

Most of that stuff is trivial and can be cleaned up at commit time. Do you want to send an updated patch with a few of those fixes, or should I clean it?

The test thing is a stopper until we figure that one out though. And while at it -- it seems we don't have any tests for the checksum feature in general. It would probably make sense to consider that at the same time as figuring out the right way to do this one.

--

pgsql-hackers by date:

Previous
From: John Naylor
Date:
Subject: Re: WIP: a way forward on bootstrap data
Next
From: Pavel Stehule
Date:
Subject: Re: missing support of named convention for procedures