Re: [PATCH] Verify Checksums during Basebackups - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: [PATCH] Verify Checksums during Basebackups
Msg-id CABUevEyTJTvn328B6Jb=LdZFZJE6p0MPT=HivSXs-KbxGpqrGw@mail.gmail.com
In response to [PATCH] Verify Checksums during Basebackups  (Michael Banck <michael.banck@credativ.de>)
Responses Re: [PATCH] Verify Checksums during Basebackups  (Robert Haas <robertmhaas@gmail.com>)
Re: [PATCH] Verify Checksums during Basebackups  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers


On Wed, Feb 28, 2018 at 7:08 PM, Michael Banck <michael.banck@credativ.de> wrote:
Hi,

some installations have data which is only rarely read, and if they are
so large that dumps are not routinely taken, data corruption would only
be detected with some large delay even with checksums enabled.

I think this is a very common scenario, particularly when you take into account indexes and things like that.


The attached small patch verifies checksums (in case they are enabled)
during a basebackup. The rationale is that we are reading every block in
this case anyway, so this is a good opportunity to check them as well.
Other and complementary ways of checking the checksums are possible of
course, like the offline checking tool that Magnus just submitted.

It probably makes sense to use the same approach for determining the
segment numbers as Magnus did in his patch, or refactor that out in a
utility function, but I'm sick right now so wanted to submit this for
v11 first.

I did some light benchmarking and it seems that the performance
degradation is minimal, but this could well be platform or
architecture-dependent. Right now, the checksums are always checked but
maybe this could be made optional, probably by extending the replication
protocol.

I think it should be.

I think it would also be a good idea to make this a three-mode setting: "no check", "check and warning", and "check and error". "Check and error" should be the default, but you could turn that off when you are in "save whatever is left" mode. I think it's better if pg_basebackup simply fails on a checksum error, because that makes it glaringly obvious that there is a problem -- which is the main point of checksums in the first place. And then there should be an option to turn the check off completely in cases where performance is what matters most.
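
To make the three modes concrete, here is a rough sketch in plain C of how the per-block decision could look. All names here (ChecksumVerifyMode, handle_block_checksum, the CHECKSUM_VERIFY_* values) are made up for illustration and are not taken from the patch. A real implementation would report through the server's error machinery rather than fprintf.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical names for illustration only; none of this is from the patch. */
typedef enum ChecksumVerifyMode
{
    CHECKSUM_VERIFY_OFF,    /* "no check" */
    CHECKSUM_VERIFY_WARN,   /* "check and warning" */
    CHECKSUM_VERIFY_ERROR   /* "check and error" (the suggested default) */
} ChecksumVerifyMode;

/*
 * Decide what to do after one block has been inspected.
 * Returns true if the backup should continue, false if it should abort.
 * *failures counts the mismatches that were actually reported.
 */
static bool
handle_block_checksum(ChecksumVerifyMode mode, bool checksum_ok,
                      unsigned blkno, unsigned *failures)
{
    /* With checking off, or with a good block, there is nothing to report. */
    if (mode == CHECKSUM_VERIFY_OFF || checksum_ok)
        return true;

    (*failures)++;

    if (mode == CHECKSUM_VERIFY_WARN)
    {
        /* "Save whatever is left": complain, but keep streaming. */
        fprintf(stderr, "WARNING: checksum mismatch in block %u\n", blkno);
        return true;
    }

    /* CHECKSUM_VERIFY_ERROR: make the problem impossible to miss. */
    fprintf(stderr, "ERROR: checksum mismatch in block %u\n", blkno);
    return false;
}
```

In this sketch the caller stops the backup as soon as the function returns false, which is what gives "check and error" its fail-loudly behaviour.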

Another quick note -- we need to assert that the size of the buffer is actually divisible by BLCKSZ. I don't think it's a common scenario, but it could break badly if somebody changes BLCKSZ. Either that or perhaps just change the TARSENDSIZE to be a multiple of BLCKSZ.
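
Roughly what I have in mind for both options, as a standalone sketch in plain C11. The values are assumptions for the example (8192 is the usual default BLCKSZ, and 32768 is just an assumed send-buffer size). TAR_SEND_BLOCKS, TAR_SEND_SIZE_DERIVED and blocks_in_buffer are made-up names, and I spell the buffer constant TAR_SEND_SIZE here. In the tree one would use the existing static-assert macros rather than bare _Static_assert.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for this sketch; BLCKSZ is configurable at build time. */
#define BLCKSZ        8192
#define TAR_SEND_SIZE 32768

/* Option 1: refuse to compile if the buffer cannot hold whole blocks. */
_Static_assert(TAR_SEND_SIZE % BLCKSZ == 0,
               "TAR_SEND_SIZE must be a multiple of BLCKSZ");

/*
 * Option 2: derive the buffer size from BLCKSZ so it is a multiple by
 * construction. TAR_SEND_BLOCKS is a hypothetical name.
 */
#define TAR_SEND_BLOCKS       4
#define TAR_SEND_SIZE_DERIVED (TAR_SEND_BLOCKS * BLCKSZ)

/*
 * Number of whole blocks in a buffer of nbytes. A per-block checksum loop
 * would step through the buffer in BLCKSZ strides, so a partial trailing
 * block would mean verifying a torn page.
 */
static size_t
blocks_in_buffer(size_t nbytes)
{
    assert(nbytes % BLCKSZ == 0);
    return nbytes / BLCKSZ;
}
```

The compile-time check costs nothing at run time and catches the problem for anybody building with a non-default BLCKSZ.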


