Re: [PATCH] Verify Checksums during Basebackups - Mailing list pgsql-hackers

From Michael Banck
Subject Re: [PATCH] Verify Checksums during Basebackups
Date
Msg-id 1521797787.15036.23.camel@credativ.de
In response to Re: [PATCH] Verify Checksums during Basebackups  (David Steele <david@pgmasters.net>)
Responses Re: [PATCH] Verify Checksums during Basebackups  (David Steele <david@pgmasters.net>)
List pgsql-hackers
Hi David,

thanks for the review!

On Thursday, 2018-03-22 at 12:22 -0400, David Steele wrote:
> On 3/17/18 5:34 PM, Michael Banck wrote:
> > On Fri, Mar 09, 2018 at 10:35:33PM +0100, Michael Banck wrote:
> > I think most people (including those I had off-list discussions about
> > this with) were of the opinion that such an option should be there, so I
> > added an additional option NOVERIFY_CHECKSUMS to the BASE_BACKUP
> > replication command and also an option -k / --no-verify-checksums to
> > pg_basebackup to trigger this.
> > 
> > Updated patch attached.
> 
> +    memcpy(page, (buf + BLCKSZ * i), BLCKSZ);
> 
> Why make a copy here?  How about:
> 
> char *page = buf + BLCKSZ * i

Right, ok.
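For illustration, here is a minimal, self-contained sketch of the pointer-based variant suggested above. It uses a toy header struct and a toy checksum (not PostgreSQL's real FNV-based `pg_checksum_page`), but it mimics the relevant behavior: the checksum routine saves the stored `pd_checksum`, zeroes it for the computation, and restores it afterwards, which is why pointing straight into the read buffer is safe. The names `toy_checksum_page`, `stamp_buffer`, and `verify_buffer` are made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192

/* Toy page header: only the field this sketch needs. */
typedef struct
{
	uint16_t	pd_checksum;
} ToyPageHeader;

/*
 * Toy per-block checksum.  Like the real pg_checksum_page, it saves the
 * stored checksum, zeroes it for the computation, and restores it, so
 * the page is byte-for-byte unchanged afterwards.
 */
uint16_t
toy_checksum_page(char *page)
{
	ToyPageHeader *phdr = (ToyPageHeader *) page;
	uint16_t	save = phdr->pd_checksum;
	uint32_t	sum = 0;

	phdr->pd_checksum = 0;
	for (size_t i = 0; i < BLCKSZ; i++)
		sum = sum * 31 + (uint8_t) page[i];
	phdr->pd_checksum = save;	/* restore: page left intact */

	return (uint16_t) (sum % 65535 + 1);
}

/* Stamp every block so the buffer starts out with valid checksums. */
void
stamp_buffer(char *buf, int nblocks)
{
	for (int i = 0; i < nblocks; i++)
	{
		char	   *page = buf + (size_t) BLCKSZ * i;

		((ToyPageHeader *) page)->pd_checksum = toy_checksum_page(page);
	}
}

/*
 * Verify every block by pointing into the read buffer -- no memcpy of
 * each block.  Returns the number of blocks that fail verification.
 */
int
verify_buffer(char *buf, int nblocks)
{
	int			bad = 0;

	for (int i = 0; i < nblocks; i++)
	{
		/* point into the buffer instead of copying the block out */
		char	   *page = buf + (size_t) BLCKSZ * i;
		ToyPageHeader *phdr = (ToyPageHeader *) page;

		if (phdr->pd_checksum != toy_checksum_page(page))
			bad++;
	}
	return bad;
}
```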

> I know pg_checksum_page manipulates the checksum field but I have found
> it to be safe.
> 
> +    if (phdr->pd_checksum != checksum)
> 
> I've attached a patch that adds basic retry functionality.  It's not
> terribly efficient since it rereads the entire buffer for any block
> error.  A better way is to keep a bitmap for each block in the buffer,
> then on retry compare bitmaps.  If the block is still bad, report it.
> If the block was corrected, move on.  If a block was good before but is
> bad on retry it can be ignored.

I have to admit I find it a bit convoluted and non-obvious on first
reading, but I'll try to check it out some more.
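To spell out the bitmap scheme described above, here is a small sketch of just the comparison step: one failure bitmap from the first pass, one from the retry, and a per-block verdict. The enum values and function names are invented for this sketch, not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Possible outcomes for one block after a re-read, following the
 * first-pass-bitmap vs. retry-bitmap comparison described above.
 */
typedef enum
{
	BLOCK_OK,					/* good on both passes */
	BLOCK_CORRUPT,				/* bad on both passes: report it */
	BLOCK_CORRECTED,			/* bad, then good: was mid-write, now fine */
	BLOCK_IN_FLIGHT				/* good, then bad: being written, ignore */
} BlockVerdict;

static inline bool
bit_is_set(const uint8_t *bitmap, int blkno)
{
	return (bitmap[blkno / 8] >> (blkno % 8)) & 1;
}

/* Compare the first-pass and retry failure bitmaps for one block. */
BlockVerdict
classify_block(const uint8_t *first_bad, const uint8_t *retry_bad, int blkno)
{
	bool		was_bad = bit_is_set(first_bad, blkno);
	bool		still_bad = bit_is_set(retry_bad, blkno);

	if (was_bad && still_bad)
		return BLOCK_CORRUPT;
	if (was_bad)
		return BLOCK_CORRECTED;
	if (still_bad)
		return BLOCK_IN_FLIGHT;
	return BLOCK_OK;
}
```

Only blocks classified as BLOCK_CORRUPT would be reported; the other mismatches are explainable by concurrent writes between the two reads.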

I think it would be very useful if we could come up with a test case
showing this problem, but I guess this will be quite hard to hit
reproducibly, right?

> +    ereport(WARNING,
> +        (errmsg("checksum verification failed in file "
> 
> I'm worried about how verbose this warning could be since there are
> 131,072 blocks per segment.  It's unlikely to have that many block
> errors, but users do sometimes put files in PGDATA which look like they
> should be validated.  Since these warnings all go to the server log it
> could get pretty bad.

We only verify checksums of files in tablespaces, and I don't think
dropping random files in those is supported in any way, as opposed to
maybe the top-level PGDATA directory. So I would say that this is not a
real concern.

> We should either stop warning after the first failure, or aggregate the
> failures for a file into a single message.

I agree that major corruption could make the whole output blow up, but I
would prefer to keep this feature simple for now, which implies possibly
printing out a lot of WARNINGs, or maybe just stopping after the first
one (or the first few, dunno).
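For what the "stop after the first few" variant could look like, here is a sketch that reports the first few failures per file individually and then a single summary line, so a badly corrupted segment cannot flood the log. The cutoff, the function name, and the use of fprintf in place of ereport(WARNING, ...) are all inventions of this sketch.

```c
#include <stdio.h>

#define MAX_REPORTED_FAILURES 5	/* cutoff is an arbitrary choice here */

/*
 * Report checksum failures for one file: the first few individually,
 * then one summary line for the rest.  Returns how many individual
 * warnings were emitted.  fprintf(log, ...) stands in for
 * ereport(WARNING, ...).
 */
int
report_file_failures(const char *filename, const int *bad_blocks,
					 int nbad, FILE *log)
{
	int			printed = 0;

	for (int i = 0; i < nbad; i++)
	{
		if (printed >= MAX_REPORTED_FAILURES)
			break;
		fprintf(log,
				"WARNING: checksum verification failed in file \"%s\", block %d\n",
				filename, bad_blocks[i]);
		printed++;
	}
	if (nbad > printed)
		fprintf(log,
				"WARNING: %d further checksum failures in file \"%s\" not reported\n",
				nbad - printed, filename);
	return printed;
}
```

This keeps the per-block detail for the common case of isolated corruption while bounding the log volume for the pathological case.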


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

