Re: backup manifests - Mailing list pgsql-hackers

From David Steele
Subject Re: backup manifests
Date
Msg-id 2f4dd3b8-fc39-7a34-d0e3-17fc14e041ea@pgmasters.net
In response to Re: backup manifests  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: backup manifests  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 11/22/19 10:58 AM, Robert Haas wrote:
> On Tue, Nov 19, 2019 at 8:49 AM Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
>> I admit I haven't been following along closely, but why do we need a
>> cryptographic checksum here instead of, say, a CRC? Do we think that
>> somehow the checksum might be forged? Use of cryptographic hashes as
>> general purpose checksums has become far too common IMNSHO.
> 
> I tend to agree with you. I suspect if we just use CRC, some people
> are going to complain that they want something "stronger" because that
> will make them feel better about error detection rates or obscure
> threat models or whatever other things a SHA-based approach might be
> able to catch that CRC would not catch. 

Well, according to all the sources I found (NIST, Wikipedia, etc.), the
maximum amount of data that can be protected with a 32-bit CRC is 512MB.
I presume a 32-bit CRC is what we are talking about, since I can't find
any 64-bit CRC code in core or in this patch.

So, that's half of what we need with the default relation segment size
of 1GB (and I've seen larger in the field).

> I don't think we
> should offer an option for MD5, because MD5 is a dirty word these days
> and will cause problems for users who have to worry about FIPS 140-2
> compliance. 

+1.

> Phrased more positively, if you want a cryptographic hash
> at all, you should probably use one that isn't widely viewed as too
> weak.

Sure.  There's another advantage to picking an algorithm with lower
collision rates, though.

CRCs are fine for catching transmission errors (as caveated above) but
not as great for comparing two files for equality.  With strong hashes
you can confidently compare local files against the path, size, and hash
stored in the manifest and save yourself a round-trip to the remote
storage to grab the file if it has not changed locally.

This is the basic premise of what we call delta restore, which can speed
up restores by orders of magnitude.
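
To make the comparison concrete, here is a minimal sketch of the
per-file check, not actual code from any backup tool.  The names
ManifestEntry, sha1_of_file, and needs_fetch are hypothetical, and it
assumes the manifest stores a relative path, a size in bytes, and a
SHA1 hex digest for each file:

```python
import hashlib
import os
from typing import NamedTuple


class ManifestEntry(NamedTuple):
    # Hypothetical manifest record; field names are assumptions.
    path: str   # path relative to the data directory
    size: int   # file size in bytes
    sha1: str   # SHA1 hex digest of the file contents


def sha1_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so a large segment need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_fetch(data_dir: str, entry: ManifestEntry) -> bool:
    """Return True if the file must be fetched from remote storage."""
    local_path = os.path.join(data_dir, entry.path)
    if not os.path.isfile(local_path):
        return True
    # Cheap check first: a size mismatch means the file has changed.
    if os.path.getsize(local_path) != entry.size:
        return True
    # Expensive check last: compare the content hash with the manifest.
    return sha1_of_file(local_path) != entry.sha1
```

Only the files for which needs_fetch() returns True have to be pulled
from the repository; everything else is left in place.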

Delta restore is the main advantage that made us decide to require SHA1
checksums.  In most cases, restore speed is more important than backup
speed.

Regards,
-- 
-David
david@pgmasters.net


