Re: backup manifests - Mailing list pgsql-hackers

From David Steele
Subject Re: backup manifests
Date
Msg-id 228fdaa6-616e-68ed-7681-002eabb9666d@pgmasters.net
Whole thread Raw
In response to Re: backup manifests  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: backup manifests  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 11/22/19 2:01 PM, Robert Haas wrote:
> On Fri, Nov 22, 2019 at 1:10 PM David Steele <david@pgmasters.net> wrote:
>> Well, the maximum amount of data that can be protected with a 32-bit CRC
>> is 512MB according to all the sources I found (NIST, Wikipedia, etc).  I
>> presume that's what we are talking about since I can't find any 64-bit
>> CRC code in core or this patch.
> 
> Could you give a more precise citation for this? 

See:
https://www.nist.gov/system/files/documents/2017/04/26/lrdc_systems_part2_032713.pdf
Search for "The maximum block size"

https://en.wikipedia.org/wiki/Cyclic_redundancy_check
"The design of the CRC polynomial depends on the maximum total length of
the block to be protected (data + CRC bits)", which I took to mean there
are limits.

Here another interesting bit from:
https://en.wikipedia.org/wiki/Mathematics_of_cyclic_redundancy_checks
"Because a CRC is based on division, no polynomial can detect errors
consisting of a string of zeroes prepended to the data, or of missing
leading zeroes" -- but it appears to matter what CRC you are using.
There's a variation that works in this case and hopefully we are using
that one.

This paper talks about appropriate block lengths vs crc length:
http://users.ece.cmu.edu/~koopman/roses/dsn04/koopman04_crc_poly_embedded.pdf
but it is concerned with network transmission and small block lengths.

> "Typically an n-bit CRC applied to a data block of arbitrary length
> will detect any single error burst not longer than n bits, and the
> fraction of all longer error bursts that it will detect is (1 −
> 2^−n)."

I'm not sure how encouraging I find this -- a four-byte error not a lot
and 2^32 is only 4 billion.  We have individual users who have backed up
more than 4 billion files over the last few years.

>> This is the basic premise of what we call delta restore which can speed
>> up restores by orders of magnitude.
>>
>> Delta restore is the main advantage that made us decide to require SHA1
>> checksums.  In most cases, restore speed is more important than backup
>> speed.
> 
> I see your point, but it's not the whole story. We've encountered a
> bunch of cases where the time it took to complete a backup exceeded
> the user's desired backup interval, which is obviously very bad, or
> even more commonly where it exceeded the length of the user's
> "low-usage" period when they could tolerate the extra overhead imposed
> by the backup. A few percentage points is probably not a big deal, but
> a user who has an 8-hour window to get the backup done overnight will
> not be happy if it's taking 6 hours now and we tack 40%-50% on to
> that. So I think that we either have to disable backup checksums by
> default, or figure out a way to get the overhead down to something a
> lot smaller than what current tests are showing -- which we could
> possibly do without changing the algorithm if we can somehow make it a
> lot cheaper, but otherwise I think the choice is between disabling the
> functionality altogether by default and adopting a less-expensive
> algorithm. Maybe someday when delta restore is in core and widely used
> and CPUs are faster, it'll make sense to revise the default, and
> that's cool, but I can't see imposing a big overhead by default to
> enable a feature core doesn't have yet...

OK, I'll buy that.  But I *don't* think CRCs should be allowed for
deltas (when we have them) and I *do* think we should caveat their
effectiveness (assuming we can agree on them).

In general the answer to faster backups should be more cores/faster
network/faster disk, not compromising backup integrity.  I understand
we'll need to wait until we have parallelism in pg_basebackup to justify
that answer.

Regards,
-- 
-David
david@pgmasters.net



pgsql-hackers by date:

Previous
From: Mark Dilger
Date:
Subject: Re: Assertion failing in master, predicate.c
Next
From: Mark Dilger
Date:
Subject: Re: Assertion failing in master, predicate.c