From: Robert Haas
Subject: Re: backup manifests
Msg-id: CA+TgmoYj60PYvh36OSUrBmKEytsemk=_6u0jzvE+8uuJauaVzw@mail.gmail.com
In response to: Re: backup manifests (Stephen Frost <sfrost@snowman.net>)
List: pgsql-hackers
On Thu, Mar 26, 2020 at 12:34 PM Stephen Frost <sfrost@snowman.net> wrote:
> I do agree with excluding things like md5 and others that aren't good
> options.  I wasn't saying we should necessarily exclude crc32c either..
> but rather saying that it shouldn't be the default.
>
> Here's another way to look at it- where do we use crc32c today, and how
> much data might we possibly be covering with that crc?

WAL record size is a 32-bit unsigned integer, so in theory, up to 4GB
minus 1 byte. In practice, most of them are not more than a few
hundred bytes, but the amount we might possibly be covering is a lot
more.

> Why was crc32c
> picked for that purpose?

Because it was discovered that 64-bit CRC was too slow, per commit
21fda22ec46deb7734f793ef4d7fa6c226b4c78e.

> If the individual who decided to pick crc32c
> for that case was contemplating a checksum for up-to-1GB files, would
> they have picked crc32c?  Seems unlikely to me.

It's hard to be sure what someone who isn't us would have done in some
situation that they didn't face, but we do have the discussion thread:

https://www.postgresql.org/message-id/flat/9291.1117593389%40sss.pgh.pa.us#c4e413bbf3d7fbeced7786da1c3aca9c

The question of how much data is protected by the CRC was discussed
in general terms, mostly in the first few messages, but the thread
doesn't seem to have covered it very thoroughly. I'm sure we could
each draw things from that discussion that support our view of the
situation, but I'm not sure it would be very productive.

What confuses me is that you seem to have a view of the upsides and
downsides of these various algorithms that seems to me to be highly
skewed. Like, suppose we change the default from CRC-32C to
SHA-something. On the upside, the error detection rate will increase
from 99.9999999+% to something much closer to 100%. On the downside,
backups will get as much as 40-50% slower for some users. I hope we
can agree that both detecting errors and taking backups quickly are
important. However, it is hard for me to imagine that the typical user
would want to pay even a 5-10% performance penalty when taking a
backup in order to improve an error detection feature which they may
not even use and which already has less than a one-in-a-billion chance
of going wrong. We routinely reject features for causing, say, a 2%
regression on general workloads. Base backup speed is probably less
important than how many SELECT or INSERT queries you can pump through
the system in a second, but it's still a pain point for lots of
people. I think if you said to some users "hey, would you like to have
error detection for your backups? it'll cost 10%" many people would
say "yes, please." But I think if you went to the same users and said
"hey, would you like to make the error detection for your backups
better? it currently has a less than 1-in-a-billion chance of failing
to detect random corruption, and you can reduce that by many orders of
magnitude for an extra 10% on your backup time," I think the results
would be much more mixed. Some people would like it, but certainly
not everybody.
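
Just to put rough numbers on that, here's a quick, purely
illustrative Python sketch. It uses zlib's plain CRC-32 as a stand-in
for CRC-32C, since the standard library doesn't ship the Castagnoli
polynomial, and PostgreSQL's hardware-accelerated CRC-32C is
typically faster still:

import hashlib
import time
import zlib

# 256 MiB buffer; the contents don't matter much for a throughput test.
data = b"\x00" * (256 * 1024 * 1024)

start = time.perf_counter()
zlib.crc32(data)
print("CRC-32:  %.3f s" % (time.perf_counter() - start))

start = time.perf_counter()
hashlib.sha256(data).digest()
print("SHA-256: %.3f s" % (time.perf_counter() - start))

On most machines the SHA-256 pass comes out several times slower than
the CRC pass; how much of that shows up in end-to-end backup time
depends on how well the checksumming overlaps with compression and
I/O.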

> I'm not actually arguing about which hash functions we should support,
> but rather what the default is and if crc32c, specifically, is actually
> a reasonable choice.  Just because it's fast and we already had an
> implementation of it doesn't justify its use as the default.  Given that
> it doesn't actually provide the check that is generally expected of
> CRC checksums (100% detection of single-bit errors) when the file size
> gets over 512MB makes me wonder if we should have it at all, yes, but it
> definitely makes me think it shouldn't be our default.

I mean, the property that I care about is the one where it detects
better than 999,999,999 errors out of every 1,000,000,000, regardless
of input length.
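
Back-of-the-envelope: a well-mixed 32-bit checksum can take 2^32,
about 4.29 billion, distinct values, so random corruption slips
through with probability 2^-32, roughly 1 in 4.3 billion, no matter
how long the input is. That's where the better-than-1-in-a-billion
figure comes from.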

> I don't agree with limiting our view to only those algorithms that we've
> already got implemented in PG.

I mean, opening that giant can of worms ~2 weeks before feature freeze
is not very nice. This patch has been around for months, and the
algorithms were openly discussed a long time ago. I checked and found
out that the CRC-64 code was nuked in commit
404bc51cde9dce1c674abe4695635612f08fe27e, so in theory we could revert
that, but how much confidence do we have that the code in question
actually did the right thing, or that it's actually fast? An awful lot
of work has been done on the CRC-32C code over the years, including
several rounds of speeding it up
(f044d71e331d77a0039cec0a11859b5a3c72bc95,
3dc2d62d0486325bf263655c2d9a96aee0b02abe) and one round of fixing it
because it was producing completely wrong answers
(5028f22f6eb0579890689655285a4778b4ffc460), so I don't have a lot of
confidence about that CRC-64 code being totally without problems.

The commit message for that last commit,
5028f22f6eb0579890689655285a4778b4ffc460, seems pretty relevant in
this context, too. It observes that, because it "does not correspond
to any bit-wise CRC calculation", it is "difficult to reason about its
properties." In other words, the algorithm that we used for WAL
records for many years likely did not have the guaranteed
error-detection properties with which you are so concerned (nor do
most hash functions we might choose; CRC-64 is probably the only
choice that would). Despite that, the commit message also observed
that "it has worked well in practice." I realize I'm not convincing
you of anything here, but the guaranteed error-detection properties of
CRC are almost totally uninteresting in this context. I'm not
concerned that CRC-32C doesn't have those properties. I'm not
concerned that SHA-n wouldn't have those properties. I'm not concerned
that xxhash or HighwayHash don't have that property either. I doubt
that CRC-64 having that property would give us much benefit. I think
the only things that matter here are (1) how many
bits you get (more bits = better chance of finding errors, but even
*sixteen* bits would give you a pretty fair chance of noticing if
things are broken) and (2) whether you want a cryptographic hash
function so that you can keep the backup manifest in a vault.
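
(Same back-of-the-envelope math as above: an n-bit checksum that
mixes well lets random corruption through with probability 2^-n, so
even 16 bits misses only about 1 in 65,536 corruption events, and
every additional bit halves that.)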

> It's saying, removing the listing aspect, exactly that "backup_label is
> excluded from verification".  That's what I am taking issue with.  I've
> made multiple attempts to suggest other language to avoid saying that
> because it's clearly wrong- the manifest is verified.

Well, it's talking about the particular kind of verification that has
just been discussed, not any form of verification. As one idea,
perhaps instead of:

+ Certain files and directories are
+   excluded from verification:

...I could maybe insert a paragraph break there and then continue with
something like this:

When pg_basebackup compares the files and directories in the manifest
to those which are present on disk, it will ignore the presence of, or
changes to, certain files:

backup_manifest will not be present in the manifest itself, and is
therefore ignored. Note that the manifest is still verified
internally, as described above, but no error will be issued about the
presence of a backup_manifest file in the backup directory even though
it is not listed in the manifest.

Would that be more clear? Do you want to suggest something else?

> I'm not talking about making sure that no error ever happens when
> doing a backup.  I'm saying that the existing tool that takes the
> backup has a *really*
> *important* verification check that this proposed "validate backup" tool
> doesn't have, and that isn't sensible.  It leads to situations where the
> backup tool itself, pg_basebackup, can fail or be killed before it's
> actually completed, and the "validate backup" tool would say that the
> backup is perfectly fine.  That is not sensible.

If someone's procedure for taking and restoring backups involves not
knowing whether or not pg_basebackup completed without error and then
trying to use the backup anyway, they are doing something which is
very foolish, and it's questionable whether any technological solution
has much hope of getting them out of trouble. But on the plus side,
this patch would have a good chance of detecting the problem, which is
a noticeable improvement over what we have now, which has no chance of
detecting the problem, because we have nothing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


