Re: backup manifests - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: backup manifests
Date
Msg-id 20200325133106.GS13712@tamriel.snowman.net
Whole thread Raw
In response to Re: backup manifests  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: backup manifests  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Mon, Mar 23, 2020 at 6:42 PM Stephen Frost <sfrost@snowman.net> wrote:
> > While I get the desire to have a default here that includes checksums,
> > the way the command is structured, it strikes me as odd that the lack of
> > MANIFEST_CHECKSUMS in the command actually results in checksums being
> > included.
>
> I don't think that's quite accurate, because the default for the
> MANIFEST option is 'no', so the actual default if you say nothing
> about manifests at all, you don't get one. However, it is true that if
> you ask for a manifest and you don't specify the type of checksums,
> you get CRC-32C. We could change it so that if you ask for a manifest
> you must also specify the type of checksum, but I don't see any
> advantage in that approach. Nothing prevents the client from
> specifying the value if it cares, but making the default "I don't
> care, you pick" seems pretty sensible. It could be really helpful if,
> for example, we decide to remove the initial default in a future
> release for some reason. Then the client just keeps working without
> needing to change anything, but anyone who explicitly specified the
> old default gets an error.

I get that the default for manifest is 'no', but I don't really see how
that means that the lack of saying anything about checksums should mean
"give me crc32c checksums".  It's really rather common that if we don't
specify something, it means don't do that thing- like an 'ORDER BY'
clause.  We aren't designing SQL here, so I'm not going to get terribly
upset if you push forward with "if you don't want checksums, you have to
explicitly say MANIFEST_CHECKSUMS no", but I don't agree with the
reasoning here.

> > > +     <varlistentry>
> > > +      <term><option>--manifest-checksums=<replaceable class="parameter">algorithm</replaceable></option></term>
> > > +      <listitem>
> > > +       <para>
> > > +        Specifies the algorithm that should be used to checksum each file
> > > +        for purposes of the backup manifest. Currently, the available
> > > +        algorithms are <literal>NONE</literal>, <literal>CRC32C</literal>,
> > > +        <literal>SHA224</literal>, <literal>SHA256</literal>,
> > > +        <literal>SHA384</literal>, and <literal>SHA512</literal>.
> > > +        The default is <literal>CRC32C</literal>.
> > > +       </para>
> >
> > As I recall, there was an invitation to argue about the defaults at one
> > point, and so I'm going to say here that I would advocate for a
> > different default than 'crc32c'.  Specifically, I would think sha256 or
> > 512 would be better.  I don't recall seeing a debate about this that
> > conclusively found crc32c to be better, but I'm happy to go back and
> > reread anything someone wants to point me at.
>
> It was discussed upthread. Andrew Dunstan argued that there was no
> reason to use a cryptographic checksum here and that we shouldn't do
> so gratuitously. Suraj Kharage found that CRC-32C has very little
> performance impact but that any of the SHA functions slow down backups
> considerably. David Steele pointed out that you'd need a better
> checksum if you wanted to use it for purposes such as delta restore,
> with which I agree, but that's not the design center for this feature.
> I concluded that different people wanted different things, so that we
> ought to make this configurable, but that CRC-32C is a good default.
> It has approximately a 99.9999999767169% chance of detecting a random
> error, which is pretty good, and it doesn't drastically slow down
> backups, which is also good.

There were also comments made up-thread about how it might not be great
for larger (eg: 1GB files, like we tend to have quite a few of...), and
something about it being a 40 year old algorithm..  Having re-read some
of the discussion, I'm actually more inclined to say we should be using
sha256 instead of crc32c.

> > It also seems a bit silly to me that using the defaults means having to
> > deal with two different algorithms- crc32c and sha256.  Considering how
> > fast these algorithms are, compared to everything else involved in a
> > backup (particularly one that's likely going across a network...), I
> > wonder if we should say "may slightly increase" above.
>
> Actually, Suraj's results upthread show that it's a pretty big hit.

So, I went back and re-read part of the thread and looked at the
(seemingly, only one..?) post regarding timing and didn't understand
what, exactly, was being timed there, because I didn't see the actual
commands/script/whatever that was used to get those results included.

I'm sure that sha256 takes a lot more time than crc32c, I'm certainly
not trying to dispute that, but what's relevent here is how much it
impacts the time required to run the overall backup (including sync'ing
it to disk, and possibly network transmission time..  if we're just
comparing the time to run it through memory then, sure, the sha256
computation time might end up being quite a bit of the time, but that's
not really that interesting of a test..).

> > > +       <para>
> > > +        On the other hand, <literal>CRC32C</literal> is not a cryptographic
> > > +        hash function, so it is only suitable for protecting against
> > > +        inadvertent or random modifications to a backup. An adversary
> > > +        who can modify the backup could easily do so in such a way that
> > > +        the CRC does not change, whereas a SHA collision will be hard
> > > +        to manufacture. (However, note that if the attacker also has access
> > > +        to modify the backup manifest itself, no checksum algorithm will
> > > +        provide any protection.) An additional advantage of the
> > > +        <literal>SHA</literal> family of functions is that they output
> > > +        a much larger number of bits.
> > > +       </para>
> >
> > I'm not really sure that this paragraph is sensible to include..  We
> > certainly don't talk about adversaries and cryptographic hash functions
> > when we talk about our page-level checksums, for example.  I'm not
> > completely against including it, but I don't want to give the impression
> > that this is something we routinely consider or that lack of discussion
> > elsewhere implies we have protections against a determined attacker.
>
> Given the skepticism from some quarters about CRC-32C on this thread,
> I didn't want to oversell it. Also, I do think that these things are
> possibly things that we should consider more widely. I agree with
> Andrew's complaint that it's far too easy to just throw SHA<lots> at
> problems that don't really require it without any actually good
> reason. Spelling out our reasons for choosing certain algorithms for
> certain purposes seems like a good habit to get into, and if we
> haven't done it in other places, maybe we should. On the other hand,
> while I'm inclined to keep this paragraph, I won't lose much sleep if
> we decide to remove it.

I don't mind spelling out reasoning for certain algorithms over others,
in general, this just seems a bit much.  I'm not sure we need to be
going into what being a cryptographic hash function means every time we
talk about any hash or checksum.  Those who actually care about
cryptographic hash function usage really don't need someone to explain
to them that crc32c isn't cryptographically secure.  The last sentence
also seems kind of odd (why is a much larger number of bits, alone, an
advantage..?).

I tried to figure out a way to rewrite this and I feel like I keep
ending up coming back to something like "CRC32C is a CRC, not a hash"
and that kind of truism just doesn't feel terribly useful to include in
our documentation.

Maybe:

"Using a SHA hash function provides a cryptographically secure digest
of each file for users who wish to verify that the backup has not been
tampered with, while the CRC32C algorithm provides a checksum which is
much faster to calculate and good at catching errors due to accidental
changes but is not resistent to targeted modifications.  Note that, to
be useful against an adversary who has access to the backup, the backup
manifest would need to be stored securely elsewhere or otherwise
verified to have not been modified since the backup was taken."

This at least talks about things in a positive direction (SHA hash
functions do this, CRC32C does that) rather than in a negative tone.

> > This seems to invite the idea that pg_validatebackup should be able to
> > work with external backup solutions- but I'm a bit concerned by that
> > idea because it seems like it would then mean we'd have to be
> > particularly careful when changing things in this area, and I'm not
> > thrilled by that.  I'd like to make sure that new versions of
> > pg_validatebackup work with older backups, and, ideally, older versions
> > of pg_validatebackup would work even with newer backups, all of which I
> > think the json structure of the manifest helps us with, but that's when
> > we're building the manifest and know what it's going to look like.
>
> Both you and David made forceful arguments that this needed to be JSON
> rather than an ad-hoc text format precisely so that other tools could
> parse it more easily, and I just spent *a lot* of time making the JSON
> parsing stuff work precisely so that you could have that. This project
> would've been done a month ago if not for that. I don't care all that
> much whether we remove the mention here, but the idea that using JSON
> was so that pg_validatebackup could manage compatibility issues is
> just not correct. The version number on line 1 of the file was more
> than sufficient for that purpose.

I stand by the decision that the manifest should be in JSON, but that's
what is produced by the backend server as part of a base backup, which
is quite likely going to be used by some external tools, and isn't at
all the same as the external pg_validatebackup command that the
discussion here is about.  I also did make the argument up-thread,
though I'll admit that it seemed to be mostly ignored, but I make it
still, that a simple version number sucks and using JSON does avoid some
of the downsides from it.  Particularly, I'd love to see a v13
pg_validatebackup able to work with a v14 pg_basebackup, even if that
v14 pg_basebackup added some extra stuff to the manifest.  That's
possible to do with a generic structure like JSON and not something that
a simple version number would allow.  Yes, I admit that we might change
the structure or the contents in a way where that wouldn't be possible
and I'm not going to raise a fuss if we do so, but this approach gives
us more options.

Anyway, my point here was really just that *pg_validatebackup* is about
validating backups taken with pg_basebackup.  While it's possible that
it could be used for backups taken with other tools, I don't think
that's really part of its actual mandate or that we're going to actively
work to add such support in the future.

> > > +  <itemizedlist>
> > > +    <listitem>
> > > +      <para>
> > > +        <literal>backup_manifest</literal> is ignored because the backup
> > > +        manifest is logically not part of the backup and does not include
> > > +        any entry for itself.
> > > +      </para>
> > > +    </listitem>
> >
> > This seems a bit confusing, doesn't it?  The backup_manifest must exist,
> > and its checksum is internal, and is checked, isn't it?  Why say that
> > it's excluded..?
>
> Well, there's no entry in the backup manifest for backup_manifest
> itself. Normally, the presence of a file not mentioned in
> backup_manifest would cause a complaint about an extra file, but
> because backup_manifest is in the ignore list, it doesn't.

Yes, I get why it's excluded from the manifest and why we have code to
avoid complaining about it being an extra file, but this is
documentation and, in this part of the docs, we seem to be saying that
we're not checking/validating the manifest, and that's certainly not
actually true.

In particular, the sentence right above this list is:

"Certain files and directories are excluded from verification:"

but we actually do verify the manifest, that's all I'm saying here.

Maybe rewording that a bit is what would help, say:

"Certain files and directories are not included in the manifest:"

then have the entry for backup_manifest be something like:
"backup_manifest is not included as it is the manifest itself and is not
logically part of the backup; backup_manifest is checked using its own
internal validation digest" or something along those lines.

> > > +    <listitem>
> > > +      <para>
> > > +        <literal>pg_wal</literal> is ignored because WAL files are sent
> > > +        separately from the backup, and are therefore not described by the
> > > +        backup manifest.
> > > +      </para>
> > > +    </listitem>
> >
> > I don't agree with the choice to exclude the WAL files, considering
> > they're an integral part of a backup, to exclude them means that if
> > they've been corrupted at all then the entire backup is invalid.  You
> > don't want to be discovering that when you're trying to do a restore of
> > a backup that you took with pg_basebackup and which pg_validatebackup
> > says is valid.  After all, the tool being used here, pg_basebackup,
> > *does* also stream the WAL files- there's no reason why we can't
> > calculate a checksum on them and store that checksum somewhere and use
> > it to validate the WAL files.  This, in my opinion, is actually a
> > show-stopper for this feature.  Claiming it's a valid backup when we
> > don't check the absolutely necessary-for-restore WAL is making a false
> > claim, no matter how well it's documented.
>
> The default for pg_basebackup is -Xstream, which means that the WAL
> files are being sent over a separate connection that has no connection
> to the original session. The server, when generating the backup
> manifest, has no idea what WAL files are being sent over that separate
> connection, and thus cannot include them in the manifest. This problem
> could be "solved" by having the client generate the manifest rather
> than the server, but I think that cure would be worse than the
> disease. As it stands, the manifest provides some protection against
> transmission errors, which would be lost with that design. As you
> point out, this clearly can't be done with -Xnone. I think it would be
> possible to support this with -Xfetch, but we'd have to have the
> manifest itself specify whether or not it included files in pg_wal,
> which would require complicating the format a bit. I don't think that
> makes sense. I assume -Xstream is the most commonly-used mode, because
> the default used to be -Xfetch and we changed it, which I think we
> would not have done unless people liked -Xstream significantly better.
> Adding complexity to cater to a non-default case which I suspect is
> not widely used doesn't really make sense to me.

Yeah, I get that it's not easy to figure out how to validate the WAL,
but I stand by my opinion that it's simply not acceptable to exclude the
necessary WAL from verification so and to claim that a backup is valid
when we haven't checked the WAL.

I agree that -Xfetch isn't commonly used and only supporting validation
of WAL when that's used isn't a good answer.

> In the future, we might want to consider improvements which could make
> validation of pg_wal feasible in common cases. Specifically, suppose
> that pg_basebackup could receive the manifest from the server, keep
> all the entries for the existing files just as they are, but add
> entries for WAL files and anything else it may have added to the
> backup, recompute the manifest checksum, and store the resulting
> revised manifest with the backup. That, I think, would be fairly cool,
> but it's a significant body of additional development work, and this
> is already quite a large patch. The patch itself has grown to about
> 3000 lines, and has already 10 preparatory commits doing another ~1500
> lines of refactoring to prepare for it.

Having the client calculate the checksums for the WAL and add them to
the manifest is one approach and could work, but there's others-

- Have the WAL checksums be calculated during the base backup and kept
  somewhere, and then included in the manifest sent by the server- the
  backup_manifest is the last thing we send anyway, isn't it?  And
  surely at the end of the backup we actually do know all of the WAL
  that's needed for the backup to be valid, because we pass that
  information to pg_basebackup to construct the necessary backup_label
  file.

- Validate the WAL using its own internal checksums instead of having
  the manifest involved at all.  That's not ideal since we wouldn't have
  cryptographically secure digests for the WAL, but at least we will
  have validated it and raised the chances that the backup will be able
  to actually be restored using PG a whole bunch.

- With the 'checksum none' option, we aren't really validating contents
  of anything, so in that case it'd actually be alright to simply scan
  the WAL and make sure that we've at least got all of the WAL files
  needed to go from the start of the backup to the end.  I don't think
  just checking that the WAL files exist is a proper solution when it
  comes to a backup where the user has asked for checksums to be
  included though.  I will say that I'm really very surprised that
  pg_validatebackup wasn't already checking that we at least had the WAL
  that is needed, but I don't see any code for that.

> > Not really thrilled with this (pg_basebackup certainly could figure out
> > the checksum for those files...), but I also don't think it's a huge
> > issue as they can be recreated by a user (unlike a WAL file..).
>
> Yeah, same issues, though. Here again, there are several possible
> fixes: (1) make the server modify those files rather than letting
> pg_basebackup do it; (2) make the client compute the manifest rather
> than the server; (3) have the client revise the manifest.  (3) makes
> most sense to me, but I think that it would be better to return to
> that topic at a later date. This is certainly not a perfect feature as
> things stand but I believe it is good enough to provide significant
> benefits.

As I said, I don't consider these files to be as much of an issue and
therefore excluding them and documenting that we do would be alright.  I
don't feel that's an acceptable option for the WAL though.

Thanks,

Stephen

Attachment

pgsql-hackers by date:

Previous
From: Konstantin Knizhnik
Date:
Subject: Re: Columns correlation and adaptive query optimization
Next
From: Peter Eisentraut
Date:
Subject: Re: where EXEC_BACKEND is defined