Re: documenting the backup manifest file format - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: documenting the backup manifest file format
Msg-id 20200413214256.GA20155@alvherre.pgsql
In response to Re: documenting the backup manifest file format  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: documenting the backup manifest file format  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2020-Apr-13, Robert Haas wrote:

> On Mon, Apr 13, 2020 at 3:34 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > Are these hex figures upper or lower case?  No leading zeroes?  This
> > would normally not matter, but the toplevel checksum will care.
> 
> Not really. You just feed the whole file except for the last line
> through shasum and you get the answer.
> 
> It so happens that the server generates lower-case, but
> pg_verifybackup will accept either.
> 
> Leading zeroes are not omitted. If the checksum's not the right
> length, it ain't gonna work. If SHA is used, it's the same output you
> would get from running shasum -a<whatever> on the file, which is
> certainly a fixed length. I assumed that this followed from the
> statement that there are two characters per byte in the checksum, and
> from the fact that no checksum algorithm I know about drops leading
> zeroes in the output.

Eh, apologies, I was completely unclear -- I was looking at the LSN
fields when writing the above.  So the comment about leading zeroes and
letter case refers to the LSN values.  I agree that it doesn't matter as
long as the same tool generates the JSON file and writes the checksum.
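
To illustrate the LSN point, here is a minimal sketch of a reader that
does not care about letter case or leading zeroes.  It assumes an LSN is
written as two hexadecimal numbers separated by a slash; the function
names are hypothetical and are not taken from pg_verifybackup.

```python
def parse_lsn(text: str) -> int:
    """Parse an LSN such as "0/2000028" into a 64-bit integer.

    Assumption: the value is two hex numbers separated by "/".
    int(..., 16) accepts either letter case and ignores leading zeroes.
    """
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(value: int) -> str:
    """Write an LSN back out in one canonical spelling (upper case, no padding)."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


# Different spellings of the same position compare equal once parsed.
same = parse_lsn("0/0200abcd") == parse_lsn("00000000/200ABCD")
canonical = format_lsn(parse_lsn("0/0200abcd"))
```

A parser written this way is unaffected by the spelling, but the bytes
that go into the top-level checksum still are.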

> > Also, I see no mention of prettification-chars such as newlines or
> > indentation.  I suppose if I pass a manifest file through
> > prettification (or Windows newline conversion), the checksum may
> > break.
> 
> It would indeed break. I'm not sure what you want me to say here,
> though. If you're trying to parse a manifest, you shouldn't care about
> how the whitespace is arranged. If you're trying to generate one, you
> can arrange it any way you like, as long as you also include it in the
> checksum.

Yeah, I guess I'm just saying that it feels brittle to have a file
format that's supposed to be good for data exchange and then make its
validity depend on representation details such as the order in which
fields appear, the letter case, or the format of newlines.  Maybe this
isn't really of concern, but it seemed strange.
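
A rough sketch of what I mean, assuming the checksum is SHA-256 over
every byte before the last line and that the last line carries a
"Manifest-Checksum" key.  The manifest content here is a toy and the
helper names are hypothetical.

```python
import hashlib
import re


def split_manifest(data: bytes):
    """Return (bytes before the last line, the last line itself)."""
    cut = data.rstrip(b"\r\n").rfind(b"\n") + 1
    return data[:cut], data[cut:]


def verify(data: bytes) -> bool:
    """Recompute SHA-256 over everything but the last line and compare."""
    body, last_line = split_manifest(data)
    match = re.search(rb'"Manifest-Checksum":\s*"([0-9a-fA-F]+)"', last_line)
    if match is None:
        return False
    # Hex digits are compared case-insensitively.
    return hashlib.sha256(body).hexdigest() == match.group(1).decode().lower()


# Toy manifest body (not a real backup manifest).
body = b'{ "Files": [],\n'
digest = hashlib.sha256(body).hexdigest().encode()
manifest = body + b'"Manifest-Checksum": "' + digest + b'"}\n'

# Same JSON content after Windows newline conversion.
converted = manifest.replace(b"\n", b"\r\n")

ok_original = verify(manifest)    # checksum matches
ok_converted = verify(converted)  # checksum no longer matches
```

Both byte strings parse to the same JSON document, yet only the
unconverted one passes the check.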

> > As for Last-Modification, I think the spec should indicate the exact
> > format that's used, because it'll also be critical for checksumming.
> 
> Again, I don't think it really matters for checksumming, but it's
> "YYYY-MM-DD HH:MM:SS TZ" format, where TZ is always GMT.

I agree that whatever format you use will work as long as it isn't
modified.

I think strict ISO 8601 might be preferable (with the T in the middle
and ending in Z instead of " GMT").
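
For concreteness, a small sketch of the conversion between the two
spellings, assuming the input always has the "YYYY-MM-DD HH:MM:SS GMT"
shape quoted above.  The function name is hypothetical.

```python
from datetime import datetime, timezone


def to_iso8601(stamp: str) -> str:
    """Convert "YYYY-MM-DD HH:MM:SS GMT" to strict ISO 8601 with T and Z."""
    # The literal " GMT" suffix is matched as plain text in the format string.
    parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S GMT")
    return parsed.replace(tzinfo=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


iso = to_iso8601("2020-04-13 21:42:56 GMT")
```

With that input, `iso` is "2020-04-13T21:42:56Z".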

> > Why is the top-level checksum only allowed to be SHA-256, if the
> > files can use up to SHA-512?

Thanks for the discussion.  I think you mostly want to make sure that
the manifest is sensible (not corrupt) rather than defend against
somebody maliciously giving you an attacking manifest (??).  I'm
inclined to agree that any SHA-2 hash is going to serve that purpose,
and I have no further comment to make.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


