Re: documenting the backup manifest file format - Mailing list pgsql-hackers
From | David Steele |
---|---|
Subject | Re: documenting the backup manifest file format |
Date | |
Msg-id | 3d49cad1-43f3-4c8a-1676-fb552892e629@pgmasters.net Whole thread Raw |
In response to | Re: documenting the backup manifest file format (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
On 4/13/20 4:14 PM, Robert Haas wrote: > On Mon, Apr 13, 2020 at 3:34 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > >> Also, I >> see no mention of prettification-chars such as newlines or indentation. >> I suppose if I pass a manifest file through prettification (or Windows >> newline conversion), the checksum may break. > > It would indeed break. I'm not sure what you want me to say here, > though. If you're trying to parse a manifest, you shouldn't care about > how the whitespace is arranged. If you're trying to generate one, you > can arrange it any way you like, as long as you also include it in the > checksum. pgBackRest ignores whitespace but this is a legacy of the way Perl calculated checksums, not an intentional feature. This worked well when the manifest was loaded as a whole, converted to JSON, and checksummed, but it is a major pain for the streaming code we now have in C. I guarantee that that our next manifest version will do a simple checksum of bytes as Robert has done in this feature. So, I'm +1 as implemented. >> Why is the top-level checksum only allowed to be SHA-256, if the files >> can use up to SHA-512? <snip> > I agree that it's a little bit weird that you can have a stronger > checksum for the files instead of the manifest itself, but I also > wonder what the use case would be for using a stronger checksum on the > manifest. David Steele argued that strong checksums on the files could > be useful to software that wants to rifle through all the backups > you've ever taken and find another copy of that file by looking for > something with a matching checksum. CRC-32C wouldn't be strong enough > for that, because eventually you could have enough files that you > start to have collisions. The SHA algorithms output enough bits to > make that quite unlikely. But this argument only makes sense for the > files, not the manifest. Agreed. I think SHA-256 is *more* than enough to protect the manifest against corruption. That said, since the cost of SHA-256 vs. SHA-512 in the context on the manifest is negligible we could just use the stronger algorithm to deflect a similar question going forward. That choice might not age well, but we could always say, well, we picked it because it was the strongest available at the time. Allowing a choice of which algorithm to use for to manifest checksum seems like it will just make verifying the file harder with no tangible benefit. Maybe just a comment in the docs about why SHA-256 was used would be fine. >> (Also, did we intentionally omit the dash in >> hash names, so "SHA-256" to make it SHA256? This will also be critical >> for checksumming the manifest itself.) > > I debated this with myself, settled on this spelling, and nobody > complained until now. It could be changed, though. I didn't have any > particular reason for choosing it except the feeling that people would > probably prefer to type --manifest-checksum=sha256 rather than > --manifest-checksum=sha-256. +1 for sha256 rather than sha-256. Regards, -- -David david@pgmasters.net
pgsql-hackers by date: