Thread: backup manifests

backup manifests

From
Robert Haas
Date:
In the lengthy thread on block-level incremental backup,[1] both
Vignesh C[2] and Stephen Frost[3] have suggested storing a manifest as
part of each backup, something that could be useful not only for
incremental backups but also for full backups. I initially didn't
think this was necessary,[4] but it turned out that my design was
broken: my proposal was to detect new blocks just using LSNs, and that
ignores the fact that CREATE DATABASE and ALTER TABLE .. SET
TABLESPACE do physical copies without bumping page LSNs, which I knew
but somehow forgot about.  Fortunately, some of my colleagues caught
the mistake in testing.[5] Because of this
problem, for an LSN-based approach to work, we'll need to send not
only an LSN, but also a list of files (and file sizes) that exist in
the previous full backup; so, some kind of backup manifest now seems
like a good idea to me.[6] That whole approach might still be dead on
arrival if it's possible to add new blocks with old LSNs to existing
files,[7] but there seems to be room to hope that there are no such
cases.[8]

So, let's suppose we invent a backup manifest. What should it contain?
I imagine that it would consist of a list of files, and the lengths of
those files, and a checksum for each file. I think you should have a
choice of what kind of checksums to use, because algorithms that used
to seem like good choices (e.g. MD5) no longer do; this trend can
probably be expected to continue. Even if we initially support only
one kind of checksum -- presumably SHA-something since we have code
for that already for SCRAM -- I think that it would also be a good
idea to allow for future changes. And maybe it's best to just allow a
choice of SHA-224, SHA-256, SHA-384, and SHA-512 right out of the
gate, so that we can avoid bikeshedding over which one is secure
enough. I guess we'll still have to argue about the default. I also
think that it should be possible to build a manifest with no
checksums, so that one need not pay the overhead of computing
checksums if one does not wish. Of course, such a manifest is of much
less utility for checking backup integrity, but you can still check
that you've got the right files, which is noticeably better than
nothing.  The manifest should probably also contain a checksum of its
own contents so that the integrity of the manifest itself can be
verified. And maybe a few other bits of metadata, but I'm not sure
exactly what.  Ideas?
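
Just to make the per-file checksum part concrete, here is a rough
sketch of how a checksum could be computed while streaming each file.
The use of OpenSSL's EVP interface here is purely illustrative (we'd
presumably use whatever SHA-2 code we already ship), and the output
format is made up:

#include <openssl/evp.h>
#include <stdio.h>

/* Sketch: stream one file through SHA-256 and emit a manifest-style line. */
static int
emit_manifest_line(const char *path)
{
    unsigned char buf[8192];
    unsigned char digest[EVP_MAX_MD_SIZE];
    unsigned int digestlen;
    long long   total = 0;
    size_t      nread;
    FILE       *fp = fopen(path, "rb");
    EVP_MD_CTX *ctx;

    if (fp == NULL)
        return -1;
    ctx = EVP_MD_CTX_new();
    EVP_DigestInit_ex(ctx, EVP_sha256(), NULL);
    while ((nread = fread(buf, 1, sizeof(buf), fp)) > 0)
    {
        EVP_DigestUpdate(ctx, buf, nread);
        total += nread;
    }
    EVP_DigestFinal_ex(ctx, digest, &digestlen);
    EVP_MD_CTX_free(ctx);
    fclose(fp);

    /* hypothetical line format: <path> <size> <hex checksum> */
    printf("%s %lld ", path, total);
    for (unsigned int i = 0; i < digestlen; i++)
        printf("%02x", digest[i]);
    putchar('\n');
    return 0;
}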

Once we invent the concept of a backup manifest, what do we need to do
with them? I think we'd want three things initially:

(1) When taking a backup, have the option (perhaps enabled by default)
to include a backup manifest.
(2) Given an existing backup that has not got a manifest, construct one.
(3) Cross-check a manifest against a backup and complain about extra
files, missing files, size differences, or checksum mismatches.

One thing I'm not quite sure about is where to store the backup
manifest. If you take a base backup in tar format, you get base.tar,
pg_wal.tar (unless -Xnone), and an additional tar file per tablespace.
Does the backup manifest go into base.tar? Get written into a separate
file outside of any tar archive? Something else? And what about a
plain-format backup? I suppose then we should just write the manifest
into the top level of the main data directory, but perhaps someone has
another idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1]
https://www.postgresql.org/message-id/flat/CA%2BTgmoYxQLL%3DmVyN90HZgH0X_EUrw%2BaZ0xsXJk7XV3-3LygTvA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CALDaNm310fUZ72nM2n%3DcD0eSHKRAoJPuCyvvR0dhTEZ9Oytyzg%40mail.gmail.com
[3] https://www.postgresql.org/message-id/20190916143817.GA6962%40tamriel.snowman.net
[4] https://www.postgresql.org/message-id/CA%2BTgmoaj-zw4Mou4YBcJSkHmQM%2BJA-dAVJnRP8zSASP1S4ZVgw%40mail.gmail.com
[5] https://www.postgresql.org/message-id/CAM2%2B6%3DXfJX%3DKXvpTgDvgd1rQjya_Am27j4UvJtL3nA%2BJMCTGVQ%40mail.gmail.com
[6] https://www.postgresql.org/message-id/CA%2BTgmoYg9i8TZhyjf8MqCyU8unUVuW%2B03FeBF1LGDu_-eOONag%40mail.gmail.com
[7] https://www.postgresql.org/message-id/CA%2BTgmoYT9xODgEB6y6j93hFHqobVcdiRCRCp0dHh%2BfFzZALn%3Dw%40mail.gmail.com
and nearby messages
[8] https://www.postgresql.org/message-id/20190916173933.GE6962%40tamriel.snowman.net



Re: backup manifests

From
David Steele
Date:
Hi Robert,

On 9/18/19 1:48 PM, Robert Haas wrote:
> That whole approach might still be dead on
> arrival if it's possible to add new blocks with old LSNs to existing
> files,[7] but there seems to be room to hope that there are no such
> cases.[8]

I sure hope there are no such cases, but we should be open to the idea
just in case.

> So, let's suppose we invent a backup manifest. What should it contain?
> I imagine that it would consist of a list of files, and the lengths of
> those files, and a checksum for each file. 

These are essential.

Also consider adding the timestamp.  You have justifiable concerns about
using timestamps for deltas and I get that.  However, there are a number
of methods that can be employed to make it *much* safer.  I won't go
into that here since it is an entire thread in itself.  Suffice to say
we can detect many anomalies in the timestamps and require a checksum
backup when we see them.  I'm really interested in scanning the WAL for
changed files but that method is very complex and getting it right might
be harder than ensuring FS checksums are reliable.  Still worth trying,
though, since the benefits are enormous.  We are planning to use
timestamp + size + wal data to do incrementals if we get there.

Consider adding a reference to each file that specifies where the file
can be found if it is not in this backup.  As I understand the
pg_basebackup proposal, it would only be implementing differential
backups, i.e. an incremental that is *only* based on the last full
backup.  So, the reference can be inferred in this case.  However, if
the user selects the wrong full backup on restore, and we have labeled
each backup, then a differential restore with references against the
wrong full backup would result in a hard error rather than corruption.

> I think you should have a
> choice of what kind of checksums to use, because algorithms that used
> to seem like good choices (e.g. MD5) no longer do; this trend can
> probably be expected to continue. Even if we initially support only
> one kind of checksum -- presumably SHA-something since we have code
> for that already for SCRAM -- I think that it would also be a good
> idea to allow for future changes. And maybe it's best to just allow a
> choice of SHA-224, SHA-256, SHA-384, and SHA-512 right out of the
> gate, so that we can avoid bikeshedding over which one is secure
> enough. I guess we'll still have to argue about the default. 

Based on my original calculations (which sadly I don't have anymore),
the combination of SHA1, size, and file name is *extremely* unlikely to
generate a collision.  As in, unlikely to happen before the end of the
universe kind of unlikely.  Though, I guess it depends on your
expectations for the lifetime of the universe.

These checksums don't have to be cryptographically secure, in the sense
of making it infeasible to recover the original content from the
checksum.  They just need to
have a suitably low collision rate.  These days I would choose something
with more bits because the computation time is similar, though the
larger size requires more storage.
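
For a rough sense of scale, the usual birthday-bound estimate for n
files and a b-bit digest (numbers here purely illustrative) is:

    P(collision) ~= n^2 / 2^(b+1)

so a million files with a 160-bit digest works out to roughly
10^12 / 2^161, i.e. on the order of 10^-37, which is vanishingly small
next to the rate of ordinary hardware errors.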

> I also
> think that it should be possible to build a manifest with no
> checksums, so that one need not pay the overhead of computing
> checksums if one does not wish. 

Our benchmarks have indicated that checksums only account for about 1%
of total cpu time when gzip -6 compression is used.  Without compression
the percentage may be higher of course, but in that case we find network
latency is the primary bottleneck.

For S3 backups we do a SHA1 hash for our manifest, a SHA256 hash for
authv4 and a good-old-fashioned MD5 checksum for each upload part.  This
is barely noticeable when compression is enabled.

> Of course, such a manifest is of much
> less utility for checking backup integrity, but you can still check
> that you've got the right files, which is noticeably better than
> nothing.  

Absolutely -- and yet.  There was a time when we made checksums optional
but eventually gave up on that once we profiled and realized how low the
cost was vs. the benefit.

> The manifest should probably also contain a checksum of its
> own contents so that the integrity of the manifest itself can be
> verified. 

This is a good idea.  Amazingly we've never seen a manifest checksum
error in the field but it's only a matter of time.

> And maybe a few other bits of metadata, but I'm not sure
> exactly what.  Ideas?

A backup label for sure.  You can also use this as the directory/tar
name to save the user coming up with one.  We use YYYYMMDDHH24MMSSF for
full backups and YYYYMMDDHH24MMSSF_YYYYMMDDHH24MMSS(D|I) for
incrementals and have logic to prevent two backups from having the same
label.  This is unlikely outside of testing but still a good idea.

Knowing the start/stop time of the backup is useful in all kinds of
ways, especially monitoring and time-targeted PITR.  Start/stop LSN is
also good.  I know this is also in backup_label but having it all in one
place is nice.

We include the version/sysid of the cluster to avoid mixups.  It's a
great extra check on top of references to be sure everything is kosher.

A manifest version is good in case we change the format later.  I'd
recommend JSON for the format since it is so ubiquitous and easily
handles escaping which can be gotchas in a home-grown format.  We
currently have a format that is a combination of Windows INI and JSON
(for human-readability in theory) and we have become painfully aware of
escaping issues.  Really, why would you drop files with '=' in their
name in PGDATA?  And yet it happens.

> Once we invent the concept of a backup manifest, what do we need to do
> with them? I think we'd want three things initially:
> 
> (1) When taking a backup, have the option (perhaps enabled by default)
> to include a backup manifest.

Manifests are cheap to build so I wouldn't make it an option.

> (2) Given an existing backup that has not got a manifest, construct one.

Might be too late to be trusted and we'd have to write extra code for
it.  I'd leave this for a project down the road, if at all.

> (3) Cross-check a manifest against a backup and complain about extra
> files, missing files, size differences, or checksum mismatches.

Verification is the best part of the manifest.  Plus, you can do
verification pretty cheaply on restore.  We also restore pg_control last
so clusters that have a restore error won't start.

> One thing I'm not quite sure about is where to store the backup
> manifest. If you take a base backup in tar format, you get base.tar,
> pg_wal.tar (unless -Xnone), and an additional tar file per tablespace.
> Does the backup manifest go into base.tar? Get written into a separate
> file outside of any tar archive? Something else? And what about a
> plain-format backup? I suppose then we should just write the manifest
> into the top level of the main data directory, but perhaps someone has
> another idea.

We do:

[backup_label]/
    backup.manifest
    pg_data/
    pg_tblspc/

In general, having the manifest easily accessible is ideal.

Regards,
-- 
-David
david@pgmasters.net



Re: backup manifests

From
Robert Haas
Date:
On Wed, Sep 18, 2019 at 9:11 PM David Steele <david@pgmasters.net> wrote:
> Also consider adding the timestamp.

Sounds reasonable, even if only for the benefit of humans who might
look at the file.  We can decide later whether to use it for anything
else (and third-party tools could make different decisions from core).
I assume we're talking about file mtime here, not file ctime or file
atime or the time the manifest was generated, but let me know if I'm
wrong.

> Consider adding a reference to each file that specifies where the file
> can be found in if it is not in this backup.  As I understand the
> pg_basebackup proposal, it would only be implementing differential
> backups, i.e. an incremental that is *only* based on the last full
> backup.  So, the reference can be inferred in this case.  However, if
> the user selects the wrong full backup on restore, and we have labeled
> each backup, then a differential restore with references against the
> wrong full backup would result in a hard error rather than corruption.

I intend that we should be able to support incremental backups based
either on a previous full backup or based on a previous incremental
backup. I am not aware of a technical reason why we need to identify
the specific backup that must be used. If incremental backup B is
taken based on a pre-existing backup A, then I think that B can be
restored using either A or *any other backup taken after A and before
B*. In the normal case, there probably wouldn't be any such backup,
but AFAICS the start-LSNs are a sufficient cross-check that the chosen
base backup is legal.

> Based on my original calculations (which sadly I don't have anymore),
> the combination of SHA1, size, and file name is *extremely* unlikely to
> generate a collision.  As in, unlikely to happen before the end of the
> universe kind of unlikely.  Though, I guess it depends on your
> expectations for the lifetime of the universe.

Somebody once said that we should be prepared for it to end at any
time, or not, and that the time at which it actually was due to end
would not be disclosed in advance. This is probably good life advice
which I ought to take more frequently than I do, but I think we can
finesse the issue for purposes of this discussion. What I'd say is: if
the probability of getting a collision is demonstrably many orders of
magnitude less than the probability of the disk writing the block
incorrectly, then I think we're probably reasonably OK. Somebody might
differ, which is perhaps a mild point in favor of LSN-based
approaches, but as a practical matter, if a bad block is a billion
times more likely to be the result of a disk error than a checksum
mismatch, then it's a negligible risk.

> > And maybe a few other bits of metadata, but I'm not sure
> > exactly what.  Ideas?
>
> A backup label for sure.  You can also use this as the directory/tar
> name to save the user coming up with one.  We use YYYYMMDDHH24MMSSF for
> full backups and YYYYMMDDHH24MMSSF_YYYYMMDDHH24MMSS(D|I) for
> incrementals and have logic to prevent two backups from having the same
> label.  This is unlikely outside of testing but still a good idea.
>
> Knowing the start/stop time of the backup is useful in all kinds of
> ways, especially monitoring and time-targeted PITR.  Start/stop LSN is
> also good.  I know this is also in backup_label but having it all in one
> place is nice.
>
> We include the version/sysid of the cluster to avoid mixups.  It's a
> great extra check on top of references to be sure everything is kosher.

I don't think it's a good idea to duplicate the information that's
already in the backup_label. Storing two copies of the same
information is just an invitation to having to worry about what
happens if they don't agree.

> A manifest version is good in case we change the format later.

Yeah.

> I'd
> recommend JSON for the format since it is so ubiquitous and easily
> handles escaping which can be gotchas in a home-grown format.  We
> currently have a format that is a combination of Windows INI and JSON
> (for human-readability in theory) and we have become painfully aware of
> escaping issues.  Really, why would you drop files with '=' in their
> name in PGDATA?  And yet it happens.

I am not crazy about JSON because it requires that I get a json parser
into src/common, which I could do, but given the possibly-imminent end
of the universe, I'm not sure it's the greatest use of time. You're
right that if we pick an ad-hoc format, we've got to worry about
escaping, which isn't lovely.

> > (1) When taking a backup, have the option (perhaps enabled by default)
> > to include a backup manifest.
>
> Manifests are cheap to build so I wouldn't make it an option.

Huh. That's an interesting idea. Thanks.

> > (3) Cross-check a manifest against a backup and complain about extra
> > files, missing files, size differences, or checksum mismatches.
>
> Verification is the best part of the manifest.  Plus, you can do
> verification pretty cheaply on restore.  We also restore pg_control last
> so clusters that have a restore error won't start.

There's no "restore" operation here, really. A backup taken by
pg_basebackup can be "restored" by copying the whole thing, but it can
also be used just where it is. If we were going to build something
into some in-core tool to copy backups around, this would be a smart
way to implement said tool, but I'm not planning on that myself.

> > One thing I'm not quite sure about is where to store the backup
> > manifest. If you take a base backup in tar format, you get base.tar,
> > pg_wal.tar (unless -Xnone), and an additional tar file per tablespace.
> > Does the backup manifest go into base.tar? Get written into a separate
> > file outside of any tar archive? Something else? And what about a
> > plain-format backup? I suppose then we should just write the manifest
> > into the top level of the main data directory, but perhaps someone has
> > another idea.
>
> We do:
>
> [backup_label]/
>     backup.manifest
>     pg_data/
>     pg_tblspc/
>
> In general, having the manifest easily accessible is ideal.

That's a fine choice for a tool, but I'm talking about something
that is part of the actual backup format supported by PostgreSQL, not
what a tool might wrap around it. The choice is whether, for a
tar-format backup, the manifest goes inside a tar file or as a
separate file. To put that another way, a patch adding backup
manifests does not get to redesign where pg_basebackup puts anything
else; it only gets to decide where to put the manifest.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Robert Haas
Date:
On Thu, Sep 19, 2019 at 9:51 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I intend that we should be able to support incremental backups based
> either on a previous full backup or based on a previous incremental
> backup. I am not aware of a technical reason why we need to identify
> the specific backup that must be used. If incremental backup B is
> taken based on a pre-existing backup A, then I think that B can be
> restored using either A or *any other backup taken after A and before
> B*. In the normal case, there probably wouldn't be any such backup,
> but AFAICS the start-LSNs are a sufficient cross-check that the chosen
> base backup is legal.

Scratch that: there can be overlapping backups, so you have to
cross-check both start and stop LSNs.

> > > (3) Cross-check a manifest against a backup and complain about extra
> > > files, missing files, size differences, or checksum mismatches.
> >
> > Verification is the best part of the manifest.  Plus, you can do
> > verification pretty cheaply on restore.  We also restore pg_control last
> > so clusters that have a restore error won't start.
>
> There's no "restore" operation here, really. A backup taken by
> pg_basebackup can be "restored" by copying the whole thing, but it can
> also be used just where it is. If we were going to build something
> into some in-core tool to copy backups around, this would be a smart
> way to implement said tool, but I'm not planning on that myself.

Scratch that: incremental backups need a restore tool, so we can use
this technique there. And it can work for full backups too, because
why not?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
David Steele
Date:
Hi Robert,

On 9/19/19 9:51 AM, Robert Haas wrote:
> On Wed, Sep 18, 2019 at 9:11 PM David Steele <david@pgmasters.net> wrote:
>> Also consider adding the timestamp.
> 
> Sounds reasonable, even if only for the benefit of humans who might
> look at the file.  We can decide later whether to use it for anything
> else (and third-party tools could make different decisions from core).
> I assume we're talking about file mtime here, not file ctime or file
> atime or the time the manifest was generated, but let me know if I'm
> wrong.

In my experience only mtime is useful.

>> Based on my original calculations (which sadly I don't have anymore),
>> the combination of SHA1, size, and file name is *extremely* unlikely to
>> generate a collision.  As in, unlikely to happen before the end of the
>> universe kind of unlikely.  Though, I guess it depends on your
>> expectations for the lifetime of the universe.

> What I'd say is: if
> the probability of getting a collision is demonstrably many orders of
> magnitude less than the probability of the disk writing the block
> incorrectly, then I think we're probably reasonably OK. Somebody might
> differ, which is perhaps a mild point in favor of LSN-based
> approaches, but as a practical matter, if a bad block is a billion
> times more likely to be the result of a disk error than a checksum
> mismatch, then it's a negligible risk.

Agreed.

>> We include the version/sysid of the cluster to avoid mixups.  It's a
>> great extra check on top of references to be sure everything is kosher.
> 
> I don't think it's a good idea to duplicate the information that's
> already in the backup_label. Storing two copies of the same
> information is just an invitation to having to worry about what
> happens if they don't agree.

OK, but now we have backup_label, tablespace_map, 
XXXXXXXXXXXXXXXXXXXXXXXX.XXXXXXXX.backup (in the WAL) and now perhaps a 
backup.manifest file.  I feel like we may be drowning in backup info files.

>> I'd
>> recommend JSON for the format since it is so ubiquitous and easily
>> handles escaping which can be gotchas in a home-grown format.  We
>> currently have a format that is a combination of Windows INI and JSON
>> (for human-readability in theory) and we have become painfully aware of
>> escaping issues.  Really, why would you drop files with '=' in their
>> name in PGDATA?  And yet it happens.
> 
> I am not crazy about JSON because it requires that I get a json parser
> into src/common, which I could do, but given the possibly-imminent end
> of the universe, I'm not sure it's the greatest use of time. You're
> right that if we pick an ad-hoc format, we've got to worry about
> escaping, which isn't lovely.

My experience is that JSON is simple to implement and has already dealt 
with escaping and data structure considerations.  A home-grown solution 
will be at least as complex but have the disadvantage of being non-standard.

>>> One thing I'm not quite sure about is where to store the backup
>>> manifest. If you take a base backup in tar format, you get base.tar,
>>> pg_wal.tar (unless -Xnone), and an additional tar file per tablespace.
>>> Does the backup manifest go into base.tar? Get written into a separate
>>> file outside of any tar archive? Something else? And what about a
>>> plain-format backup? I suppose then we should just write the manifest
>>> into the top level of the main data directory, but perhaps someone has
>>> another idea.
>>
>> We do:
>>
>> [backup_label]/
>>      backup.manifest
>>      pg_data/
>>      pg_tblspc/
>>
>> In general, having the manifest easily accessible is ideal.
> 
> That's a fine choice for a tool, but I'm talking about something
> that is part of the actual backup format supported by PostgreSQL, not
> what a tool might wrap around it. The choice is whether, for a
> tar-format backup, the manifest goes inside a tar file or as a
> separate file. To put that another way, a patch adding backup
> manifests does not get to redesign where pg_basebackup puts anything
> else; it only gets to decide where to put the manifest.

Fair enough.  The point is to make the manifest easily accessible.

I'd keep it in the data directory for file-based backups and as a 
separate file for tar-based backups.  The advantage here is that we can 
pick a file name that becomes reserved, which an external tool can't do.

Regards,
-- 
-David
david@pgmasters.net



Re: backup manifests

From
David Steele
Date:
On 9/19/19 11:00 AM, Robert Haas wrote:

> On Thu, Sep 19, 2019 at 9:51 AM Robert Haas <robertmhaas@gmail.com> wrote:
>> I intend that we should be able to support incremental backups based
>> either on a previous full backup or based on a previous incremental
>> backup. I am not aware of a technical reason why we need to identify
>> the specific backup that must be used. If incremental backup B is
>> taken based on a pre-existing backup A, then I think that B can be
>> restored using either A or *any other backup taken after A and before
>> B*. In the normal case, there probably wouldn't be any such backup,
>> but AFAICS the start-LSNs are a sufficient cross-check that the chosen
>> base backup is legal.
> 
> Scratch that: there can be overlapping backups, so you have to
> cross-check both start and stop LSNs.

Overall we have found it's much simpler to label each backup and 
cross-check that against the pg version and system id.  Start LSN is 
pretty unique, but backup labels work really well and are more widely 
understood.

>>>> (3) Cross-check a manifest against a backup and complain about extra
>>>> files, missing files, size differences, or checksum mismatches.
>>>
>>> Verification is the best part of the manifest.  Plus, you can do
>>> verification pretty cheaply on restore.  We also restore pg_control last
>>> so clusters that have a restore error won't start.
>>
>> There's no "restore" operation here, really. A backup taken by
>> pg_basebackup can be "restored" by copying the whole thing, but it can
>> also be used just where it is. If we were going to build something
>> into some in-core tool to copy backups around, this would be a smart
>> way to implement said tool, but I'm not planning on that myself.
> 
> Scratch that: incremental backups need a restore tool, so we can use
> this technique there. And it can work for full backups too, because
> why not?

Agreed, once we have a restore tool, use it for everything.

-- 
-David
david@pgmasters.net



Re: backup manifests

From
Michael Paquier
Date:
On Thu, Sep 19, 2019 at 11:10:46PM -0400, David Steele wrote:
> On 9/19/19 11:00 AM, Robert Haas wrote:
>> On Thu, Sep 19, 2019 at 9:51 AM Robert Haas <robertmhaas@gmail.com> wrote:
>> > I intend that we should be able to support incremental backups based
>> > either on a previous full backup or based on a previous incremental
>> > backup. I am not aware of a technical reason why we need to identify
>> > the specific backup that must be used. If incremental backup B is
>> > taken based on a pre-existing backup A, then I think that B can be
>> > restored using either A or *any other backup taken after A and before
>> > B*. In the normal case, there probably wouldn't be any such backup,
>> > but AFAICS the start-LSNs are a sufficient cross-check that the chosen
>> > base backup is legal.
>>
>> Scratch that: there can be overlapping backups, so you have to
>> cross-check both start and stop LSNs.
>
> Overall we have found it's much simpler to label each backup and cross-check
> that against the pg version and system id.  Start LSN is pretty unique, but
> backup labels work really well and are more widely understood.

Warning.  The start LSN could be the same for multiple backups when
taken from a standby.
--
Michael


Re: backup manifests

From
Robert Haas
Date:
On Thu, Sep 19, 2019 at 11:10 PM David Steele <david@pgmasters.net> wrote:
> Overall we have found it's much simpler to label each backup and
> cross-check that against the pg version and system id.  Start LSN is
> pretty unique, but backup labels work really well and are more widely
> understood.

I see your point, but part of my point is that uniqueness is not a
technical requirement. However, it may be a requirement for user
comprehension.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Robert Haas
Date:
On Thu, Sep 19, 2019 at 11:06 PM David Steele <david@pgmasters.net> wrote:
> > I am not crazy about JSON because it requires that I get a json parser
> > into src/common, which I could do, but given the possibly-imminent end
> > of the universe, I'm not sure it's the greatest use of time. You're
> > right that if we pick an ad-hoc format, we've got to worry about
> > escaping, which isn't lovely.
>
> My experience is that JSON is simple to implement and has already dealt
> with escaping and data structure considerations.  A home-grown solution
> will be at least as complex but have the disadvantage of being non-standard.

I think that's fair and just spent a little while investigating how
difficult it would be to disentangle the JSON parser from the backend.
It has dependencies on the following bits of backend-only
functionality:

- check_stack_depth(). No problem, I think.  Just skip it for frontend code.

- pg_mblen() / GetDatabaseEncoding(). Not sure what to do about this.
Some of our infrastructure for dealing with encoding is available in
the frontend and backend, but this part is backend-only.

- elog() / ereport(). Kind of a pain. We could just kill the program
if an error occurs, but that seems a bit ham-fisted. Refactoring the
code so that the error is returned rather than thrown might be the way
to go, but it's not simple, because you're not just passing a string.

    ereport(ERROR,
            (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
             errmsg("invalid input syntax for type %s", "json"),
             errdetail("Character with value 0x%02x must be escaped.",
                       (unsigned char) *s),
             report_json_context(lex)));

- appendStringInfo et. al. I don't think it would be that hard to move
this to src/common, but I'm also not sure it really solves the
problem, because StringInfo has a 1GB limit, and there's no rule at
all that a backup manifest has got to be less than 1GB.

https://www.pgcon.org/2013/schedule/events/595.en.html

This gets at another problem that I just started to think about. If
the file is just a series of lines, you can parse it one line at a
time and do something with that line, then move on. If it's a JSON
blob, you have to parse the whole file and get a potentially giant
data structure back, and then operate on that data structure. At
least, I think you do. There's probably some way to create a callback
structure that lets you presuppose that the toplevel data structure is
an array (or object) and get back each element of that array (or
key/value pair) as it's parsed, but that sounds pretty annoying to get
working. Or we could just decide that you have to have enough memory
to hold the parsed version of the entire manifest file in memory all
at once, and if you don't, maybe you should drop some tables or buy
more RAM. That still leaves you with bypassing the 1GB size limit on
StringInfo, maybe by having a "huge" option, or perhaps by
memory-mapping the file and then making the StringInfo point directly
into the mapped region. Perhaps I'm overthinking this and maybe you
have a simpler idea in mind about how it can be made to work, but I
find all this complexity pretty unappealing.
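
(To be concrete about what I mean by a callback structure, something
vaguely like the following, where all of the names are made up and
none of this exists today:

typedef struct ManifestJsonCallbacks
{
    void    (*object_start)(void *state);
    void    (*object_end)(void *state);
    void    (*field)(void *state, const char *key, const char *value);
} ManifestJsonCallbacks;

/*
 * A streaming parser would read the input a chunk at a time and fire
 * these callbacks as tokens are recognized, letting the caller handle
 * one manifest entry and discard it before the next one arrives rather
 * than materializing the whole document in memory.
 */
extern int parse_json_stream(FILE *in, const ManifestJsonCallbacks *cb,
                             void *state);

Workable, but it's a fair amount of extra machinery.)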

Here's a competing proposal: let's decide that lines consist of
tab-separated fields. If a field contains a \t, \r, or \n, put a " at
the beginning, a " at the end, and double any " that appears in the
middle. This is easy to generate and easy to parse. It lets us
completely ignore encoding considerations. Incremental parsing is
straightforward. Quoting will rarely be needed because there's very
little reason to create a file inside a PostgreSQL data directory that
contains a tab or a newline, but if you do it'll still work.  The lack
of quoting is nice for humans reading the manifest, and nice in terms
of keeping the manifest succinct; in contrast, note that using JSON
doubles every backslash.
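
To show how little code the quoting rule needs, here's a rough sketch
(no error handling, the output buffer is assumed to be large enough,
and I've also quoted fields containing a bare " so that parsing stays
unambiguous):

#include <string.h>

/*
 * Quote one manifest field: wrap it in double quotes if it contains
 * \t, \r, \n, or ", doubling any embedded quote characters.
 */
static void
quote_field(const char *in, char *out)
{
    if (strpbrk(in, "\t\r\n\"") == NULL)
    {
        strcpy(out, in);        /* common case: emit the field verbatim */
        return;
    }
    *out++ = '"';
    for (; *in; in++)
    {
        if (*in == '"')
            *out++ = '"';       /* double any embedded quote */
        *out++ = *in;
    }
    *out++ = '"';
    *out = '\0';
}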

I hear you saying that this is going to end up being just as complex
in the end, but I don't think I believe it.  It sounds to me like the
difference between spending a couple of hours figuring this out and
spending a couple of months trying to figure it out and maybe not
actually getting anywhere.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Robert Haas
Date:
On Thu, Sep 19, 2019 at 11:06 PM David Steele <david@pgmasters.net> wrote:
> > I don't think it's a good idea to duplicate the information that's
> > already in the backup_label. Storing two copies of the same
> > information is just an invitation to having to worry about what
> > happens if they don't agree.
>
> OK, but now we have backup_label, tablespace_map,
> XXXXXXXXXXXXXXXXXXXXXXXX.XXXXXXXX.backup (in the WAL) and now perhaps a
> backup.manifest file.  I feel like we may be drowning in backup info files.

I agree!

I'm not sure what to do about it, though.  The information that is
present in the tablespace_map file could have been stored in the
backup_label file, I think, and that would have made sense, because
both files are serving a very similar purpose: they tell the server
that it needs to do some non-standard stuff when it starts up, and
they give it instructions for what those things are. And, as a
secondary purpose, humans or third-party tools can read them and use
that information for whatever purpose they wish.

The proposed backup_manifest file is a little different. I don't think
that anyone is proposing that the server should read that file: it is
there solely for the purpose of helping our own tools or third-party
tools or human beings who are, uh, acting like tools.[1] We're also
proposing to put it in a different place: the backup_label goes into
one of the tar files, but the backup_manifest would sit outside of any
tar file.

If we were designing this from scratch, maybe we'd roll all of this
into one file that serves as backup manifest, tablespace map, backup
label, and backup history file, but then again, maybe separating the
instructions-to-the-server part from the backup-integrity-checking
part makes sense.  At any rate, even if we knew for sure that's the
direction we wanted to go, getting there from here looks a bit rough.
If we just add a backup manifest, people who don't care can mostly
ignore it and then should be mostly fine. If we start trying to create
the one backup information system to rule them all, we're going to
break people's tools. Maybe that's worth doing someday, but the paint
isn't even dry on removing recovery.conf yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] There are a surprising number of installations where, in effect,
the DBA is the backup-and-restore tool, performing all the steps by
hand and hoping not to mess any of them up. The fact that nearly every
PostgreSQL company offers tools to make this easier does not seem to
have done a whole lot to diminish the number of people using ad-hoc
solutions.



Re: backup manifests

From
Robert Haas
Date:
On Fri, Sep 20, 2019 at 9:46 AM Robert Haas <robertmhaas@gmail.com> wrote:
> - appendStringInfo et. al. I don't think it would be that hard to move
> this to src/common, but I'm also not sure it really solves the
> problem, because StringInfo has a 1GB limit, and there's no rule at
> all that a backup manifest has got to be less than 1GB.

Hmm.  That's actually going to be a problem on the server side, no
matter what we do on the client side.  We have to send the manifest
after we send everything else, so that we know what we sent. But if we
sent a lot of files, the manifest might be really huge. I had been
thinking that we would generate the manifest on the server and send it
to the client after everything else, but maybe this is an argument for
generating the manifest on the client side and writing it
incrementally. That would require the client to peek at the contents
of every tar file it receives all the time, which it currently doesn't
need to do, but it does peek inside them a little bit, so maybe it's
OK.

Another alternative would be to have the server spill the manifest in
progress to a temp file and then stream it from there to the client.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
David Steele
Date:
On 9/20/19 9:46 AM, Robert Haas wrote:
> On Thu, Sep 19, 2019 at 11:06 PM David Steele <david@pgmasters.net> wrote:
>
>> My experience is that JSON is simple to implement and has already dealt
>> with escaping and data structure considerations.  A home-grown solution
>> will be at least as complex but have the disadvantage of being non-standard.
>
> I think that's fair and just spent a little while investigating how
> difficult it would be to disentangle the JSON parser from the backend.
> It has dependencies on the following bits of backend-only
> functionality:

> - elog() / ereport(). Kind of a pain. We could just kill the program
> if an error occurs, but that seems a bit ham-fisted. Refactoring the
> code so that the error is returned rather than thrown might be the way
> to go, but it's not simple, because you're not just passing a string.

Seems to me we are overdue for elog()/ereport() compatible
error-handling in the front end.  Plus mem contexts.

It sucks to make that a prereq for this project but the longer we kick
that can down the road...

> https://www.pgcon.org/2013/schedule/events/595.en.html

This talk was good fun.  The largest number of tables we've seen is a
few hundred thousand, but that still adds up to more than a million
files to backup.

> This gets at another problem that I just started to think about. If
> the file is just a series of lines, you can parse it one line at a
> time and do something with that line, then move on. If it's a JSON
> blob, you have to parse the whole file and get a potentially giant
> data structure back, and then operate on that data structure. At
> least, I think you do. 

JSON can definitely be parsed incrementally, but for practical reasons
certain structures work better than others.

> There's probably some way to create a callback
> structure that lets you presuppose that the toplevel data structure is
> an array (or object) and get back each element of that array (or
> key/value pair) as it's parsed, but that sounds pretty annoying to get
> working. 

And that's how we do it.  It's annoying and yeah it's complicated but it
is very fast and memory-efficient.

> Or we could just decide that you have to have enough memory
> to hold the parsed version of the entire manifest file in memory all
> at once, and if you don't, maybe you should drop some tables or buy
> more RAM. 

I assume you meant "un-parsed" here?

> That still leaves you with bypassing the 1GB size limit on
> StringInfo, maybe by having a "huge" option, or perhaps by
> memory-mapping the file and then making the StringInfo point directly
> into the mapped region. Perhaps I'm overthinking this and maybe you
> have a simpler idea in mind about how it can be made to work, but I
> find all this complexity pretty unappealing.

Our String object has the same 1GB limit.  Partly because it works and
saves a bit of memory per object, but also because if we find ourselves
exceeding that limit we know we've probably made a design error.

Parsing in stream means that you only need to store the final in-memory
representation of the manifest which can be much more compact.  Yeah,
it's complicated, but the memory and time savings are worth it.

Note that our Perl implementation took the naive approach and has worked
pretty well for six years, but can choke on really large manifests with
out of memory errors.  Overall, I'd say getting the format right is more
important than having the perfect initial implementation.

> Here's a competing proposal: let's decide that lines consist of
> tab-separated fields. If a field contains a \t, \r, or \n, put a " at
> the beginning, a " at the end, and double any " that appears in the
> middle. This is easy to generate and easy to parse. It lets us
> completely ignore encoding considerations. Incremental parsing is
> straightforward. Quoting will rarely be needed because there's very
> little reason to create a file inside a PostgreSQL data directory that
> contains a tab or a newline, but if you do it'll still work.  The lack
> of quoting is nice for humans reading the manifest, and nice in terms
> of keeping the manifest succinct; in contrast, note that using JSON
> doubles every backslash.

There's other information you'll want to store that is not strictly file
info so you need a way to denote that.  It gets complicated quickly.

> I hear you saying that this is going to end up being just as complex
> in the end, but I don't think I believe it.  It sounds to me like the
> difference between spending a couple of hours figuring this out and
> spending a couple of months trying to figure it out and maybe not
> actually getting anywhere.

Maybe the initial implementation will be easier but I am confident we'll
pay for it down the road.  Also, don't we want users to be able to read
this file?  Do we really want them to need to cook up a custom parser in
Perl, Go, Python, etc.?

-- 
-David
david@pgmasters.net



Re: backup manifests

From
David Steele
Date:
On 9/20/19 10:59 AM, Robert Haas wrote:
> On Fri, Sep 20, 2019 at 9:46 AM Robert Haas <robertmhaas@gmail.com> wrote:
>> - appendStringInfo et. al. I don't think it would be that hard to move
>> this to src/common, but I'm also not sure it really solves the
>> problem, because StringInfo has a 1GB limit, and there's no rule at
>> all that a backup manifest has got to be less than 1GB.
> 
> Hmm.  That's actually going to be a problem on the server side, no
> matter what we do on the client side.  We have to send the manifest
> after we send everything else, so that we know what we sent. But if we
> sent a lot of files, the manifest might be really huge. I had been
> thinking that we would generate the manifest on the server and send it
> to the client after everything else, but maybe this is an argument for
> generating the manifest on the client side and writing it
> incrementally. That would require the client to peek at the contents
> of every tar file it receives all the time, which it currently doesn't
> need to do, but it does peek inside them a little bit, so maybe it's
> OK.
> 
> Another alternative would be to have the server spill the manifest in
> progress to a temp file and then stream it from there to the client.

This seems reasonable to me.

We keep an in-memory representation which is just an array of structs
and is fairly compact -- 1 million files uses ~150MB of memory.  We just
format and stream this to storage when saving.  Saving is easier than
loading, of course.
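
For illustration only (this is not our actual layout), the per-file
struct is something along these lines:

#include <stdint.h>
#include <time.h>

/* Roughly 100-150 bytes per file once the name allocation is counted. */
typedef struct ManifestEntry
{
    char       *name;           /* relative path, allocated separately */
    uint64_t    size;           /* file size in bytes */
    time_t      mtime;          /* last modification time */
    unsigned char checksum[32]; /* e.g. a SHA-256 digest */
    const char *reference;      /* label of the backup holding the file, or NULL */
} ManifestEntry;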

-- 
-David
david@pgmasters.net



Re: backup manifests

From
Chapman Flack
Date:
On 9/20/19 9:46 AM, Robert Haas wrote:

> least, I think you do. There's probably some way to create a callback
> structure that lets you presuppose that the toplevel data structure is
> an array (or object) and get back each element of that array (or
> key/value pair) as it's parsed,

If a JSON parser does find its way into src/common, it probably wants
to have such an incremental mode available, similar to [2] offered
in the "Jackson" library for Java.

The Jackson developer has propounded a thesis[1] that such a parsing
library ought to offer "Three -- and Only Three" different styles of
API corresponding to three ways of organizing the code using the
library ([2], [3], [4], which also resemble the different APIs
supplied in Java for XML processing).

Regards,
-Chap


[1] http://www.cowtowncoder.com/blog/archives/2009/01/entry_132.html
[2] http://www.cowtowncoder.com/blog/archives/2009/01/entry_137.html
[3] http://www.cowtowncoder.com/blog/archives/2009/01/entry_153.html
[4] http://www.cowtowncoder.com/blog/archives/2009/01/entry_152.html



Re: backup manifests

From
Robert Haas
Date:
On Fri, Sep 20, 2019 at 11:09 AM David Steele <david@pgmasters.net> wrote:
> Seems to me we are overdue for elog()/ereport() compatible
> error-handling in the front end.  Plus mem contexts.
>
> It sucks to make that a prereq for this project but the longer we kick
> that can down the road...

There are no doubt many patches that would benefit from having more
backend infrastructure exposed in frontend contexts, and I think we're
slowly moving in that direction, but I generally do not believe in
burdening feature patches with major infrastructure improvements.
Sometimes it's necessary, as in the case of parallel query, which
required upgrading a whole lot of backend infrastructure in order to
have any chance of doing something useful. In most cases, however,
there's a way of getting the patch done that dodges the problem.

For example, I think there's a pretty good argument that Heikki's
design for relation forks was a bad one. It's proven to scale poorly
and create performance problems and extra complexity in quite a few
places. It would likely have been better, from a strictly theoretical
point of view, to insist on a design where the FSM and VM pages got
stored inside the relation itself, and the heap was responsible for
figuring out how various pages were being used. When BRIN came along,
we insisted on precisely that design, because it was clear that
further straining the relation fork system was not a good plan.
However, if we'd insisted on that when Heikki did the original work,
it might have delayed the arrival of the free space map for one or
more releases, and we got big benefits out of having that done sooner.
There's nothing stopping someone from writing a patch to get rid of
relation forks and allow a heap AM to have multiple relfilenodes (with
the extra ones used for the FSM and VM) or with multiplexing all the
data inside of a single file. Nobody has, though, because it's hard,
and the problems with the status quo are not so bad as to justify the
amount of development effort that would be required to fix it. At some
point, that problem is probably going to work its way to the top of
somebody's priority list, but it's already been about 10 years since
that all happened and everyone has so far dodged dealing with the
problem, which in turn has enabled them to work on other things that
are perhaps more important.

I think the same principle applies here. It's reasonable to ask the
author of a feature patch to fix issues that are closely related to
the feature in question, or even problems that are not new but would
be greatly exacerbated by the addition of the feature. It's not
reasonable to stack up a list of infrastructure upgrades that somebody
has to do as a condition of having a feature patch accepted that does
not necessarily require those upgrades. I am not convinced that JSON
is actually a better format for a backup manifest (more on that
below), but even if I were, I believe that getting a backup manifest
functionality into PostgreSQL 13, and perhaps incremental backup on
top of that, is valuable enough to justify making some compromises to
make that happen. And I don't mean "compromises" as in "let's commit
something that doesn't work very well;" rather, I mean making design
choices that are aimed at making the project something that is
feasible and can be completed in reasonable time, rather than not.

And saying, well, the backup manifest format *has* to be JSON because
everything else suxxor is not that. We don't have a single other
example of a file that we read and write in JSON format. Extension
control files use a custom format. Backup labels and backup history
files and timeline history files and tablespace map files use custom
formats. postgresql.conf, pg_hba.conf, and pg_ident.conf use custom
formats. postmaster.opts and postmaster.pid use custom formats. If
JSON is better and easier, at least one of the various people who
coded those things up would have chosen to use it, but none of them
did, and nobody's made a serious attempt to convert them to use it.
That might be because we lack the infrastructure for dealing with JSON
and building it is more work than anybody's willing to do, or it might
be because JSON is not actually better for these kinds of use cases,
but either way, it's hard to see why this particular patch should be
burdened with a requirement that none of the previous ones had to
satisfy.

Personally, I'd be intensely unhappy if a motion to convert
postgresql.conf or pg_hba.conf to JSON format gathered enough steam to
be adopted.  It would be darn useful, because you could specify
complex values for options instead of being limited to scalars, but it
would also make the configuration files a lot harder for human beings
to read and grep and the quality of error reporting would probably
decline significantly.  Also, appending a setting to the file,
something which is currently quite simple, would get a lot harder.
Ad-hoc file formats can be problematic, but they can also have real
advantages in terms of readability, brevity, and fitness for purpose.

> This talk was good fun.  The largest number of tables we've seen is a
> few hundred thousand, but that still adds up to more than a million
> files to backup.

A quick survey of some of my colleagues turned up a few examples of
people with 2-4 million files to backup, so similar kind of ballpark.
Probably not big enough for the manifest to hit the 1GB mark, but
getting close.

> > Or we could just decide that you have to have enough memory
> > to hold the parsed version of the entire manifest file in memory all
> > at once, and if you don't, maybe you should drop some tables or buy
> > more RAM.
>
> I assume you meant "un-parsed" here?

I don't think I meant that, although it seems like you might need to
store either all the parsed data or all the unparsed data or even
both, depending on exactly what you are trying to do.

> > I hear you saying that this is going to end up being just as complex
> > in the end, but I don't think I believe it.  It sounds to me like the
> > difference between spending a couple of hours figuring this out and
> > spending a couple of months trying to figure it out and maybe not
> > actually getting anywhere.
>
> Maybe the initial implementation will be easier but I am confident we'll
> pay for it down the road.  Also, don't we want users to be able to read
> this file?  Do we really want them to need to cook up a custom parser in
> Perl, Go, Python, etc.?

Well, I haven't heard anybody complain that they can't read a
backup_label file because it's too hard to cook up a parser.  And I
think the reason is pretty clear: such files are not hard to parse.
Similarly for a pg_hba.conf file.  This case is a little more
complicated than those, but AFAICS, not enormously so. Actually, it
seems like a combination of those two cases: it has some fixed
metadata fields that can be represented with one line per field, like
a backup_label, and then a bunch of entries for files that are
somewhat like entries in a pg_hba.conf file, in that they can be
represented by a line per record with a certain number of fields on
each line.

I attach here a couple of patches.  The first one does some
refactoring of relevant code in pg_basebackup, and the second one adds
checksum manifests using a format that I pulled out of my ear. It
probably needs some adjustment but I don't think it's crazy.  Each
file gets a line that looks like this:

File $FILENAME $FILESIZE $FILEMTIME $FILECHECKSUM

Right now, the file checksums are computed using SHA-256 but it could
be changed to anything else for which we've got code. On my system,
shasum -a256 $FILE produces the same answer that shows up here.  At
the bottom of the manifest there's a checksum of the manifest itself,
which looks like this:

Manifest-Checksum
385fe156a8c6306db40937d59f46027cc079350ecf5221027d71367675c5f781

That's a SHA-256 checksum of the file contents excluding the final
line. It can be verified by feeding all the file contents except the
last line to shasum -a256. I can't help but observe that if the file
were defined to be a JSONB blob, it's not very clear how you would
include a checksum of the blob contents in the blob itself, but with a
format based on a bunch of lines of data, it's super-easy to generate
and super-easy to write tools that verify it.
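
For instance, a verifier could be little more than this sketch, where
the sha256_* helpers stand in for whatever checksum code we end up
using (types and includes elided):

/*
 * Hash every line of the manifest up to, but not including, the final
 * "Manifest-Checksum" line, then compare against the digest it carries.
 */
static bool
verify_manifest_checksum(FILE *fp)
{
    char        line[4096];
    char        expected[65] = "";
    sha256_ctx  ctx;

    sha256_init(&ctx);
    while (fgets(line, sizeof(line), fp) != NULL)
    {
        if (strncmp(line, "Manifest-Checksum", 17) == 0)
        {
            sscanf(line + 17, "%64s", expected);
            break;              /* the final line is excluded from the hash */
        }
        sha256_update(&ctx, line, strlen(line));
    }
    return strcmp(sha256_final_hex(&ctx), expected) == 0;
}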

This is just a prototype so I haven't written a verification tool, and
there's a bunch of testing and documentation and so forth that would
need to be done aside from whatever we've got to hammer out in terms
of design issues and file formats.  But I think it's cool, and perhaps
some discussion of how it could be evolved will get us closer to a
resolution everybody can at least live with.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: backup manifests

From
David Steele
Date:
On 9/20/19 2:55 PM, Robert Haas wrote:
> On Fri, Sep 20, 2019 at 11:09 AM David Steele <david@pgmasters.net> wrote:
>>
>> It sucks to make that a prereq for this project but the longer we kick
>> that can down the road...
> 
> There are no doubt many patches that would benefit from having more
> backend infrastructure exposed in frontend contexts, and I think we're
> slowly moving in that direction, but I generally do not believe in
> burdening feature patches with major infrastructure improvements.

The hardest part about technical debt is knowing when to incur it.  It
is never a cut-and-dried choice.

>> This talk was good fun.  The largest number of tables we've seen is a
>> few hundred thousand, but that still adds up to more than a million
>> files to backup.
> 
> A quick survey of some of my colleagues turned up a few examples of
> people with 2-4 million files to backup, so similar kind of ballpark.
> Probably not big enough for the manifest to hit the 1GB mark, but
> getting close.

I have so many doubts about clusters with this many tables, but we do
support it, so...

>>> I hear you saying that this is going to end up being just as complex
>>> in the end, but I don't think I believe it.  It sounds to me like the
>>> difference between spending a couple of hours figuring this out and
>>> spending a couple of months trying to figure it out and maybe not
>>> actually getting anywhere.
>>
>> Maybe the initial implementation will be easier but I am confident we'll
>> pay for it down the road.  Also, don't we want users to be able to read
>> this file?  Do we really want them to need to cook up a custom parser in
>> Perl, Go, Python, etc.?
> 
> Well, I haven't heard anybody complain that they can't read a
> backup_label file because it's too hard to cook up a parser.  And I
> think the reason is pretty clear: such files are not hard to parse.
> Similarly for a pg_hba.conf file.  This case is a little more
> complicated than those, but AFAICS, not enormously so. Actually, it
> seems like a combination of those two cases: it has some fixed
> metadata fields that can be represented with one line per field, like
> a backup_label, and then a bunch of entries for files that are
> somewhat like entries in a pg_hba.conf file, in that they can be
> represented by a line per record with a certain number of fields on
> each line.

Yeah, they are not hard to parse, but *everyone* has to cook up code for
it.  A bit of a bummer, that.

> I attach here a couple of patches.  The first one does some
> refactoring of relevant code in pg_basebackup, and the second one adds
> checksum manifests using a format that I pulled out of my ear. It
> probably needs some adjustment but I don't think it's crazy.  Each
> file gets a line that looks like this:
> 
> File $FILENAME $FILESIZE $FILEMTIME $FILECHECKSUM

We also include page checksum validation failures in the file record.
Not critical for the first pass, perhaps, but something to keep in mind.

> Right now, the file checksums are computed using SHA-256 but it could
> be changed to anything else for which we've got code. On my system,
> shasum -a256 $FILE produces the same answer that shows up here.  At
> the bottom of the manifest there's a checksum of the manifest itself,
> which looks like this:
> 
> Manifest-Checksum
> 385fe156a8c6306db40937d59f46027cc079350ecf5221027d71367675c5f781
> 
> That's a SHA-256 checksum of the file contents excluding the final
> line. It can be verified by feeding all the file contents except the
> last line to shasum -a256. I can't help but observe that if the file
> were defined to be a JSONB blob, it's not very clear how you would
> include a checksum of the blob contents in the blob itself, but with a
> format based on a bunch of lines of data, it's super-easy to generate
> and super-easy to write tools that verify it.

You can do this in JSON pretty easily by handling the terminating
brace/bracket:

{
<some json contents>*,
"checksum":<sha256>
}

But of course a linefeed-delimited file is even easier.
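
(As a rough illustration, here is a sketch in Python -- one of the languages mentioned above -- of parsing the proposed line format and verifying the trailing Manifest-Checksum. The function name and the exact field handling are hypothetical, since the format is still being discussed.)

import hashlib

def verify_manifest_checksum(path="backup_manifest"):
    """Sketch only: parse the line format proposed in this thread,
        File <name> <size> <mtime> <sha256>
        ...
        Manifest-Checksum <sha256 of all preceding lines>
    and verify the trailing checksum.  Not the actual server format."""
    with open(path, "rb") as f:
        lines = f.read().splitlines(keepends=True)

    # The final line carries a checksum of everything before it.
    label, claimed = lines[-1].decode().split()
    if label != "Manifest-Checksum":
        raise ValueError("missing Manifest-Checksum line")
    actual = hashlib.sha256(b"".join(lines[:-1])).hexdigest()
    if actual != claimed:
        raise ValueError("manifest checksum mismatch")

    files = {}
    for line in lines[:-1]:
        parts = line.decode().split()
        if parts and parts[0] == "File":
            # The proposed mtime ("YYYY-MM-DD HH:MM:SS GMT") contains
            # spaces, so take the checksum from the end of the line.
            files[parts[1]] = (int(parts[2]), parts[-1])
    return files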

> This is just a prototype so I haven't written a verification tool, and
> there's a bunch of testing and documentation and so forth that would
> need to be done aside from whatever we've got to hammer out in terms
> of design issues and file formats.  But I think it's cool, and perhaps
> some discussion of how it could be evolved will get us closer to a
> resolution everybody can at least live with.

I had a quick look and it seems pretty reasonable.  I'll need to
generate a manifest to see if I can spot any obvious gotchas.

-- 
-David
david@pgmasters.net



Re: backup manifests

From
vignesh C
Date:
On Sat, Sep 21, 2019 at 12:25 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
Some comments:

The manifest file will be in plain text format even if compression is
specified -- should we compress it too?
Maybe this is intended; I just raised the point to make sure it is.
+static void
+ReceiveBackupManifestChunk(size_t r, char *copybuf, void *callback_data)
+{
+ WriteManifestState *state = callback_data;
+
+ if (fwrite(copybuf, r, 1, state->file) != 1)
+ {
+ pg_log_error("could not write to file \"%s\": %m", state->filename);
+ exit(1);
+ }
+}

The WALfile.done file gets added, but WAL file information is not included
in the manifest file; should we include the WAL files as well?
@@ -599,16 +618,20 @@ perform_base_backup(basebackup_options *opt)
  (errcode_for_file_access(),
  errmsg("could not stat file \"%s\": %m", pathbuf)));

- sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
+ sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid, manifest,
+ NULL);

  /* unconditionally mark file as archived */
  StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(pathbuf, "");
+ sendFileWithContent(pathbuf, "", manifest);

Should we add an option to make manifest generation configurable to
reduce overhead during backup?

Manifest file does not include directory information, should we include it?

There is one warning:
In file included from ../../../src/include/fe_utils/string_utils.h:20:0,
                 from pg_basebackup.c:34:
pg_basebackup.c: In function ‘ReceiveTarFile’:
../../../src/interfaces/libpq/pqexpbuffer.h:60:9: warning: the
comparison will always evaluate as ‘false’ for the address of ‘buf’
will never be NULL [-Waddress]
  ((str) == NULL || (str)->maxlen == 0)
         ^
pg_basebackup.c:1203:7: note: in expansion of macro ‘PQExpBufferBroken’
   if (PQExpBufferBroken(&buf))

pg_gmtime can fail in case of malloc failure:
+ /*
+ * Convert time to a string. Since it's not clear what time zone to use
+ * and since time zone definitions can change, possibly causing confusion,
+ * use GMT always.
+ */
+ pg_strftime(timebuf, sizeof(timebuf), "%Y-%m-%d %H:%M:%S %Z",
+ pg_gmtime(&mtime));

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: backup manifests

From
Jeevan Chalke
Date:

An entry for a directory is not added to the manifest, so it might be
difficult for the client to know about the directories. Would it be good
to add an entry for each directory too? Maybe something like:
Dir    <dirname> <mtime>

Also, the patches do not apply on the latest HEAD.

On Wed, Sep 25, 2019 at 6:17 PM vignesh C <vignesh21@gmail.com> wrote:
On Sat, Sep 21, 2019 at 12:25 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
Some comments:

Manifest file will be in plain text format even if compression is
specified, should we compress it?
May be this is intended, just raised the point to make sure that it is intended.
+static void
+ReceiveBackupManifestChunk(size_t r, char *copybuf, void *callback_data)
+{
+ WriteManifestState *state = callback_data;
+
+ if (fwrite(copybuf, r, 1, state->file) != 1)
+ {
+ pg_log_error("could not write to file \"%s\": %m", state->filename);
+ exit(1);
+ }
+}

WALfile.done file gets added but wal file information is not included
in the manifest file, should we include WAL file also?
@@ -599,16 +618,20 @@ perform_base_backup(basebackup_options *opt)
  (errcode_for_file_access(),
  errmsg("could not stat file \"%s\": %m", pathbuf)));

- sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
+ sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid, manifest,
+ NULL);

  /* unconditionally mark file as archived */
  StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(pathbuf, "");
+ sendFileWithContent(pathbuf, "", manifest);

Should we add an option to make manifest generation configurable to
reduce overhead during backup?

Manifest file does not include directory information, should we include it?

There is one warning:
In file included from ../../../src/include/fe_utils/string_utils.h:20:0,
                 from pg_basebackup.c:34:
pg_basebackup.c: In function ‘ReceiveTarFile’:
../../../src/interfaces/libpq/pqexpbuffer.h:60:9: warning: the
comparison will always evaluate as ‘false’ for the address of ‘buf’
will never be NULL [-Waddress]
  ((str) == NULL || (str)->maxlen == 0)
         ^
pg_basebackup.c:1203:7: note: in expansion of macro ‘PQExpBufferBroken’
   if (PQExpBufferBroken(&buf))


Yes, I too observed this warning.
 
pg_gmtime can fail in case of malloc failure:
+ /*
+ * Convert time to a string. Since it's not clear what time zone to use
+ * and since time zone definitions can change, possibly causing confusion,
+ * use GMT always.
+ */
+ pg_strftime(timebuf, sizeof(timebuf), "%Y-%m-%d %H:%M:%S %Z",
+ pg_gmtime(&mtime));

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Re: backup manifests

From
Rushabh Lathia
Date:


On Wed, Sep 25, 2019 at 6:17 PM vignesh C <vignesh21@gmail.com> wrote:
On Sat, Sep 21, 2019 at 12:25 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
Some comments:

Manifest file will be in plain text format even if compression is
specified, should we compress it?
May be this is intended, just raised the point to make sure that it is intended.
+static void
+ReceiveBackupManifestChunk(size_t r, char *copybuf, void *callback_data)
+{
+ WriteManifestState *state = callback_data;
+
+ if (fwrite(copybuf, r, 1, state->file) != 1)
+ {
+ pg_log_error("could not write to file \"%s\": %m", state->filename);
+ exit(1);
+ }
+}

WALfile.done file gets added but wal file information is not included
in the manifest file, should we include WAL file also?
@@ -599,16 +618,20 @@ perform_base_backup(basebackup_options *opt)
  (errcode_for_file_access(),
  errmsg("could not stat file \"%s\": %m", pathbuf)));

- sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
+ sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid, manifest,
+ NULL);

  /* unconditionally mark file as archived */
  StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(pathbuf, "");
+ sendFileWithContent(pathbuf, "", manifest);

Should we add an option to make manifest generation configurable to
reduce overhead during backup?

Manifest file does not include directory information, should we include it?

There is one warning:
In file included from ../../../src/include/fe_utils/string_utils.h:20:0,
                 from pg_basebackup.c:34:
pg_basebackup.c: In function ‘ReceiveTarFile’:
../../../src/interfaces/libpq/pqexpbuffer.h:60:9: warning: the
comparison will always evaluate as ‘false’ for the address of ‘buf’
will never be NULL [-Waddress]
  ((str) == NULL || (str)->maxlen == 0)
         ^
pg_basebackup.c:1203:7: note: in expansion of macro ‘PQExpBufferBroken’
   if (PQExpBufferBroken(&buf))


I also observed this warning.  PFA a patch to fix the same.

pg_gmtime can fail in case of malloc failure:
+ /*
+ * Convert time to a string. Since it's not clear what time zone to use
+ * and since time zone definitions can change, possibly causing confusion,
+ * use GMT always.
+ */
+ pg_strftime(timebuf, sizeof(timebuf), "%Y-%m-%d %H:%M:%S %Z",
+ pg_gmtime(&mtime));


Fixed that in the attached patch.




Regards.
Rushabh Lathia
Attachment

Re: backup manifests

From
Robert Haas
Date:
On Mon, Sep 30, 2019 at 5:31 AM Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
> Entry for directory is not added in manifest. So it might be difficult
> at client to get to know about the directories. Will it be good to add
> an entry for each directory too? May be like:
> Dir    <dirname> <mtime>

Well, what kind of corruption would this allow us to detect that we
can't detect as things stand? I think the only case is an empty
directory. If it's not empty, we'd have some entries for the files in
that directory, and those files won't be able to exist unless the
directory does. But, how would we end up backing up an empty
directory, anyway?

I don't really *mind* adding directories into the manifest, but I'm
not sure how much it helps.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Rushabh Lathia
Date:


My colleague Suraj did some testing and noticed the performance impact of
the checksums.  On further testing, he found that the impact is especially
pronounced with SHA.

Please find below statistics:

no of tables: 10 (100 MB in each table)
  without checksum:  real 0m10.957s   user 0m0.367s   sys 0m2.275s
  SHA-256 checksum:  real 0m16.816s   user 0m0.210s   sys 0m2.067s   (53% overhead)
  MD5 checksum:      real 0m11.895s   user 0m0.174s   sys 0m1.725s   (8% overhead)
  CRC checksum:      real 0m11.136s   user 0m0.365s   sys 0m2.298s   (2% overhead)

no of tables: 20 (100 MB in each table)
  without checksum:  real 0m20.610s   user 0m0.484s   sys 0m3.198s
  SHA-256 checksum:  real 0m31.745s   user 0m0.569s   sys 0m4.089s   (54% overhead)
  MD5 checksum:      real 0m22.717s   user 0m0.638s   sys 0m4.026s   (10% overhead)
  CRC checksum:      real 0m21.075s   user 0m0.538s   sys 0m3.417s   (2% overhead)

no of tables: 50 (100 MB in each table)
  without checksum:  real 0m49.143s   user 0m1.646s   sys 0m8.499s
  SHA-256 checksum:  real 1m13.683s   user 0m1.305s   sys 0m10.541s  (50% overhead)
  MD5 checksum:      real 0m51.856s   user 0m0.932s   sys 0m7.702s   (6% overhead)
  CRC checksum:      real 0m49.689s   user 0m1.028s   sys 0m6.921s   (1% overhead)

no of tables: 100 (100 MB in each table)
  without checksum:  real 1m34.308s   user 0m2.265s   sys 0m14.717s
  SHA-256 checksum:  real 2m22.403s   user 0m2.613s   sys 0m20.776s  (51% overhead)
  MD5 checksum:      real 1m41.524s   user 0m2.158s   sys 0m15.949s  (8% overhead)
  CRC checksum:      real 1m35.045s   user 0m2.061s   sys 0m16.308s  (1% overhead)

no of tables: 100 (1 GB in each table)
  without checksum:  real 17m18.336s  user 0m20.222s  sys 3m12.960s
  SHA-256 checksum:  real 24m45.942s  user 0m26.911s  sys 3m33.501s  (43% overhead)
  MD5 checksum:      real 17m41.670s  user 0m26.506s  sys 3m18.402s  (2% overhead)
  CRC checksum:      real 17m22.296s  user 0m26.811s  sys 3m56.653s  (approx. 0.5% overhead;
                     sometimes this test completes in the same time as without checksums)


Considering the above results, I modified Robert's earlier patch and added a
"manifest_with_checksums" option to pg_basebackup.  With the new patch,
checksums are disabled by default and are only enabled when the
"manifest_with_checksums" option is provided.  I also re-based the whole patch set.



Regards,

--
Rushabh Lathia

On Tue, Oct 1, 2019 at 5:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Sep 30, 2019 at 5:31 AM Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
> Entry for directory is not added in manifest. So it might be difficult
> at client to get to know about the directories. Will it be good to add
> an entry for each directory too? May be like:
> Dir    <dirname> <mtime>

Well, what kind of corruption would this allow us to detect that we
can't detect as things stand? I think the only case is an empty
directory. If it's not empty, we'd have some entries for the files in
that directory, and those files won't be able to exist unless the
directory does. But, how would we end up backing up an empty
directory, anyway?

I don't really *mind* adding directories into the manifest, but I'm
not sure how much it helps.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




--
Rushabh Lathia
Attachment

Re: backup manifests

From
Andrew Dunstan
Date:
On 11/19/19 5:00 AM, Rushabh Lathia wrote:
>
>
> My colleague Suraj did testing and noticed the performance impact
> with the checksums.   On further testing, he found that specifically with
> sha its more of performance impact.  
>
>

I admit I haven't been following along closely, but why do we need a
cryptographic checksum here instead of, say, a CRC? Do we think that
somehow the checksum might be forged? Use of cryptographic hashes as
general purpose checksums has become far too common IMNSHO.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: backup manifests

From
David Steele
Date:
On 11/19/19 5:00 AM, Rushabh Lathia wrote:
> 
> My colleague Suraj did testing and noticed the performance impact
> with the checksums.   On further testing, he found that specifically with
> sha its more of performance impact.  

We have found that SHA1 adds about 3% overhead when the backup is also
compressed (gzip -6), which is what most people want to do.  This
percentage goes down even more if the backup is being transferred over a
network or to an object store such as S3.

We judged that the lower collision rate of SHA1 justified the additional
expense.

That said, making SHA256 optional seems reasonable.  We decided not to
make our SHA1 checksums optional to reduce the test matrix and because
parallelism largely addressed performance concerns.

Regards,
-- 
-David
david@pgmasters.net



Re: backup manifests

From
Rushabh Lathia
Date:


On Tue, Nov 19, 2019 at 7:19 PM Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:

On 11/19/19 5:00 AM, Rushabh Lathia wrote:
>
>
> My colleague Suraj did testing and noticed the performance impact
> with the checksums.   On further testing, he found that specifically with
> sha its more of performance impact.  
>
>

I admit I haven't been following along closely, but why do we need a
cryptographic checksum here instead of, say, a CRC? Do we think that
somehow the checksum might be forged? Use of cryptographic hashes as
general purpose checksums has become far too common IMNSHO.

Yeah, maybe.  I was thinking of giving the user an option to choose the
checksum algorithm (SHA-256, CRC, MD5, etc.), so that they are free to
choose what suits their environment.

If we decide to do that, then we need to store the checksum algorithm
information in the manifest file.

Thoughts?




cheers


andrew


--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



--
Rushabh Lathia

Re: backup manifests

From
Suraj Kharage
Date:
Hi,

Since we are now generating a backup manifest file with each backup, it gives us a way to validate the given backup.
Let's say we have taken a backup and, after a few days, we want to check whether that backup is valid and corruption-free, without restarting the server.

Please find attached a POC patch for the same, which is based on the latest backup manifest patch from Rushabh. With this functionality, we add a new option to pg_basebackup, something like --verify-backup.
So, the syntax would be:
./bin/pg_basebackup --verify-backup -D <backup_directory_path>

Basically, we read the backup_manifest file line by line from the given directory path and build a hash table, then scan the directory and compare each file with its hash table entry.

Thoughts/suggestions?
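
(To make that flow concrete, here is a minimal sketch in Python with hypothetical names; the actual POC is C code inside pg_basebackup. It assumes a mapping of relative path -> (size, checksum), e.g. as parsed from the manifest, and reports extra, missing, and mismatched files.)

import hashlib, os

def verify_backup(backup_dir, manifest):
    """Sketch: `manifest` maps relative path -> (size, sha256 hex).
    Reports extra files, missing files, size mismatches, and checksum
    mismatches, roughly mirroring the flow described above."""
    problems, seen = [], set()
    for root, _, names in os.walk(backup_dir):
        for name in names:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, backup_dir)
            if rel == "backup_manifest":
                continue
            if rel not in manifest:
                problems.append("extra file: " + rel)
                continue
            seen.add(rel)
            size, checksum = manifest[rel]
            if os.path.getsize(full) != size:
                problems.append("size mismatch: " + rel)
                continue
            with open(full, "rb") as f:
                if hashlib.sha256(f.read()).hexdigest() != checksum:
                    problems.append("checksum mismatch: " + rel)
    for rel in set(manifest) - seen:
        problems.append("missing file: " + rel)
    return problems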

On Tue, Nov 19, 2019 at 3:30 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:


My colleague Suraj did testing and noticed the performance impact
with the checksums.   On further testing, he found that specifically with
sha its more of performance impact.  

Please find below statistics:

no of tables: 10 (100 MB in each table)
  without checksum:  real 0m10.957s   user 0m0.367s   sys 0m2.275s
  SHA-256 checksum:  real 0m16.816s   user 0m0.210s   sys 0m2.067s   (53% overhead)
  MD5 checksum:      real 0m11.895s   user 0m0.174s   sys 0m1.725s   (8% overhead)
  CRC checksum:      real 0m11.136s   user 0m0.365s   sys 0m2.298s   (2% overhead)

no of tables: 20 (100 MB in each table)
  without checksum:  real 0m20.610s   user 0m0.484s   sys 0m3.198s
  SHA-256 checksum:  real 0m31.745s   user 0m0.569s   sys 0m4.089s   (54% overhead)
  MD5 checksum:      real 0m22.717s   user 0m0.638s   sys 0m4.026s   (10% overhead)
  CRC checksum:      real 0m21.075s   user 0m0.538s   sys 0m3.417s   (2% overhead)

no of tables: 50 (100 MB in each table)
  without checksum:  real 0m49.143s   user 0m1.646s   sys 0m8.499s
  SHA-256 checksum:  real 1m13.683s   user 0m1.305s   sys 0m10.541s  (50% overhead)
  MD5 checksum:      real 0m51.856s   user 0m0.932s   sys 0m7.702s   (6% overhead)
  CRC checksum:      real 0m49.689s   user 0m1.028s   sys 0m6.921s   (1% overhead)

no of tables: 100 (100 MB in each table)
  without checksum:  real 1m34.308s   user 0m2.265s   sys 0m14.717s
  SHA-256 checksum:  real 2m22.403s   user 0m2.613s   sys 0m20.776s  (51% overhead)
  MD5 checksum:      real 1m41.524s   user 0m2.158s   sys 0m15.949s  (8% overhead)
  CRC checksum:      real 1m35.045s   user 0m2.061s   sys 0m16.308s  (1% overhead)

no of tables: 100 (1 GB in each table)
  without checksum:  real 17m18.336s  user 0m20.222s  sys 3m12.960s
  SHA-256 checksum:  real 24m45.942s  user 0m26.911s  sys 3m33.501s  (43% overhead)
  MD5 checksum:      real 17m41.670s  user 0m26.506s  sys 3m18.402s  (2% overhead)
  CRC checksum:      real 17m22.296s  user 0m26.811s  sys 3m56.653s  (approx. 0.5% overhead;
                     sometimes this test completes in the same time as without checksums)


Considering the above results, I modified the earlier Robert's patch and added
"manifest_with_checksums" option to pg_basebackup.  With a new patch.
by default, checksums will be disabled and will be only enabled when
"manifest_with_checksums" option is provided.  Also re-based all patch set.



Regards,

--
Rushabh Lathia

On Tue, Oct 1, 2019 at 5:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Sep 30, 2019 at 5:31 AM Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
> Entry for directory is not added in manifest. So it might be difficult
> at client to get to know about the directories. Will it be good to add
> an entry for each directory too? May be like:
> Dir    <dirname> <mtime>

Well, what kind of corruption would this allow us to detect that we
can't detect as things stand? I think the only case is an empty
directory. If it's not empty, we'd have some entries for the files in
that directory, and those files won't be able to exist unless the
directory does. But, how would we end up backing up an empty
directory, anyway?

I don't really *mind* adding directories into the manifest, but I'm
not sure how much it helps.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




--
Rushabh Lathia


--
--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.
Attachment

Re: backup manifests

From
Jeevan Chalke
Date:


On Tue, Nov 19, 2019 at 3:30 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:


My colleague Suraj did testing and noticed the performance impact
with the checksums.   On further testing, he found that specifically with
sha its more of performance impact.  

Please find below statistics:

no of tables: 10 (100 MB in each table)
  without checksum:  real 0m10.957s   user 0m0.367s   sys 0m2.275s
  SHA-256 checksum:  real 0m16.816s   user 0m0.210s   sys 0m2.067s   (53% overhead)
  MD5 checksum:      real 0m11.895s   user 0m0.174s   sys 0m1.725s   (8% overhead)
  CRC checksum:      real 0m11.136s   user 0m0.365s   sys 0m2.298s   (2% overhead)

no of tables: 20 (100 MB in each table)
  without checksum:  real 0m20.610s   user 0m0.484s   sys 0m3.198s
  SHA-256 checksum:  real 0m31.745s   user 0m0.569s   sys 0m4.089s   (54% overhead)
  MD5 checksum:      real 0m22.717s   user 0m0.638s   sys 0m4.026s   (10% overhead)
  CRC checksum:      real 0m21.075s   user 0m0.538s   sys 0m3.417s   (2% overhead)

no of tables: 50 (100 MB in each table)
  without checksum:  real 0m49.143s   user 0m1.646s   sys 0m8.499s
  SHA-256 checksum:  real 1m13.683s   user 0m1.305s   sys 0m10.541s  (50% overhead)
  MD5 checksum:      real 0m51.856s   user 0m0.932s   sys 0m7.702s   (6% overhead)
  CRC checksum:      real 0m49.689s   user 0m1.028s   sys 0m6.921s   (1% overhead)

no of tables: 100 (100 MB in each table)
  without checksum:  real 1m34.308s   user 0m2.265s   sys 0m14.717s
  SHA-256 checksum:  real 2m22.403s   user 0m2.613s   sys 0m20.776s  (51% overhead)
  MD5 checksum:      real 1m41.524s   user 0m2.158s   sys 0m15.949s  (8% overhead)
  CRC checksum:      real 1m35.045s   user 0m2.061s   sys 0m16.308s  (1% overhead)

no of tables: 100 (1 GB in each table)
  without checksum:  real 17m18.336s  user 0m20.222s  sys 3m12.960s
  SHA-256 checksum:  real 24m45.942s  user 0m26.911s  sys 3m33.501s  (43% overhead)
  MD5 checksum:      real 17m41.670s  user 0m26.506s  sys 3m18.402s  (2% overhead)
  CRC checksum:      real 17m22.296s  user 0m26.811s  sys 3m56.653s  (approx. 0.5% overhead;
                     sometimes this test completes in the same time as without checksums)


Considering the above results, I modified the earlier Robert's patch and added
"manifest_with_checksums" option to pg_basebackup.  With a new patch.
by default, checksums will be disabled and will be only enabled when
"manifest_with_checksums" option is provided.  Also re-based all patch set.

Review comments on 0004:

1.
I don't think we need the o_manifest_with_checksums variable;
manifest_with_checksums can be used instead.

2.
We need to document this new option for pg_basebackup and basebackup.

3.
Also, instead of keeping manifest_with_checksums as a global variable, we
should pass that to the required function. Patch 0002 already modified the
signature of all relevant functions anyways. So just need to add one more bool
variable there.

4.
Why do we need "File" at the start of each entry when we are only adding files?
I wonder if we also need to provide tablespace and directory markers so
that we have "Tablespace" and "Dir" at the start.

5.
Even if I don't provide the manifest-with-checksums option, I see that a
checksum is still calculated for the backup_manifest file itself. Is that
intentional, or was it missed? I think we should omit that too if this
option is not provided.

6.
Is it possible to get only the backup manifest from the server? A client like
pg_basebackup could then fetch the files by reading it.

Thanks
 



Regards,

--
Rushabh Lathia

On Tue, Oct 1, 2019 at 5:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Sep 30, 2019 at 5:31 AM Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
> Entry for directory is not added in manifest. So it might be difficult
> at client to get to know about the directories. Will it be good to add
> an entry for each directory too? May be like:
> Dir    <dirname> <mtime>

Well, what kind of corruption would this allow us to detect that we
can't detect as things stand? I think the only case is an empty
directory. If it's not empty, we'd have some entries for the files in
that directory, and those files won't be able to exist unless the
directory does. But, how would we end up backing up an empty
directory, anyway?

I don't really *mind* adding directories into the manifest, but I'm
not sure how much it helps.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




--
Rushabh Lathia


--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Re: backup manifests

From
Jeevan Chalke
Date:


On Wed, Nov 20, 2019 at 11:05 AM Suraj Kharage <suraj.kharage@enterprisedb.com> wrote:
Hi,

Since now we are generating the backup manifest file with each backup, it provides us an option to validate the given backup.  
Let's say, we have taken a backup and after a few days, we want to check whether that backup is validated or corruption-free without restarting the server.

Please find attached POC patch for same which will be based on the latest backup manifest patch from Rushabh. With this functionality, we add new option to pg_basebackup, something like --verify-backup.
So, the syntax would be:
./bin/pg_basebackup --verify-backup -D <backup_directory_path>

Basically, we read the backup_manifest file line by line from the given directory path and build the hash table, then scan the directory and compare each file with the hash entry.

Thoughts/suggestions?


I like the idea of verifying the backup once we have the backup_manifest with us.
Periodically verifying an already-taken backup with this simple tool now
becomes easy.

I have reviewed this patch and here are my comments:

1.
@@ -30,7 +30,9 @@
 #include "common/file_perm.h"
 #include "common/file_utils.h"
 #include "common/logging.h"
+#include "common/sha2.h"
 #include "common/string.h"
+#include "fe_utils/simple_list.h"
 #include "fe_utils/recovery_gen.h"
 #include "fe_utils/string_utils.h"
 #include "getopt_long.h"
@@ -38,12 +40,19 @@
 #include "pgtar.h"
 #include "pgtime.h"
 #include "pqexpbuffer.h"
+#include "pgrhash.h"
 #include "receivelog.h"
 #include "replication/basebackup.h"
 #include "streamutil.h"


Please add new files in order.

2.
Can the hash-related files be renamed to backuphash.c and backuphash.h?

3.
Need indentation adjustments at various places.

4.
+            char        buf[1000000];  // 1MB chunk

It would be good to use a multiple of the block/page size (or at least a
power-of-2 size).

5.
+typedef struct pgrhash_entry
+{
+    struct pgrhash_entry *next; /* link to next entry in same bucket */
+    DataDirectoryFileInfo *record;
+} pgrhash_entry;
+
+struct pgrhash
+{
+    unsigned    nbuckets;        /* number of buckets */
+    pgrhash_entry **bucket;        /* pointer to hash entries */
+};
+

+typedef struct pgrhash pgrhash;

These two can be moved to .h file instead of redefining over there.

6.
+/*
+ * TODO: this function is not necessary, can be removed.
+ * Test whether the given row number is match for the supplied keys.
+ */
+static bool
+pgrhash_compare(char *bt_filename, char *filename)

Yeah, it can be removed by doing strcmp() at the required places rather than
doing it in a separate function.

7.
mdate is not compared anywhere. I understand that the file in the backup
directory can't be compared against its entry in the manifest, since the
manifest entry gives the mtime of the file on the server, whereas the same
file in the backup will have a different mtime. But adding a few comments
there would be good.

8.
+    char        mdate[24];

Should it be mtime instead?


Thanks

--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Re: backup manifests

From
Rushabh Lathia
Date:
Thank you Jeevan for reviewing the patch.

On Thu, Nov 21, 2019 at 2:33 PM Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:


On Tue, Nov 19, 2019 at 3:30 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:


My colleague Suraj did testing and noticed the performance impact
with the checksums.   On further testing, he found that specifically with
sha its more of performance impact.  

Please find below statistics:

no of tables: 10 (100 MB in each table)
  without checksum:  real 0m10.957s   user 0m0.367s   sys 0m2.275s
  SHA-256 checksum:  real 0m16.816s   user 0m0.210s   sys 0m2.067s   (53% overhead)
  MD5 checksum:      real 0m11.895s   user 0m0.174s   sys 0m1.725s   (8% overhead)
  CRC checksum:      real 0m11.136s   user 0m0.365s   sys 0m2.298s   (2% overhead)

no of tables: 20 (100 MB in each table)
  without checksum:  real 0m20.610s   user 0m0.484s   sys 0m3.198s
  SHA-256 checksum:  real 0m31.745s   user 0m0.569s   sys 0m4.089s   (54% overhead)
  MD5 checksum:      real 0m22.717s   user 0m0.638s   sys 0m4.026s   (10% overhead)
  CRC checksum:      real 0m21.075s   user 0m0.538s   sys 0m3.417s   (2% overhead)

no of tables: 50 (100 MB in each table)
  without checksum:  real 0m49.143s   user 0m1.646s   sys 0m8.499s
  SHA-256 checksum:  real 1m13.683s   user 0m1.305s   sys 0m10.541s  (50% overhead)
  MD5 checksum:      real 0m51.856s   user 0m0.932s   sys 0m7.702s   (6% overhead)
  CRC checksum:      real 0m49.689s   user 0m1.028s   sys 0m6.921s   (1% overhead)

no of tables: 100 (100 MB in each table)
  without checksum:  real 1m34.308s   user 0m2.265s   sys 0m14.717s
  SHA-256 checksum:  real 2m22.403s   user 0m2.613s   sys 0m20.776s  (51% overhead)
  MD5 checksum:      real 1m41.524s   user 0m2.158s   sys 0m15.949s  (8% overhead)
  CRC checksum:      real 1m35.045s   user 0m2.061s   sys 0m16.308s  (1% overhead)

no of tables: 100 (1 GB in each table)
  without checksum:  real 17m18.336s  user 0m20.222s  sys 3m12.960s
  SHA-256 checksum:  real 24m45.942s  user 0m26.911s  sys 3m33.501s  (43% overhead)
  MD5 checksum:      real 17m41.670s  user 0m26.506s  sys 3m18.402s  (2% overhead)
  CRC checksum:      real 17m22.296s  user 0m26.811s  sys 3m56.653s  (approx. 0.5% overhead;
                     sometimes this test completes in the same time as without checksums)


Considering the above results, I modified the earlier Robert's patch and added
"manifest_with_checksums" option to pg_basebackup.  With a new patch.
by default, checksums will be disabled and will be only enabled when
"manifest_with_checksums" option is provided.  Also re-based all patch set.

Review comments on 0004:

1.
I don't think we need o_manifest_with_checksums variable,
manifest_with_checksums can be used instead.

Yes, done in the latest version of the patch.


2.
We need to document this new option for pg_basebackup and basebackup.


Done; attaching a documentation patch with this mail.

3.
Also, instead of keeping manifest_with_checksums as a global variable, we
should pass that to the required function. Patch 0002 already modified the
signature of all relevant functions anyways. So just need to add one more bool
variable there.


Yes, earlier I did it that way, but later found that we already have a
checksum-related global variable, i.e. noverify_checksums, so this keeps
the implementation cleaner rather than modifying the function definitions
to pass around a variable that is effectively global for the operation.

4.
Why we need a "File" at the start of each entry as we are adding files only?
I wonder if we also need to provide a tablespace name and directory marker so
that we have "Tablespace" and "Dir" at the start.

Sorry, I am not quite sure about this; maybe Robert is the right person
to answer it.


5.
If I don't provide manifest-with-checksums option then too I see that checksum
is calculated for backup_manifest file itself. Is that intentional or missed?
I think we should omit that too if this option is not provided.


Oops, yeah; corrected this in the latest version of the patch.

6.
Is it possible to get only a backup manifest from the server? A client like
pg_basebackup can then use that to fetch files reading that.


Currently we don't have any option to get just the manifest file from the
server.  I am not sure why we would need this at this point.



Regards,

Rushabh Lathia
Attachment

Re: backup manifests

From
Robert Haas
Date:
On Tue, Nov 19, 2019 at 8:49 AM Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> I admit I haven't been following along closely, but why do we need a
> cryptographic checksum here instead of, say, a CRC? Do we think that
> somehow the checksum might be forged? Use of cryptographic hashes as
> general purpose checksums has become far too common IMNSHO.

I tend to agree with you. I suspect if we just use CRC, some people
are going to complain that they want something "stronger" because that
will make them feel better about error detection rates or obscure
threat models or whatever other things a SHA-based approach might be
able to catch that CRC would not catch. However, I suspect that for
normal use cases, CRC would be totally adequate, and the fact that the
performance overhead is almost none vs. a whole lot - at least in this
test setup, other results might vary depending on what you test -
makes it look pretty appealing.

My gut reaction is to make CRC the default, but have an option that
you can use to either turn it off entirely (if even 1-2% is too much
for you) or opt in to SHA-something if you want it. I don't think we
should offer an option for MD5, because MD5 is a dirty word these days
and will cause problems for users who have to worry about FIPS 140-2
compliance. Phrased more positively, if you want a cryptographic hash
at all, you should probably use one that isn't widely viewed as too
weak.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
David Steele
Date:
On 11/22/19 10:58 AM, Robert Haas wrote:
> On Tue, Nov 19, 2019 at 8:49 AM Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
>> I admit I haven't been following along closely, but why do we need a
>> cryptographic checksum here instead of, say, a CRC? Do we think that
>> somehow the checksum might be forged? Use of cryptographic hashes as
>> general purpose checksums has become far too common IMNSHO.
> 
> I tend to agree with you. I suspect if we just use CRC, some people
> are going to complain that they want something "stronger" because that
> will make them feel better about error detection rates or obscure
> threat models or whatever other things a SHA-based approach might be
> able to catch that CRC would not catch. 

Well, the maximum amount of data that can be protected with a 32-bit CRC
is 512MB according to all the sources I found (NIST, Wikipedia, etc).  I
presume that's what we are talking about since I can't find any 64-bit
CRC code in core or this patch.

So, that's half of what we need with the default relation segment size
(I've seen larger in the field).

> I don't think we
> should offer an option for MD5, because MD5 is a dirty word these days
> and will cause problems for users who have to worry about FIPS 140-2
> compliance. 

+1.

> Phrased more positively, if you want a cryptographic hash
> at all, you should probably use one that isn't widely viewed as too
> weak.

Sure.  There's another advantage to picking an algorithm with lower
collision rates, though.

CRCs are fine for catching transmission errors (as caveated above) but
not as great for comparing two files for equality.  With strong hashes
you can confidently compare local files against the path, size, and hash
stored in the manifest and save yourself a round-trip to the remote
storage to grab the file if it has not changed locally.

This is the basic premise of what we call delta restore which can speed
up restores by orders of magnitude.

Delta restore is the main advantage that made us decide to require SHA1
checksums.  In most cases, restore speed is more important than backup
speed.
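
(As a hedged sketch of that premise -- with hypothetical names, and not how pgBackRest actually implements it -- the per-file decision during a delta restore boils down to something like the following: a matching size and hash means the local file can be kept and the fetch skipped.)

import hashlib, os

def can_reuse_local_file(local_path, manifest_size, manifest_sha1):
    """Delta-restore style check (sketch): keep the local file only if
    its size and hash match the manifest entry; otherwise it has to be
    fetched again from the backup repository."""
    if not os.path.exists(local_path):
        return False
    if os.path.getsize(local_path) != manifest_size:
        return False
    h = hashlib.sha1()          # SHA-1, as discussed in this sub-thread
    with open(local_path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest() == manifest_sha1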

Regards,
-- 
-David
david@pgmasters.net



Re: backup manifests

From
Robert Haas
Date:
On Tue, Nov 19, 2019 at 4:34 PM David Steele <david@pgmasters.net> wrote:
> On 11/19/19 5:00 AM, Rushabh Lathia wrote:
> > My colleague Suraj did testing and noticed the performance impact
> > with the checksums.   On further testing, he found that specifically with
> > sha its more of performance impact.
>
> We have found that SHA1 adds about 3% overhead when the backup is also
> compressed (gzip -6), which is what most people want to do.  This
> percentage goes down even more if the backup is being transferred over a
> network or to an object store such as S3.

I don't really understand why your tests and Suraj's tests are showing
such different results, or how compression plays into it. I tried
running shasum -a$N lineitem-big.csv on my laptop, where that file
contains ~70MB of random-looking data whose source I no longer
remember. Here are the results by algorithm: SHA1, ~25 seconds; SHA224
or SHA256, ~52 seconds; SHA384 and SHA512, ~39 seconds. Aside from the
interesting discovery that the algorithms with more bits actually run
faster on this machine, this seems to show that there's only about a
~2x difference between the SHA1 that you used and that I (pretty much
arbitrarily) used. But Rushabh and Suraj are reporting 43-54%
overhead, and even if you divide that by two it's a lot more than 3%.

One possible explanation is that the compression is really slow, and
so it makes the checksum overhead a smaller percentage of the total.
Like, if you've already slowed down the backup by 8x, then 24%
overhead turns into 3% overhead! But I assume that's not the real
explanation here. Another explanation is that your tests were
I/O-bound rather than CPU-bound, maybe because you tested with a much
larger database or a much smaller amount of I/O bandwidth. If you had
CPU cycles to burn, then neither compression nor checksums will cost
much in terms of overall runtime. But that's a little hard to swallow,
too, because I don't think the testing mentioned above was done using
any sort of exotic test configuration, so why would yours be so
different? Another possibility is that Suraj and Rushabh messed up the
tests, or alternatively that you did. Or, it could be that your
checksum implementation is way faster than the one PG uses, and so the
impact was much less. I don't know, but I'm having a hard time
understanding the divergent results. Any ideas?

> We judged that the lower collision rate of SHA1 justified the additional
> expense.
>
> That said, making SHA256 optional seems reasonable.  We decided not to
> make our SHA1 checksums optional to reduce the test matrix and because
> parallelism largely addressed performance concerns.

Just to be clear, I really don't have any objection to using SHA1
instead of SHA256, or anything else for that matter. I picked the one
to use out of a hat for the purpose of having a POC quickly; I didn't
have any intention to insist on that as the final selection. It seems
likely that anything we pick here will eventually be considered
obsolete, so I think we need to allow for configurability, but I don't
have a horse in the game as far as an initial selection goes.

Except - and this gets back to the previous point - I don't want to
slow down backups by 40% by default. I wouldn't mind slowing them down
3% by default, but 40% is too much overhead. I think we've got to
either get the overhead of using SHA way down or not use SHA by default.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Robert Haas
Date:
On Fri, Nov 22, 2019 at 1:10 PM David Steele <david@pgmasters.net> wrote:
> Well, the maximum amount of data that can be protected with a 32-bit CRC
> is 512MB according to all the sources I found (NIST, Wikipedia, etc).  I
> presume that's what we are talking about since I can't find any 64-bit
> CRC code in core or this patch.

Could you give a more precise citation for this? I can't find a
reference to that in the Wikipedia article off-hand and I don't know
where to look in NIST. I apologize if I'm being dense here, but I
don't see why there should be any limit on the amount of data that can
be protected. The important thing is that if the original file F is
altered to F', we hope that CHECKSUM(F) != CHECKSUM(F'). The
probability of that, assuming that the alteration is random rather
than malicious and that the checksum function is equally likely to
produce every possible output, is just 1-2^-${CHECKSUM_BITS},
regardless of the length of the message (except that there might be
some special cases for very short messages, which don't matter here).

This analysis by me seems to match
https://en.wikipedia.org/wiki/Cyclic_redundancy_check, which says:

"Typically an n-bit CRC applied to a data block of arbitrary length
will detect any single error burst not longer than n bits, and the
fraction of all longer error bursts that it will detect is (1 −
2^−n)."

Notice the phrase "a data block of arbitrary length" and the formula "1 - 2^-n".

> > Phrased more positively, if you want a cryptographic hash
> > at all, you should probably use one that isn't widely viewed as too
> > weak.
>
> Sure.  There's another advantage to picking an algorithm with lower
> collision rates, though.
>
> CRCs are fine for catching transmission errors (as caveated above) but
> not as great for comparing two files for equality.  With strong hashes
> you can confidently compare local files against the path, size, and hash
> stored in the manifest and save yourself a round-trip to the remote
> storage to grab the file if it has not changed locally.

I agree in part. I think there are two reasons why a cryptographically
strong hash is desirable for delta restore. First, since the checksums
are longer, the probability of a false match happening randomly is
lower, which is important. Even if the above analysis is correct and
the chance of a false match is just 2^-32 with a 32-bit CRC, if you
back up ten million files every day, you'll likely get a false match
within a few years or less, and once is too often. Second, unlike what
I supposed above, the contents of a PostgreSQL data file are not
chosen at random, unlike transmission errors, which probably are more
or less random. It seems somewhat possible that there is an adversary
who is trying to choose the data that gets stored in some particular
record so as to create a false checksum match. A CRC is a lot easier
to fool than a cryptographic hash, so I think that using a CRC of *any*
length for this kind of use case would be extremely dangerous no
matter the probability of an accidental match.
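
(A quick back-of-the-envelope sketch of that first risk, under the simplifying assumption that each comparison independently has a 2^-32 chance of a false match:)

# Rough expected number of undetected (falsely matching) files per year
# with a 32-bit checksum, assuming each of N comparisons independently
# misses with probability 2**-32.  A simplification, not a rigorous model.
files_per_day = 10_000_000
false_match_probability = 2.0 ** -32

expected_per_year = files_per_day * 365 * false_match_probability
print(expected_per_year)    # roughly 0.85, i.e. about one per year
# With a 256-bit digest the same figure is smaller by a factor of 2**224,
# i.e. effectively zero.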

> This is the basic premise of what we call delta restore which can speed
> up restores by orders of magnitude.
>
> Delta restore is the main advantage that made us decide to require SHA1
> checksums.  In most cases, restore speed is more important than backup
> speed.

I see your point, but it's not the whole story. We've encountered a
bunch of cases where the time it took to complete a backup exceeded
the user's desired backup interval, which is obviously very bad, or
even more commonly where it exceeded the length of the user's
"low-usage" period when they could tolerate the extra overhead imposed
by the backup. A few percentage points is probably not a big deal, but
a user who has an 8-hour window to get the backup done overnight will
not be happy if it's taking 6 hours now and we tack 40%-50% on to
that. So I think that we either have to disable backup checksums by
default, or figure out a way to get the overhead down to something a
lot smaller than what current tests are showing -- which we could
possibly do without changing the algorithm if we can somehow make it a
lot cheaper, but otherwise I think the choice is between disabling the
functionality altogether by default and adopting a less-expensive
algorithm. Maybe someday when delta restore is in core and widely used
and CPUs are faster, it'll make sense to revise the default, and
that's cool, but I can't see imposing a big overhead by default to
enable a feature core doesn't have yet...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
David Steele
Date:
On 11/22/19 1:24 PM, Robert Haas wrote:
> On Tue, Nov 19, 2019 at 4:34 PM David Steele <david@pgmasters.net> wrote:
>> On 11/19/19 5:00 AM, Rushabh Lathia wrote:
>>> My colleague Suraj did testing and noticed the performance impact
>>> with the checksums.   On further testing, he found that specifically with
>>> sha its more of performance impact.
>>
>> We have found that SHA1 adds about 3% overhead when the backup is also
>> compressed (gzip -6), which is what most people want to do.  This
>> percentage goes down even more if the backup is being transferred over a
>> network or to an object store such as S3.
> 
> I don't really understand why your tests and Suraj's tests are showing
> such different results, or how compression plays into it. I tried
> running shasum -a$N lineitem-big.csv on my laptop, where that file
> contains ~70MB of random-looking data whose source I no longer
> remember. Here are the results by algorithm: SHA1, ~25 seconds; SHA224
> or SHA256, ~52 seconds; SHA384 and SHA512, ~39 seconds. Aside from the
> interesting discovery that the algorithms with more bits actually run
> faster on this machine, this seems to show that there's only about a
> ~2x difference between the SHA1 that you used and that I (pretty much
> arbitrarily) used. But Rushabh and Suraj are reporting 43-54%
> overhead, and even if you divide that by two it's a lot more than 3%.
> 
> One possible explanation is that the compression is really slow, and
> so it makes the checksum overhead a smaller percentage of the total.
> Like, if you've already slowed down the backup by 8x, then 24%
> overhead turns into 3% overhead! But I assume that's not the real
> explanation here. 

That's the real explanation here.  Hash calculations run at the same
speed; they just become a smaller portion of the *total* time once
compression (gzip -6) is added.  With something like lz4, hashing will
obviously be a big percentage of the total.

Also consider how much extra latency you get from copying over a
network.  My 3% did not include that but realistically most backups are
running over a network (hopefully).

>> That said, making SHA256 optional seems reasonable.  We decided not to
>> make our SHA1 checksums optional to reduce the test matrix and because
>> parallelism largely addressed performance concerns.
> 
> Just to be clear, I really don't have any objection to using SHA1
> instead of SHA256, or anything else for that matter. I picked the one
> to use out of a hat for the purpose of having a POC quickly; I didn't
> have any intention to insist on that as the final selection. It seems
> likely that anything we pick here will eventually be considered
> obsolete, so I think we need to allow for configurability, but I don't
> have a horse in the game as far as an initial selection goes.

We decided that SHA1 was good enough and there was no need to go up to
SHA256.  What we were interested in was collision rates and what the
chance of getting a false positive was, based on the combination of
path, size, and hash.  With SHA1 the chance of a collision was literally
astronomically low (as in the universe would probably end before it
happened, depending on whether you are an expand forever or contract
proponent).

> Except - and this gets back to the previous point - I don't want to
> slow down backups by 40% by default. I wouldn't mind slowing them down
> 3% by default, but 40% is too much overhead. I think we've got to
> either get the overhead of using SHA way down or not use SHA by default.

Maybe -- my take is that the measurements, an uncompressed backup to the
local filesystem, are not a very realistic use case.

However, I'm still fine with leaving the user the option of checksums or
no.  I just wanted to point out that CRCs have their limits so maybe
that's not a great option unless it is properly caveated and perhaps not
the default.

Regards,
-- 
-David
david@pgmasters.net



Re: backup manifests

From
David Steele
Date:
On 11/22/19 2:01 PM, Robert Haas wrote:
> On Fri, Nov 22, 2019 at 1:10 PM David Steele <david@pgmasters.net> wrote:
>> Well, the maximum amount of data that can be protected with a 32-bit CRC
>> is 512MB according to all the sources I found (NIST, Wikipedia, etc).  I
>> presume that's what we are talking about since I can't find any 64-bit
>> CRC code in core or this patch.
> 
> Could you give a more precise citation for this? 

See:
https://www.nist.gov/system/files/documents/2017/04/26/lrdc_systems_part2_032713.pdf
Search for "The maximum block size"

https://en.wikipedia.org/wiki/Cyclic_redundancy_check
"The design of the CRC polynomial depends on the maximum total length of
the block to be protected (data + CRC bits)", which I took to mean there
are limits.

Here's another interesting bit from:
https://en.wikipedia.org/wiki/Mathematics_of_cyclic_redundancy_checks
"Because a CRC is based on division, no polynomial can detect errors
consisting of a string of zeroes prepended to the data, or of missing
leading zeroes" -- but it appears to matter what CRC you are using.
There's a variation that works in this case and hopefully we are using
that one.

This paper talks about appropriate block lengths vs crc length:
http://users.ece.cmu.edu/~koopman/roses/dsn04/koopman04_crc_poly_embedded.pdf
but it is concerned with network transmission and small block lengths.

> "Typically an n-bit CRC applied to a data block of arbitrary length
> will detect any single error burst not longer than n bits, and the
> fraction of all longer error bursts that it will detect is (1 −
> 2^−n)."

I'm not sure how encouraging I find this -- a four-byte error is not a lot
and 2^32 is only 4 billion.  We have individual users who have backed up
more than 4 billion files over the last few years.

>> This is the basic premise of what we call delta restore which can speed
>> up restores by orders of magnitude.
>>
>> Delta restore is the main advantage that made us decide to require SHA1
>> checksums.  In most cases, restore speed is more important than backup
>> speed.
> 
> I see your point, but it's not the whole story. We've encountered a
> bunch of cases where the time it took to complete a backup exceeded
> the user's desired backup interval, which is obviously very bad, or
> even more commonly where it exceeded the length of the user's
> "low-usage" period when they could tolerate the extra overhead imposed
> by the backup. A few percentage points is probably not a big deal, but
> a user who has an 8-hour window to get the backup done overnight will
> not be happy if it's taking 6 hours now and we tack 40%-50% on to
> that. So I think that we either have to disable backup checksums by
> default, or figure out a way to get the overhead down to something a
> lot smaller than what current tests are showing -- which we could
> possibly do without changing the algorithm if we can somehow make it a
> lot cheaper, but otherwise I think the choice is between disabling the
> functionality altogether by default and adopting a less-expensive
> algorithm. Maybe someday when delta restore is in core and widely used
> and CPUs are faster, it'll make sense to revise the default, and
> that's cool, but I can't see imposing a big overhead by default to
> enable a feature core doesn't have yet...

OK, I'll buy that.  But I *don't* think CRCs should be allowed for
deltas (when we have them) and I *do* think we should caveat their
effectiveness (assuming we can agree on them).

In general the answer to faster backups should be more cores/faster
network/faster disk, not compromising backup integrity.  I understand
we'll need to wait until we have parallelism in pg_basebackup to justify
that answer.

Regards,
-- 
-David
david@pgmasters.net



Re: backup manifests

From
Tels
Date:
Moin Robert,

On 2019-11-22 20:01, Robert Haas wrote:
> On Fri, Nov 22, 2019 at 1:10 PM David Steele <david@pgmasters.net> wrote:
>> Well, the maximum amount of data that can be protected with a 32-bit CRC
>> is 512MB according to all the sources I found (NIST, Wikipedia, etc).  I
>> presume that's what we are talking about since I can't find any 64-bit
>> CRC code in core or this patch.
> 
> Could you give a more precise citation for this? I can't find a
> reference to that in the Wikipedia article off-hand and I don't know
> where to look in NIST. I apologize if I'm being dense here, but I
> don't see why there should be any limit on the amount of data that can
> be protected. The important thing is that if the original file F is
> altered to F', we hope that CHECKSUM(F) != CHECKSUM(F'). The
> probability of that, assuming that the alteration is random rather
> than malicious and that the checksum function is equally likely to
> produce every possible output, is just 1-2^-${CHECKSUM_BITS},
> regardless of the length of the message (except that there might be
> some special cases for very short messages, which don't matter here).
> 
> This analysis by me seems to match
> https://en.wikipedia.org/wiki/Cyclic_redundancy_check, which says:
> 
> "Typically an n-bit CRC applied to a data block of arbitrary length
> will detect any single error burst not longer than n bits, and the
> fraction of all longer error bursts that it will detect is (1 −
> 2^−n)."
> 
> Notice the phrase "a data block of arbitrary length" and the formula "1 - 2^-n".

It is related to the number of states, and the birthday problem factors 
in it, too:

    https://en.wikipedia.org/wiki/Birthday_problem

If you have a 32-bit checksum or hash, it can represent only 2**32-1
states at most (or fewer, if the algorithm isn't really good).

Each byte is 8 bits, so 2**32 / 8 is 512 MByte. If you process your data
bit by bit, each new bit would add a new state (consider: missing bit == 0,
added bit == 1). If each new state is represented by a different checksum,
all possible 2**32 values are exhausted after processing 512 MByte; after
that you get one of the former states again - aka a collision.

There is no way around it with so few bits, no matter what algorithm you
choose.

>> > Phrased more positively, if you want a cryptographic hash
>> > at all, you should probably use one that isn't widely viewed as too
>> > weak.
>> 
>> Sure.  There's another advantage to picking an algorithm with lower
>> collision rates, though.
>> 
>> CRCs are fine for catching transmission errors (as caveated above) but
>> not as great for comparing two files for equality.  With strong hashes
>> you can confidently compare local files against the path, size, and hash
>> stored in the manifest and save yourself a round-trip to the remote
>> storage to grab the file if it has not changed locally.
> 
> I agree in part. I think there are two reasons why a cryptographically
> strong hash is desirable for delta restore. First, since the checksums
> are longer, the probability of a false match happening randomly is
> lower, which is important. Even if the above analysis is correct and
> the chance of a false match is just 2^-32 with a 32-bit CRC, if you
> back up ten million files every day, you'll likely get a false match
> within a few years or less, and once is too often. Second, unlike what
> I supposed above, the contents of a PostgreSQL data file are not
> chosen at random, unlike transmission errors, which probably are more
> or less random. It seems somewhat possible that there is an adversary
> who is trying to choose the data that gets stored in some particular
> record so as to create a false checksum match. A CRC is a lot easier
> to fool than a crytographic hash, so I think that using a CRC of *any*
> length for this kind of use case would be extremely dangerous no
> matter the probability of an accidental match.

Agreed. See above.

However, if you choose a hash, please do not go below SHA-256. Both MD5
and SHA-1 have already had collision attacks, and these are only bound to
get worse.

   https://www.mscs.dal.ca/~selinger/md5collision/
   https://shattered.io/

It might even be wise to encode the hash algorithm used into the manifest
file, so it can be changed later. The hash length alone might not be enough
to determine which algorithm was used.

>> This is the basic premise of what we call delta restore which can speed
>> up restores by orders of magnitude.
>> 
>> Delta restore is the main advantage that made us decide to require SHA1
>> checksums.  In most cases, restore speed is more important than backup
>> speed.
> 
> I see your point, but it's not the whole story. We've encountered a
> bunch of cases where the time it took to complete a backup exceeded
> the user's desired backup interval, which is obviously very bad, or
> even more commonly where it exceeded the length of the user's
> "low-usage" period when they could tolerate the extra overhead imposed
> by the backup. A few percentage points is probably not a big deal, but
> a user who has an 8-hour window to get the backup done overnight will
> not be happy if it's taking 6 hours now and we tack 40%-50% on to
> that. So I think that we either have to disable backup checksums by
> default, or figure out a way to get the overhead down to something a
> lot smaller than what current tests are showing -- which we could
> possibly do without changing the algorithm if we can somehow make it a
> lot cheaper, but otherwise I think the choice is between disabling the
> functionality altogether by default and adopting a less-expensive
> algorithm. Maybe someday when delta restore is in core and widely used
> and CPUs are faster, it'll make sense to revise the default, and
> that's cool, but I can't see imposing a big overhead by default to
> enable a feature core doesn't have yet...

Modern algorithms are amazingly fast on modern hardware, some even
are implemented in hardware nowadays:

  https://software.intel.com/en-us/articles/intel-sha-extensions

Quote from:

  https://neosmart.net/blog/2017/will-amds-ryzen-finally-bring-sha-extensions-to-intels-cpus/

  "Despite the extremely limited availability of SHA extension support
   in modern desktop and mobile processors, crypto libraries have already
   upstreamed support to great effect. Botan’s SHA extension patches show a
   significant 3x to 5x performance boost when taking advantage of the hardware
   extensions, and the Linux kernel itself shipped with hardware SHA support
   with version 4.4, bringing a very respectable 3.6x performance upgrade over
   the already hardware-assisted SSE3-enabled code."

If you need to load the data from disk and shove it over a network, the
hashing will certainly add very little overhead; it might even be completely
invisible, since it can run in parallel to all the other things. Sure, there
is the thing called zero-copy networking, but if you have to compress the
data before sending it to the network, you have to put it through the CPU
anyway. And if you have more than one core, the second one can do the
hashing in parallel to the first one doing the compression.

To get a feeling one can use:

    openssl speed md5 sha1 sha256 sha512

On my really-not-fast desktop CPU (i5-4690T CPU @ 2.50GHz) it says:

  The 'numbers' are in 1000s of bytes per second processed.
   type       16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
   md5       122638.55k   277023.96k   487725.57k   630806.19k   683892.74k   688553.98k
   sha1      127226.45k   313891.52k   632510.55k   865753.43k   960995.33k   977215.19k
   sha256     77611.02k   173368.15k   325460.99k   412633.43k   447022.92k   448020.48k
   sha512     51164.77k   205189.87k   361345.79k   543883.26k   638372.52k   645933.74k

Or in other words, it can hash nearly 931 MByte/s with SHA-1 and about
427 MByte/s with SHA-256 (if I haven't miscalculated something). You'd
need a pretty fast disk (aka M.2 SSD) and network (aka > 1 Gbit) to top
these speeds and then you'd use a real CPU for your server, not some
poor Intel powersaving surfing thingy-majingy :)
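
(As an aside: plain "openssl speed sha256" may exercise OpenSSL's generic
code path. On builds and CPUs with hardware or SIMD support, the EVP variant

    openssl speed -evp sha256

can report noticeably higher throughput, so the numbers above are, if
anything, conservative.)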

Best regards,

Tels



Re: backup manifests

From
David Steele
Date:
On 11/22/19 5:15 PM, Tels wrote:
> On 2019-11-22 20:01, Robert Haas wrote:
>> On Fri, Nov 22, 2019 at 1:10 PM David Steele <david@pgmasters.net> wrote:
> 
>>> > Phrased more positively, if you want a cryptographic hash
>>> > at all, you should probably use one that isn't widely viewed as too
>>> > weak.
>>>
>>> Sure.  There's another advantage to picking an algorithm with lower
>>> collision rates, though.
>>>
>>> CRCs are fine for catching transmission errors (as caveated above) but
>>> not as great for comparing two files for equality.  With strong hashes
>>> you can confidently compare local files against the path, size, and hash
>>> stored in the manifest and save yourself a round-trip to the remote
>>> storage to grab the file if it has not changed locally.
>>
>> I agree in part. I think there are two reasons why a cryptographically
>> strong hash is desirable for delta restore. First, since the checksums
>> are longer, the probability of a false match happening randomly is
>> lower, which is important. Even if the above analysis is correct and
>> the chance of a false match is just 2^-32 with a 32-bit CRC, if you
>> back up ten million files every day, you'll likely get a false match
>> within a few years or less, and once is too often. Second, unlike what
>> I supposed above, the contents of a PostgreSQL data file are not
>> chosen at random, unlike transmission errors, which probably are more
>> or less random. It seems somewhat possible that there is an adversary
>> who is trying to choose the data that gets stored in some particular
>> record so as to create a false checksum match. A CRC is a lot easier
>> to fool than a cryptographic hash, so I think that using a CRC of *any*
>> length for this kind of use case would be extremely dangerous no
>> matter the probability of an accidental match.
> 
> Agreed. See above.
> 
> However, if you choose a hash, please do not go below SHA-256. Both MD5
> and SHA-1 already had collision attacks, and these only got to be bound
> to be worse.

I don't think collision attacks are a big consideration in the general
case.  The manifest is generally stored with the backup files so if a
file is modified it is then trivial to modify the manifest as well.

Of course, you could store the manifest separately or even just know the
hash of the manifest and store that separately.  In that case SHA-256
might be useful and it would be good to have the option, which I believe
is the plan.

I do wonder if you could construct a successful collision attack (even
in MD5) that would also result in a valid relation file.  Probably, at
least eventually.

Regards,
-- 
-David
david@pgmasters.net



Re: backup manifests

From
Tels
Date:
Moin,

On 2019-11-22 23:30, David Steele wrote:
> On 11/22/19 5:15 PM, Tels wrote:
>> On 2019-11-22 20:01, Robert Haas wrote:
>>> On Fri, Nov 22, 2019 at 1:10 PM David Steele <david@pgmasters.net> 
>>> wrote:
>> 
>>>> > Phrased more positively, if you want a cryptographic hash
>>>> > at all, you should probably use one that isn't widely viewed as too
>>>> > weak.
>>>> 
>>>> Sure.  There's another advantage to picking an algorithm with lower
>>>> collision rates, though.
>>>> 
>>>> CRCs are fine for catching transmission errors (as caveated above)
>>>> but not as great for comparing two files for equality.  With strong
>>>> hashes you can confidently compare local files against the path,
>>>> size, and hash stored in the manifest and save yourself a round-trip
>>>> to the remote storage to grab the file if it has not changed locally.
>>> 
>>> I agree in part. I think there are two reasons why a
>>> cryptographically strong hash is desirable for delta restore. First,
>>> since the checksums are longer, the probability of a false match
>>> happening randomly is lower, which is important. Even if the above
>>> analysis is correct and the chance of a false match is just 2^-32
>>> with a 32-bit CRC, if you back up ten million files every day, you'll
>>> likely get a false match within a few years or less, and once is too
>>> often. Second, unlike what I supposed above, the contents of a
>>> PostgreSQL data file are not chosen at random, unlike transmission
>>> errors, which probably are more or less random. It seems somewhat
>>> possible that there is an adversary who is trying to choose the data
>>> that gets stored in some particular record so as to create a false
>>> checksum match. A CRC is a lot easier to fool than a cryptographic
>>> hash, so I think that using a CRC of *any* length for this kind of
>>> use case would be extremely dangerous no matter the probability of an
>>> accidental match.
>> 
>> Agreed. See above.
>> 
>> However, if you choose a hash, please do not go below SHA-256. Both
>> MD5 and SHA-1 already had collision attacks, and these only got to be
>> bound to be worse.
> 
> I don't think collision attacks are a big consideration in the general
> case.  The manifest is generally stored with the backup files so if a
> file is modified it is then trivial to modify the manifest as well.

That is true. However, a simple way around this is to sign the manifest
with a private key (GPG or similar). And if the manifest contains
strong, hard-to-forge hashes, we get a much more secure backup, where
(almost) nobody else can alter the manifest or mount easy collision
attacks against the individual files.

Without the strong hashes it would be pointless to sign the manifest.
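
For example (just an illustration of what a user could do on top of the
manifest, not something the backup tooling itself would need to do), a
detached GnuPG signature over the manifest can be produced and later
checked with:

    gpg --detach-sign backup_manifest          (writes backup_manifest.sig)
    gpg --verify backup_manifest.sig backup_manifest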

> Of course, you could store the manifest separately or even just know
> the hash of the manifest and store that separately.  In that case
> SHA-256 might be useful and it would be good to have the option, which
> I believe is the plan.
> 
> I do wonder if you could construct a successful collision attack (even
> in MD5) that would also result in a valid relation file.  Probably, at
> least eventually.

With MD5, certainly. One way is to have two different 512-bit blocks that
hash to the same MD5. It is trivial to re-use an already existing pair from
the known examples.

Here is one, where the researchers constructed 12 PDFs that all
have the same MD5 hash:

   https://www.win.tue.nl/hashclash/Nostradamus/

If you insert one of these blocks into a relation and dump it, you could
swap it (probably?) out on disk for the other block. I'm not sure this
is of practical use as an attack, though. It would, however, cast doubt
on the integrity of the backup and prove that MD5 is useless.

OTOH, finding a full collision with MD5 should also be within reach of
today's hardware. It is hard to find exact numbers, but this:

    https://www.win.tue.nl/hashclash/SingleBlock/

gives the following numbers for 2008/2009:

   "Finding the birthday bits took 47 hours (expected was 3 days) on the
   cluster of 215 Playstation 3 game consoles at LACAL, EPFL. This is
   roughly equivalent to 400,000 hours on a single PC core. The single
   near-collision block construction took 18 hours and 20 minutes on a
   single PC core."

Today one can probably compute it on a single GPU in mere hours. And you
can rent massive amounts of them in the cloud for real cheap.

Here are a few, now a bit dated, references:

    https://blog.codinghorror.com/speed-hashing/
    http://codahale.com/how-to-safely-store-a-password/

Best regards,

Tels



Re: backup manifests

From
Andrew Dunstan
Date:
On 11/23/19 3:13 AM, Tels wrote:
>
> Without the strong hashes it would be pointless to sign the manifest.
>
>

I guess I must have missed where we are planning to add a cryptographic
signature.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: backup manifests

From
David Steele
Date:
On 11/23/19 4:34 PM, Andrew Dunstan wrote:
> 
> On 11/23/19 3:13 AM, Tels wrote:
>>
>> Without the strong hashes it would be pointless to sign the manifest.
>>
> 
> I guess I must have missed where we are planning to add a cryptographic
> signature.

I don't think we were planning to, but the user could do so if they wished.

-- 
-David
david@pgmasters.net



Re: backup manifests

From
Suraj Kharage
Date:
Hi Jeevan,

I have incorporated all the comments in the attached patch. Please review and let me know your thoughts.

On Thu, Nov 21, 2019 at 2:51 PM Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:


On Wed, Nov 20, 2019 at 11:05 AM Suraj Kharage <suraj.kharage@enterprisedb.com> wrote:
Hi,

Since we are now generating the backup manifest file with each backup, it gives us an option to validate the given backup.
Let's say we have taken a backup and, after a few days, we want to check whether that backup is valid and corruption-free without restarting the server.

Please find attached a POC patch for the same, based on the latest backup manifest patch from Rushabh. With this functionality, we add a new option to pg_basebackup, something like --verify-backup.
So, the syntax would be:
./bin/pg_basebackup --verify-backup -D <backup_directory_path>

Basically, we read the backup_manifest file line by line from the given directory path and build a hash table, then scan the directory and compare each file against its hash table entry.

Thoughts/suggestions?
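
As a rough illustration of that idea -- hypothetical code, not the attached
POC patch -- a minimal check of presence and size against the
"File <path> <size> ..." manifest lines shown elsewhere in this thread could
look like this (checksum verification and detection of extra files are left
out):

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

int
main(int argc, char **argv)
{
    const char *backupdir = (argc > 1) ? argv[1] : ".";
    char        manifestpath[4096];
    char        line[8192];
    FILE       *mf;

    snprintf(manifestpath, sizeof(manifestpath), "%s/backup_manifest", backupdir);
    mf = fopen(manifestpath, "r");
    if (mf == NULL)
    {
        perror(manifestpath);
        return 1;
    }

    while (fgets(line, sizeof(line), mf) != NULL)
    {
        char        path[4096];
        long long   size;
        char        fullpath[8192];
        struct stat st;

        /* Only "File <path> <size> ..." lines carry per-file information. */
        if (sscanf(line, "File %4095s %lld", path, &size) != 2)
            continue;

        snprintf(fullpath, sizeof(fullpath), "%s/%s", backupdir, path);
        if (stat(fullpath, &st) != 0)
            printf("missing: %s\n", path);
        else if (st.st_size != size)
            printf("size mismatch: %s (manifest %lld, on disk %lld)\n",
                   path, size, (long long) st.st_size);
    }

    fclose(mf);
    return 0;
}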


I like the idea of verifying the backup once we have backup_manifest with us.
Periodically verifying the already taken backup with this simple tool becomes
easy now.

I have reviewed this patch and here are my comments:

1.
@@ -30,7 +30,9 @@
 #include "common/file_perm.h"
 #include "common/file_utils.h"
 #include "common/logging.h"
+#include "common/sha2.h"
 #include "common/string.h"
+#include "fe_utils/simple_list.h"
 #include "fe_utils/recovery_gen.h"
 #include "fe_utils/string_utils.h"
 #include "getopt_long.h"
@@ -38,12 +40,19 @@
 #include "pgtar.h"
 #include "pgtime.h"
 #include "pqexpbuffer.h"
+#include "pgrhash.h"
 #include "receivelog.h"
 #include "replication/basebackup.h"
 #include "streamutil.h"


Please add new files in order.

2.
Can hash related file names be renamed to backuphash.c and backuphash.h?

3.
Need indentation adjustments at various places.

4.
+            char        buf[1000000];  // 1MB chunk

It would be good to use a multiple of the block/page size (or at least a
power-of-2 number).

5.
+typedef struct pgrhash_entry
+{
+    struct pgrhash_entry *next; /* link to next entry in same bucket */
+    DataDirectoryFileInfo *record;
+} pgrhash_entry;
+
+struct pgrhash
+{
+    unsigned    nbuckets;        /* number of buckets */
+    pgrhash_entry **bucket;        /* pointer to hash entries */
+};
+

+typedef struct pgrhash pgrhash;

These two can be moved to .h file instead of redefining over there.

6.
+/*
+ * TODO: this function is not necessary, can be removed.
+ * Test whether the given row number is match for the supplied keys.
+ */
+static bool
+pgrhash_compare(char *bt_filename, char *filename)

Yeah, it can be removed by doing strcmp() at the required places rather than
doing it in a separate function.

7.
mdate is not compared anywhere. I understand that it can't be compared
between the file in the backup directory and its entry in the manifest, since
the manifest entry records the mtime of the file on the server, whereas the
same file in the backup will have a different mtime. But adding a few comments
there would be good.

8.
+    char        mdate[24];

should be mtime instead?


Thanks

--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company



--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.
Attachment

Re: backup manifests

From
Tels
Date:
On 2019-11-24 15:38, David Steele wrote:
> On 11/23/19 4:34 PM, Andrew Dunstan wrote:
>> 
>> On 11/23/19 3:13 AM, Tels wrote:
>>> 
>>> Without the strong hashes it would be pointless to sign the manifest.
>>> 
>> 
>> I guess I must have missed where we are planning to add a
>> cryptographic signature.
> 
> I don't think we were planning to, but the user could do so if they
> wished.

That was what I meant.

Best regards,

Tels



Re: backup manifests

From
Robert Haas
Date:
On Fri, Nov 22, 2019 at 2:29 PM David Steele <david@pgmasters.net> wrote:
> See:
> https://www.nist.gov/system/files/documents/2017/04/26/lrdc_systems_part2_032713.pdf
> Search for "The maximum block size"

Hmm, so it says: "The maximum block size that can be protected by a
32-bit CRC is 512MB." My problem is that (1) it doesn't back this up
with a citation or any kind of logical explanation and (2) it's not
very clear what "protected" means. Tels replies downthread to explain
that the internal state of the 32-bit CRC calculation is also limited
to 32 bits, and changes once per bit, so that after processing 512MB =
2^29 bytes = 2^32 bits of data, you're guaranteed to start repeating
internal states. Perhaps this is also what the NIST folks had in mind,
though it's hard to know.

This link provides some more details:

https://community.arm.com/developer/tools-software/tools/f/keil-forum/17467/crc-for-256-byte-data

Not everyone on the thread agrees with everybody else, but it seems
like there are size limits below which a CRC-n is guaranteed to detect
all 1-bit and 2-bit errors, and above which this is no longer
guaranteed. They put the limit *lower* than what NIST supposes, namely
2^(n-1)-1 bits, which would be 256MB, not 512MB, if I'm doing math
correctly. However, they also say that above that value, you are still
likely to detect most errors. Absent an intelligent adversary, the
chance of a random collision when corruption is present is still about
1 in 4 billion (2^-32).

To me, guaranteed detection of 1-bit and 2-bit errors (and the other
kinds of specific things CRC is designed to catch) doesn't seem like a
principal design consideration. It's nice if we can get it and I'm not
against it, but these are algorithms that are designed to be used when
data undergoes a digital-to-analog-to-digital conversion, where for
example it's possible that the conversion back to digital loses
sync and reads 9 bits or 7 bits rather than 8 bits.
really what we're doing here: we all know that bits get flipped
sometimes, but nobody uses scp to copy a 1GB file and ends up with a
file that is 1GB +/- a few bits. Some lower-level part of the
communication stack is handling that part of the work; you're going to
get exactly 1GB. So it seems to me that here, as with XLOG, we're not
relying on the specific CRC properties that were intended to be used
to catch and in some cases repair bit flips caused by wrinkles in an
A-to-D conversion, but just on its general tendency to probably not
match if any bits got flipped. And those properties hold regardless of
input length.

That being said, having done some reading on this, I am a little
concerned that we're getting further and further from the design
center of the CRC algorithm. Like relation segment files, XLOG records
are not packets subject to bit insertions, but at least they're small,
and relation files are not. Using a 40-year-old algorithm that was
intended to be used for things like making sure the modem hadn't lost
framing in the last second to verify 1GB files feels, in some nebulous
way, like we might be stretching. That being said, I'm not sure what
we think the reasonable alternatives are. Users aren't going to be
better off if we say that, because CRC-32C might not do a great job
detecting errors, we're not going to check for errors at all. If we go
the other way and say we're going to use some variant of SHA, they
will be better off, but at the price of what looks like a
*significant* hit in terms of backup time.

> > "Typically an n-bit CRC applied to a data block of arbitrary length
> > will detect any single error burst not longer than n bits, and the
> > fraction of all longer error bursts that it will detect is (1 −
> > 2^−n)."
>
> I'm not sure how encouraging I find this -- a four-byte error is not a lot
> and 2^32 is only 4 billion.  We have individual users who have backed up
> more than 4 billion files over the last few years.

I agree that people have a lot more than 4 billion files backed up,
but I'm not sure it matters very much given the use case I'm trying to
enable. There's a lot of difference between delta restore and backup
integrity checking. For backup integrity checking, my goal is that, on
those occasions when a file gets corrupted, the chances that we notice
that it has been corrupted. For that purpose, a 32-bit checksum is
probably sufficient. If a file gets corrupted, we have about a
1-in-4-billion chance of being unable to detect it. If 4 billion files
get corrupted, we'll miss, on average, one of those corruption events.
That's sad, but so is the fact that you had *4 billion corrupted
files*. This is not the total number of files backed up; this is the
number of those that got corrupted. I don't really know how common it
is to copy a file and end up with a corrupt copy, but if you say it's
one-in-a-million, which I suspect is far too high, then you'd have to
back up something like 4 quadrillion files before you missed a
corruption event, and that's a *very* big number.

Now delta restore is a whole different kettle of fish. The birthday
problem is huge here. If you've got a 32-bit checksum for file A, and
you go and look it up in a database of checksums, and that database
has even 1 billion things in it, you've got a pretty decent shot of
latching onto a file that is not actually the same as file A. The
problem goes away almost entirely if you only compare against previous
versions of that file from that database cluster. You've probably only
got tens or maybe at the very outside hundreds or thousands of backups
of that particular file, and a collision is unlikely even with only a
32-bit checksum -- though even there maybe you'd like to use something
larger just to be on the safe side. But if you're going to compare to
other files from the same cluster, or even worse any file from any
cluster, 32 bits is *woefully* inadequate. TBH even using SHA for such
use cases feels a little scary to me. It's probably good enough --
2^160 for SHA-1 is a *lot* bigger than 2^32, and 2^512 for SHA-512 is
enormous. But I'd want to spend time thinking very carefully about the
math before designing such a system.
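
To put rough numbers on that, here is a back-of-the-envelope sketch with
assumed figures (a 32-bit checksum and a billion files in the database), not
a measurement of anything:

#include <math.h>
#include <stdio.h>

int
main(void)
{
    double      space = pow(2.0, 32.0);     /* possible 32-bit checksums */
    double      db = 1e9;                   /* assumed files in the checksum database */
    double      p_some_collision;

    /* Expected number of unrelated files sharing one given file's checksum. */
    printf("expected false matches per lookup: %.3f\n", db / space);

    /* Birthday-style estimate that some two files in the database collide. */
    p_some_collision = 1.0 - exp(-(db * (db - 1.0)) / (2.0 * space));
    printf("P(at least one collision in the database): %f\n", p_some_collision);

    return 0;
}

With those assumptions it prints about 0.233 expected false matches per
lookup and a collision probability of essentially 1; redo the same
arithmetic with 2^160 or 2^512 in place of 2^32 and both numbers collapse
to effectively zero.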

> OK, I'll buy that.  But I *don't* think CRCs should be allowed for
> deltas (when we have them) and I *do* think we should caveat their
> effectiveness (assuming we can agree on them).

Sounds good.

> In general the answer to faster backups should be more cores/faster
> network/faster disk, not compromising backup integrity.  I understand
> we'll need to wait until we have parallelism in pg_basebackup to justify
> that answer.

I would like to dispute that characterization of what we're talking
about here. If we added a 1-bit checksum (parity bit) it would be
*strictly better* than what we're doing right now, which is nothing.
That's not a serious proposal because it's obvious we can do a lot
better for trivial additional cost, but deciding that we're going to
use a weaker kind of checksum to avoid adding too much overhead is not
wimping out, because it's still going to be strong enough to catch the
overwhelming majority of problems that go undetected today. Even an
*8-bit* checksum would give us a >99% chance of catching a corrupted
file, which would be noticeably better than the 0% chance we have
today. Even a manifest with no checksums at all that just checked the
presence and size of files would catch tons of operator error, e.g.

- wait, that database had tablespaces?
- were those logs in pg_clog anything important?
- oh, i wasn't supposed to start postgres on the copy of the database
stored in the backup directory?

So I don't think we're talking about whether to compromise backup
integrity. I think we're talking about - if we're going to make backup
integrity better than it is today, how much better should we try to
make it, and what are the trade-offs there? The straw man here is that
we could make the database infinitely secure if we put it in a
concrete bunker and sunk it to the bottom of the ocean, with the small
price that we'd no longer be able to access it either. Somewhere
between that extreme and the other extreme of setting the
authentication method to 0.0.0.0/0 trust there's a happy medium where
security is tolerably good but ease of access isn't crippled, and the
same thing applies here. We could (probably) be the first database on
the planet to store a 1024-bit encrypted checksum of every 8kB block,
but that seems like it's going too far in the "concrete bunker"
direction. IMHO, at least, we should be aiming for something that has
a high probability of catching real problems and a low probability of
being super-annoying.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Robert Haas
Date:
On Fri, Nov 22, 2019 at 2:02 PM David Steele <david@pgmasters.net> wrote:
> > Except - and this gets back to the previous point - I don't want to
> > slow down backups by 40% by default. I wouldn't mind slowing them down
> > 3% by default, but 40% is too much overhead. I think we've gotta
> > either get the overhead of using SHA way down or not use SHA by default.
>
> Maybe -- my take is that the measurements, an uncompressed backup to the
> local filesystem, are not a very realistic use case.

Well, compression is a feature we don't have yet, in core. So for
people who are only using core tools, an uncompressed backup is a very
realistic use case, because it's the only kind they can get. Granted
the situation is different if you are using pgbackrest.

I don't have enough experience to know how often people back up to
local filesystems vs. remote filesystems mounted locally vs. overtly
over-the-network. I sometimes get the impression that users choose
their backup tools and procedures with, as Tom would say, the aid of a
dart board, but that's probably the cynic in me talking. Or maybe a
reflection of the fact that I usually end up talking to the users for
whom things have gone really, really badly wrong, rather than the ones
for whom things went as planned.

> However, I'm still fine with leaving the user the option of checksums or
> no.  I just wanted to point out that CRCs have their limits so maybe
> that's not a great option unless it is properly caveated and perhaps not
> the default.

I think the default is the sticking point here. To me, it looks like
CRC is a better default than nothing at all because it should still
catch a high percentage of issues that would otherwise be missed, and
a better default than SHA because it's so cheap to compute. However,
I'm certainly willing to consider other theories.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Robert Haas
Date:
On Fri, Nov 22, 2019 at 5:15 PM Tels <nospam-pg-abuse@bloodgate.com> wrote:
> It is related to the number of states...

Thanks for this explanation. See my reply to David where I also
discuss this point.

> However, if you choose a hash, please do not go below SHA-256. Both MD5
> and SHA-1 already had collision attacks, and these only got to be bound
> to be worse.
>
>    https://www.mscs.dal.ca/~selinger/md5collision/
>    https://shattered.io/

Yikes, that second link, about SHA-1, is depressing. Now, it's not
likely that an attacker has access to your backup repository and can
spend 6500 years of CPU time to engineer a Trojan file there (maybe
more, because the files are probably bigger than the PDFs they used in
that case) and then induce you to restore and rely upon that backup.
However, it's entirely likely that somebody is going to eventually ban
SHA-1 as the attacks get better, which is going to be a problem for us
whether the underlying exposures are problems or not.

> It might even be a wise idea to encode the used Hash-Algorithm into the
> manifest file, so it can be changed later. The hash length might be not
> enough to decide which algorithm is the one used.

I agree. Let's write
SHA256:bc1c3a57369acd0d2183a927fb2e07acbbb1c97f317bbc3b39d93ec65b754af5
or similar rather than just the hash. That way even if the entire SHA
family gets cracked, we can easily substitute in something else that
hasn't been cracked yet.
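
For instance, a verifier could split such a field on the first colon. A
hypothetical helper, just to illustrate the format (not code from any patch
on this thread):

#include <stdio.h>
#include <string.h>

/* Split "SHA256:bc1c3a..." into an algorithm label and a hex digest. */
static int
parse_checksum_field(const char *field, char *algo, size_t algolen,
                     const char **hexdigest)
{
    const char *colon = strchr(field, ':');

    if (colon == NULL || (size_t) (colon - field) >= algolen)
        return -1;              /* malformed, or label too long */
    memcpy(algo, field, colon - field);
    algo[colon - field] = '\0';
    *hexdigest = colon + 1;
    return 0;
}

int
main(void)
{
    char        algo[32];
    const char *digest;
    const char *field =
    "SHA256:bc1c3a57369acd0d2183a927fb2e07acbbb1c97f317bbc3b39d93ec65b754af5";

    if (parse_checksum_field(field, algo, sizeof(algo), &digest) == 0)
        printf("algorithm = %s, digest = %s\n", algo, digest);
    return 0;
}

An unknown label can then be rejected cleanly instead of being mistaken for
a digest of some other length.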

(It is unclear to me why anyone supposes that *any* popular hash
function won't eventually be cracked. For a K-bit hash function, there
are 2^K possible outputs, where K is probably in the hundreds. But
there are 2^{2^33} possible 1GB files. So for every possible output
value, there are 2^{2^33-K} inputs that produce that value, which is a
very very big number. The probability that any given input produces a
certain output is very low, but the number of possible inputs that
produce a given output is very high; so assuming that nobody's ever
going to figure out how to construct them seems optimistic.)

> To get a feeling one can use:
>
>     openssl speed md5 sha1 sha256 sha512
>
> On my really-not-fast desktop CPU (i5-4690T CPU @ 2.50GHz) it says:
>
>   The 'numbers' are in 1000s of bytes per second processed.
>    type       16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>    md5       122638.55k   277023.96k   487725.57k   630806.19k   683892.74k   688553.98k
>    sha1      127226.45k   313891.52k   632510.55k   865753.43k   960995.33k   977215.19k
>    sha256     77611.02k   173368.15k   325460.99k   412633.43k   447022.92k   448020.48k
>    sha512     51164.77k   205189.87k   361345.79k   543883.26k   638372.52k   645933.74k
>
> Or in other words, it can hash nearly 931 MByte/s with SHA-1 and about
> 427 MByte/s with SHA-256 (if I haven't miscalculated something). You'd
> need a pretty fast disk (aka M.2 SSD) and network (aka > 1 Gbit) to top
> these speeds and then you'd use a real CPU for your server, not some
> poor Intel powersaving surfing thingy-majingy :)

I mean, how fast it is in theory doesn't matter nearly as much as what
happens when you benchmark the proposed implementation, and the
results we have so far don't support the theory that this is so cheap
as to be negligible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Rushabh Lathia
Date:

As per the discussion on the thread, here is the patch which

a) Makes the checksum for the manifest file optional.
b) Allows the user to choose a particular algorithm.

Currently the WIP patch supports the SHA256 and CRC checksum
algorithms.  The patch also changes the manifest file format to prepend
the algorithm name to the checksum; this way it will be easy for the
validator to know which algorithm was used.

Ex:
./db/bin/pg_basebackup -D bksha/ --manifest-with-checksums=SHA256

$ cat bksha/backup_manifest  | more
PostgreSQL-Backup-Manifest-Version 1
File backup_label 226 2019-12-04 17:46:46 GMT SHA256:7cf53d1b9facca908678ab70d93a9e7460cd35cedf7891de948dcf858f8a281a
File pg_xact/0000 8192 2019-12-04 17:46:46 GMT SHA256:8d2b6cb1dc1a6e8cee763b52d75e73571fddce06eb573861d44082c7d8c03c26

./db/bin/pg_basebackup -D bkcrc/ --manifest-with-checksums=CRC
PostgreSQL-Backup-Manifest-Version 1
File backup_label 226 2019-12-04 17:58:40 GMT CRC:343138313931333134
File pg_xact/0000 8192 2019-12-04 17:46:46 GMT CRC:363538343433333133

Pending TODOs:
- Documentation update
- Code cleanup
- Testing.

I will further continue to work on the patch and meanwhile feel free to provide
thoughts/inputs.

Thanks,


On Mon, Nov 25, 2019 at 11:13 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Nov 22, 2019 at 5:15 PM Tels <nospam-pg-abuse@bloodgate.com> wrote:
> It is related to the number of states...

Thanks for this explanation. See my reply to David where I also
discuss this point.

> However, if you choose a hash, please do not go below SHA-256. Both MD5
> and SHA-1 already had collision attacks, and these only got to be bound
> to be worse.
>
>    https://www.mscs.dal.ca/~selinger/md5collision/
>    https://shattered.io/

Yikes, that second link, about SHA-1, is depressing. Now, it's not
likely that an attacker has access to your backup repository and can
spend 6500 years of CPU time to engineer a Trojan file there (maybe
more, because the files are probably bigger than the PDFs they used in
that case) and then induce you to restore and rely upon that backup.
However, it's entirely likely that somebody is going to eventually ban
SHA-1 as the attacks get better, which is going to be a problem for us
whether the underlying exposures are problems or not.

> It might even be a wise idea to encode the used Hash-Algorithm into the
> manifest file, so it can be changed later. The hash length might be not
> enough to decide which algorithm is the one used.

I agree. Let's write
SHA256:bc1c3a57369acd0d2183a927fb2e07acbbb1c97f317bbc3b39d93ec65b754af5
or similar rather than just the hash. That way even if the entire SHA
family gets cracked, we can easily substitute in something else that
hasn't been cracked yet.

(It is unclear to me why anyone supposes that *any* popular hash
function won't eventually be cracked. For a K-bit hash function, there
are 2^K possible outputs, where K is probably in the hundreds. But
there are 2^{2^33} possible 1GB files. So for every possible output
value, there are 2^{2^33-K} inputs that produce that value, which is a
very very big number. The probability that any given input produces a
certain output is very low, but the number of possible inputs that
produce a given output is very high; so assuming that nobody's ever
going to figure out how to construct them seems optimistic.)

> To get a feeling one can use:
>
>     openssl speed md5 sha1 sha256 sha512
>
> On my really-not-fast desktop CPU (i5-4690T CPU @ 2.50GHz) it says:
>
>   The 'numbers' are in 1000s of bytes per second processed.
>    type       16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>    md5       122638.55k   277023.96k   487725.57k   630806.19k   683892.74k   688553.98k
>    sha1      127226.45k   313891.52k   632510.55k   865753.43k   960995.33k   977215.19k
>    sha256     77611.02k   173368.15k   325460.99k   412633.43k   447022.92k   448020.48k
>    sha512     51164.77k   205189.87k   361345.79k   543883.26k   638372.52k   645933.74k
>
> Or in other words, it can hash nearly 931 MByte/s with SHA-1 and about
> 427 MByte/s with SHA-256 (if I haven't miscalculated something). You'd
> need a pretty fast disk (aka M.2 SSD) and network (aka > 1 Gbit) to top
> these speeds and then you'd use a real CPU for your server, not some
> poor Intel powersaving surfing thingy-majingy :)

I mean, how fast is in theory doesn't matter nearly as much as what
happens when you benchmark the proposed implementation, and the
results we have so far don't support the theory that this is so cheap
as to be negligible.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Rushabh Lathia
Attachment

Re: backup manifests

From
Robert Haas
Date:
On Wed, Dec 4, 2019 at 1:01 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> As per the  discussion on the thread, here is the patch which
>
> a) Make checksum for manifest file optional.
> b) Allow user to choose a particular algorithm.
>
> Currently with the WIP patch SHA256 and CRC checksum algorithm
> supported.  Patch also changed the manifest file format to append
> the used algorithm name before the checksum, this way it will be
> easy to validator to know which algorithm to used.
>
> Ex:
> ./db/bin/pg_basebackup -D bksha/ --manifest-with-checksums=SHA256
>
> $ cat bksha/backup_manifest  | more
> PostgreSQL-Backup-Manifest-Version 1
> File backup_label 226 2019-12-04 17:46:46 GMT SHA256:7cf53d1b9facca908678ab70d93a9e7460cd35cedf7891de948dcf858f8a281a
> File pg_xact/0000 8192 2019-12-04 17:46:46 GMT SHA256:8d2b6cb1dc1a6e8cee763b52d75e73571fddce06eb573861d44082c7d8c03c26
>
> ./db/bin/pg_basebackup -D bkcrc/ --manifest-with-checksums=CRC
> PostgreSQL-Backup-Manifest-Version 1
> File backup_label 226 2019-12-04 17:58:40 GMT CRC:343138313931333134
> File pg_xact/0000 8192 2019-12-04 17:46:46 GMT CRC:363538343433333133
>
> Pending TODOs:
> - Documentation update
> - Code cleanup
> - Testing.
>
> I will further continue to work on the patch and meanwhile feel free to provide
> thoughts/inputs.

+ initilize_manifest_checksum(&cCtx);

Spelling.

-

Spurious.

+ case MC_CRC:
+ INIT_CRC32C(cCtx->crc_ctx);

Suggest that we do CRC -> CRC32C throughout the patch. Someone might
conceivably want some other CRC variant, most likely 64-bit, in the
future.

+final_manifest_checksum(ChecksumCtx *cCtx, char *checksumbuf)

finalize

  printf(_("      --manifest-with-checksums\n"
- "                         do calculate checksums for manifest files\n"));
+ "                         calculate checksums for manifest files
using provided algorithm\n"));

Switch name is wrong. Suggest --manifest-checksums.
Help usually shows that an argument is expected, e.g.
--manifest-checksums=ALGORITHM or
--manifest-checksums=sha256|crc32c|none

This seems to apply over some earlier version of the patch.  A
consolidated patch, or the whole stack, would be better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Rushabh Lathia
Date:


On Thu, Dec 5, 2019 at 12:17 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Dec 4, 2019 at 1:01 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> As per the  discussion on the thread, here is the patch which
>
> a) Make checksum for manifest file optional.
> b) Allow user to choose a particular algorithm.
>
> Currently with the WIP patch SHA256 and CRC checksum algorithm
> supported.  Patch also changed the manifest file format to append
> the used algorithm name before the checksum, this way it will be
> easy to validator to know which algorithm to used.
>
> Ex:
> ./db/bin/pg_basebackup -D bksha/ --manifest-with-checksums=SHA256
>
> $ cat bksha/backup_manifest  | more
> PostgreSQL-Backup-Manifest-Version 1
> File backup_label 226 2019-12-04 17:46:46 GMT SHA256:7cf53d1b9facca908678ab70d93a9e7460cd35cedf7891de948dcf858f8a281a
> File pg_xact/0000 8192 2019-12-04 17:46:46 GMT SHA256:8d2b6cb1dc1a6e8cee763b52d75e73571fddce06eb573861d44082c7d8c03c26
>
> ./db/bin/pg_basebackup -D bkcrc/ --manifest-with-checksums=CRC
> PostgreSQL-Backup-Manifest-Version 1
> File backup_label 226 2019-12-04 17:58:40 GMT CRC:343138313931333134
> File pg_xact/0000 8192 2019-12-04 17:46:46 GMT CRC:363538343433333133
>
> Pending TODOs:
> - Documentation update
> - Code cleanup
> - Testing.
>
> I will further continue to work on the patch and meanwhile feel free to provide
> thoughts/inputs.

+ initilize_manifest_checksum(&cCtx);

Spelling.


Fixed.

-

Spurious.

+ case MC_CRC:
+ INIT_CRC32C(cCtx->crc_ctx);

Suggest that we do CRC -> CRC32C throughout the patch. Someone might
conceivably want some other CRC variant, mostly likely 64-bit, in the
future.


Make sense, done.

+final_manifest_checksum(ChecksumCtx *cCtx, char *checksumbuf)

finalize


Done.

  printf(_("      --manifest-with-checksums\n"
- "                         do calculate checksums for manifest files\n"));
+ "                         calculate checksums for manifest files
using provided algorithm\n"));

Switch name is wrong. Suggest --manifest-checksums.
Help usually shows that an argument is expected, e.g.
--manifest-checksums=ALGORITHM or
--manifest-checksums=sha256|crc32c|none


Fixed.

This seems to apply over some earlier version of the patch.  A
consolidated patch, or the whole stack, would be better.

Here is the whole stack of patches.


Thanks,
Rushabh Lathia
Attachment

Re: backup manifests

From
Robert Haas
Date:
On Thu, Dec 5, 2019 at 11:22 AM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> Here is the whole stack of patches.

Please include proper attribution and, where somebody's written them,
commit messages in each patch in the stack. For example, I see that
your 0001 is mostly the same as my 0001 from upthread, but now it
says:

From a3e075d5edb5031ea358e049f8cb07031fc480a3 Mon Sep 17 00:00:00 2001
From: Rushabh Lathia <rushabh.lathia@enterprisedb.com>
Date: Wed, 13 Nov 2019 15:19:22 +0530
Subject: [PATCH 1/5] Reduce code duplication and eliminate weird macro tricks.

...with no indication of who the original author was.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Robert Haas
Date:
On Thu, Dec 5, 2019 at 11:22 AM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> Here is the whole stack of patches.

I committed 0001, as that's just refactoring and I think (hope) it's
uncontroversial. I think 0002-0005 need to be squashed together
(crediting all authors properly and in the appropriate order) as it's
quite hard to understand right now, and that Suraj's patch to validate
the backup should be included in the patch stack. It needs
documentation. Also, we need, either in that patch or a separate one, TAP
tests that exercise this feature. Things we should try to check:

- Plain format backups can be verified against the manifest.
- Tar format backups can be verified against the manifest after
untarring (this might be a problem; not sure there's any guarantee
that we have a working "tar" command available).
- Verification succeeds for all available checksums algorithms and
also for no checksum algorithm (should still check which files are
present, and sizes).
- If we tamper with a backup by removing a file, adding a file, or
changing the size of a file, the modification is detected even without
checksums.
- If we tamper with a backup by changing the contents of a file but
not the size, the modification is detected if checksums are used.
- Everything above still works if there is user-defined tablespace
that contains a table.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Rushabh Lathia
Date:


On Fri, Dec 6, 2019 at 1:44 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Dec 5, 2019 at 11:22 AM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> Here is the whole stack of patches.

I committed 0001, as that's just refactoring and I think (hope) it's
uncontroversial. I think 0002-0005 need to be squashed together
(crediting all authors properly and in the appropriate order) as it's
quite hard to understand right now,

Please find attached a single patch; I have tried to credit all of
the authors.

There is one review comment from Jeevan Chalke which is still pending
to be addressed:

4.
Why we need a "File" at the start of each entry as we are adding files only?
I wonder if we also need to provide a tablespace name and directory marker so
that we have "Tablespace" and "Dir" at the start.

Sorry, I am not quite sure about this; maybe Robert is the right person
to answer this.

and that Suraj's patch to validate
the backup should be included in the patch stack. It needs
documentation. Also, we need, either in that patch or a separate, TAP
tests that exercise this feature. Things we should try to check:

- Plain format backups can be verified against the manifest.
- Tar format backups can be verified against the manifest after
untarring (this might be a problem; not sure there's any guarantee
that we have a working "tar" command available).
- Verification succeeds for all available checksums algorithms and
also for no checksum algorithm (should still check which files are
present, and sizes).
- If we tamper with a backup by removing a file, adding a file, or
changing the size of a file, the modification is detected even without
checksums.
- If we tamper with a backup by changing the contents of a file but
not the size, the modification is detected if checksums are used.
- Everything above still works if there is user-defined tablespace
that contains a table.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Thanks.
Rushabh Lathia

Attachment

Re: backup manifests

From
Robert Haas
Date:
On Fri, Dec 6, 2019 at 1:35 AM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> There is one review comment from Jeevan Chalke, which still pending
> to address is:
>
>> 4.
>> Why we need a "File" at the start of each entry as we are adding files only?
>> I wonder if we also need to provide a tablespace name and directory marker so
>> that we have "Tablespace" and "Dir" at the start.
>
> Sorry, I am not quite sure about this, may be Robert is right person
> to answer this.

I did it that way for extensibility. Notice that the first and last
line of the manifest begin with other words, so someone parsing the
manifest can identify the line type by looking just at the first word.
Someone might in the future find some need to add other kinds of lines
that don't exist today.

"Tablespace" and "Dir" are, in fact, pretty good examples of things
that someone might want to add in the future. I don't really see a
clear need for either one today, although maybe somebody else will,
but I think we should leave ourselves room to add such things in the
future.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Jeevan Chalke
Date:


On Fri, Dec 6, 2019 at 12:05 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:


On Fri, Dec 6, 2019 at 1:44 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Dec 5, 2019 at 11:22 AM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> Here is the whole stack of patches.

I committed 0001, as that's just refactoring and I think (hope) it's
uncontroversial. I think 0002-0005 need to be squashed together
(crediting all authors properly and in the appropriate order) as it's
quite hard to understand right now,

Please find attached single patch and I tried to add the credit to all
the authors.

I had a look over the patch and here are my few review comments:

1.
+            if (pg_strcasecmp(manifest_checksum_algo, "SHA256") == 0)
+                manifest_checksums = MC_SHA256;
+            else if (pg_strcasecmp(manifest_checksum_algo, "CRC32C") == 0)
+                manifest_checksums = MC_CRC32C;
+            else if (pg_strcasecmp(manifest_checksum_algo, "NONE") == 0)
+                manifest_checksums = MC_NONE;
+            else
+                ereport(ERROR,

Is NONE a valid input? I think the default is "NONE" anyway, and thus there is
no need for it as an input. It would be better if we simply error out if the
input is neither "SHA256" nor "CRC32C".

I believe you have done it this way because pg_basebackup always passes the
MANIFEST_CHECKSUMS '%s' string, which will contain "NONE" if no user input is
given. But I think passing that conditionally would be better, like we do for
maxrate_clause, for example.

Well, this is just what I think; feel free to ignore it, as I don't see any
correctness issue here.


2.
+    if (manifest_checksums != MC_NONE)
+    {
+        checksumbuflen = finalize_manifest_checksum(cCtx, checksumbuf);
+        switch (manifest_checksums)
+        {
+            case MC_NONE:
+                break;
+        }

Since the switch is within the "if (manifest_checksums != MC_NONE)" condition,
I don't think we need a case for MC_NONE here. Rather, we can use a default
case to error out.


3.
+    if (manifest_checksums != MC_NONE)
+    {
+        initialize_manifest_checksum(&cCtx);
+        update_manifest_checksum(&cCtx, content, len);
+    }

@@ -1384,6 +1641,9 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
     int            segmentno = 0;
     char       *segmentpath;
     bool        verify_checksum = false;
+    ChecksumCtx cCtx;
+
+    initialize_manifest_checksum(&cCtx);


I see that in a few cases you are calling initialize/update_manifest_checksum()
conditionally, and in other places the call is unconditional. It seems that
calling them unconditionally will not cause any issues, as the switch cases
inside them do nothing when manifest_checksums is MC_NONE.


4.
The initialize/update/finalize_manifest_checksum() functions may be needed by
the validation patch as well, and thus I think these functions should not
depend on a global variable. Also, it would be good to keep them in a file
that is accessible to frontend-only code. Well, you can ignore these comments
with the argument that this refactoring can be done by the patch adding
validation support; I have no issue with that. Since both patches are
dependent and posted on the same email chain, I thought of noting the
observation here.


5.
+        switch (manifest_checksums)
+        {
+            case MC_SHA256:
+                checksumlabel = "SHA256:";
+                break;
+            case MC_CRC32C:
+                checksumlabel = "CRC32C:";
+                break;
+            case MC_NONE:
+                break;
+        }

This code in AddFileToManifest() is executed for every file for which we are
adding an entry. However, the checksumlabel is going to remain the same
throughout. Can it be set just once and then used as is?


6.
Can we avoid declaring manifest_checksums as a global variable?
I think for that we need to pass it to every function and thus need to
change the function signatures of various functions. Currently, we pass
"StringInfo manifest" to all the required functions; would it be better to
pass a struct variable instead? The struct could have members like
"StringInfo manifest", the checksum type (manifest_checksums), the
checksum label, etc.


Thanks
--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Re: backup manifests

From
Rushabh Lathia
Date:

Thanks Jeevan for reviewing the patch and offline discussion.

On Mon, Dec 9, 2019 at 11:15 AM Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:


On Fri, Dec 6, 2019 at 12:05 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:


On Fri, Dec 6, 2019 at 1:44 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Dec 5, 2019 at 11:22 AM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> Here is the whole stack of patches.

I committed 0001, as that's just refactoring and I think (hope) it's
uncontroversial. I think 0002-0005 need to be squashed together
(crediting all authors properly and in the appropriate order) as it's
quite hard to understand right now,

Please find attached single patch and I tried to add the credit to all
the authors.

I had a look over the patch and here are my few review comments:

1.
+            if (pg_strcasecmp(manifest_checksum_algo, "SHA256") == 0)
+                manifest_checksums = MC_SHA256;
+            else if (pg_strcasecmp(manifest_checksum_algo, "CRC32C") == 0)
+                manifest_checksums = MC_CRC32C;
+            else if (pg_strcasecmp(manifest_checksum_algo, "NONE") == 0)
+                manifest_checksums = MC_NONE;
+            else
+                ereport(ERROR,

Is NONE is a valid input? I think the default is "NONE" only and thus no need
of this as an input. It will be better if we simply error out if input is
neither "SHA256" nor "CRC32C".

I believe you have done this way as from pg_basebackup you are always passing
MANIFEST_CHECKSUMS '%s' string which will have "NONE" if no user input is
given. But I think passing that conditional will be better like we have
maxrate_clause for example.

Well, this is what I think, feel free to ignore as I don't see any correctness
issue over here.


I would still keep NONE, as it looks cleaner in terms of the options
given for the checksums.


2.
+    if (manifest_checksums != MC_NONE)
+    {
+        checksumbuflen = finalize_manifest_checksum(cCtx, checksumbuf);
+        switch (manifest_checksums)
+        {
+            case MC_NONE:
+                break;
+        }

Since switch case is within "if (manifest_checksums != MC_NONE)" condition,
I don't think we need a case for MC_NONE here. Rather we can use a default
case to error out.


Yeah, with the new patch we don't have this part of code.


3.
+    if (manifest_checksums != MC_NONE)
+    {
+        initialize_manifest_checksum(&cCtx);
+        update_manifest_checksum(&cCtx, content, len);
+    }

@@ -1384,6 +1641,9 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
     int            segmentno = 0;
     char       *segmentpath;
     bool        verify_checksum = false;
+    ChecksumCtx cCtx;
+
+    initialize_manifest_checksum(&cCtx);


I see that in a few cases you are calling initialize/update_manifest_checksum()
conditional and at some other places call is unconditional. It seems like
calling unconditional will not have any issues as switch cases inside them
return doing nothing when manifest_checksums is MC_NONE.


Fixed.


4.
initialize/update/finalize_manifest_checksum() functions may be needed by the
validation patch as well. And thus I think these functions should not depend
on a global variable as such. Also, it will be good if we keep them in a file
that is accessible to frontend-only code. Well, you can ignore these comments
with the argument saying that this refactoring can be done by the patch adding
validation support. I have no issues. Since both the patches are dependent and
posted on the same email chain, thought of putting that observation.


Makes sense. I just changed those APIs so that they don't have to
access the global.


5.
+        switch (manifest_checksums)
+        {
+            case MC_SHA256:
+                checksumlabel = "SHA256:";
+                break;
+            case MC_CRC32C:
+                checksumlabel = "CRC32C:";
+                break;
+            case MC_NONE:
+                break;
+        }

This code in AddFileToManifest() is executed for every file for which we are
adding an entry. However, the checksumlabel will be going to remain the same
throughout. Can it be set just once and then used as is?


Yeah, with the attached patch we no longer have this part of the code.


6.
Can we avoid manifest_checksums from declaring it as a global variable?
I think for that, we need to pass that to every function and thus need to
change the function signature of various functions. Currently, we pass
"StringInfo manifest" to all the required function, will it better to pass
the struct variable instead? A struct may have members like,
"StringInfo manifest" in it, checksum type (manifest_checksums),
checksum label, etc.


I agree.  Earlier I was not sure about this because it requires exposing a
data structure.  But that's what I tried in the attached patch: I introduced a
new data structure, defined it in basebackup.h, and passed it through the
functions so that individual members don't need to be passed.  I also removed
the global manifest_checksum and added it to the newly introduced structure.

Attaching the patch, which needs to be applied on top of the earlier 0001 patch.

Thanks,

--
Rushabh Lathia
Attachment

Re: backup manifests

From
Rushabh Lathia
Date:


On Mon, Dec 9, 2019 at 2:52 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:

Thanks Jeevan for reviewing the patch and offline discussion.

On Mon, Dec 9, 2019 at 11:15 AM Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:


On Fri, Dec 6, 2019 at 12:05 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:


On Fri, Dec 6, 2019 at 1:44 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Dec 5, 2019 at 11:22 AM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> Here is the whole stack of patches.

I committed 0001, as that's just refactoring and I think (hope) it's
uncontroversial. I think 0002-0005 need to be squashed together
(crediting all authors properly and in the appropriate order) as it's
quite hard to understand right now,

Please find attached single patch and I tried to add the credit to all
the authors.

I had a look over the patch and here are my few review comments:

1.
+            if (pg_strcasecmp(manifest_checksum_algo, "SHA256") == 0)
+                manifest_checksums = MC_SHA256;
+            else if (pg_strcasecmp(manifest_checksum_algo, "CRC32C") == 0)
+                manifest_checksums = MC_CRC32C;
+            else if (pg_strcasecmp(manifest_checksum_algo, "NONE") == 0)
+                manifest_checksums = MC_NONE;
+            else
+                ereport(ERROR,

Is NONE is a valid input? I think the default is "NONE" only and thus no need
of this as an input. It will be better if we simply error out if input is
neither "SHA256" nor "CRC32C".

I believe you have done this way as from pg_basebackup you are always passing
MANIFEST_CHECKSUMS '%s' string which will have "NONE" if no user input is
given. But I think passing that conditional will be better like we have
maxrate_clause for example.

Well, this is what I think, feel free to ignore as I don't see any correctness
issue over here.


I would still keep this NONE as it's look more cleaner in the say of
given options to the checksums.


2.
+    if (manifest_checksums != MC_NONE)
+    {
+        checksumbuflen = finalize_manifest_checksum(cCtx, checksumbuf);
+        switch (manifest_checksums)
+        {
+            case MC_NONE:
+                break;
+        }

Since switch case is within "if (manifest_checksums != MC_NONE)" condition,
I don't think we need a case for MC_NONE here. Rather we can use a default
case to error out.


Yeah, with the new patch we don't have this part of code.


3.
+    if (manifest_checksums != MC_NONE)
+    {
+        initialize_manifest_checksum(&cCtx);
+        update_manifest_checksum(&cCtx, content, len);
+    }

@@ -1384,6 +1641,9 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
     int            segmentno = 0;
     char       *segmentpath;
     bool        verify_checksum = false;
+    ChecksumCtx cCtx;
+
+    initialize_manifest_checksum(&cCtx);


I see that in a few cases you are calling initialize/update_manifest_checksum()
conditional and at some other places call is unconditional. It seems like
calling unconditional will not have any issues as switch cases inside them
return doing nothing when manifest_checksums is MC_NONE.


Fixed.


4.
initialize/update/finalize_manifest_checksum() functions may be needed by the
validation patch as well. And thus I think these functions should not depend
on a global variable as such. Also, it will be good if we keep them in a file
that is accessible to frontend-only code. Well, you can ignore these comments
with the argument saying that this refactoring can be done by the patch adding
validation support. I have no issues. Since both the patches are dependent and
posted on the same email chain, thought of putting that observation.


Make sense, I just changed those API to that it doesn't have to
access the global.


5.
+        switch (manifest_checksums)
+        {
+            case MC_SHA256:
+                checksumlabel = "SHA256:";
+                break;
+            case MC_CRC32C:
+                checksumlabel = "CRC32C:";
+                break;
+            case MC_NONE:
+                break;
+        }

This code in AddFileToManifest() is executed for every file for which we are
adding an entry. However, the checksumlabel will be going to remain the same
throughout. Can it be set just once and then used as is?


Yeah, with the attached patch we no more have this part of code.


6.
Can we avoid declaring manifest_checksums as a global variable? I think for
that we would need to pass it to every function, and thus change several
function signatures. Currently we pass "StringInfo manifest" to all the
required functions; would it be better to pass a struct variable instead?
Such a struct could have members like "StringInfo manifest", the checksum
type (manifest_checksums), the checksum label, and so on.


I agree.  Earlier I was not sure about this because it requires exposing a
data structure, but that is what the attached patch does: it introduces a new
data structure, defines it in basebackup.h, and passes it through the
functions so that the individual members no longer need to be passed
separately.  It also removes the global manifest_checksums and moves it into
the newly introduced structure.
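
For the sake of illustration, the structure being described might look
something like this (member names are illustrative only; the actual structure
in the patch may differ):

    typedef struct manifest_info
    {
        StringInfo  manifest;           /* manifest lines accumulated so far */
        ChecksumAlgorithm checksum_type;    /* algorithm the user asked for */
        char       *checksum_label;     /* e.g. "SHA256:" */
    } manifest_info;

Each manifest-building function would then take a manifest_info * instead of
a StringInfo plus access to a global.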

Attaching the patch, which needs to be applied on top of the earlier 0001 patch.

Attaching another version of the 0002 patch, as my colleague Jeevan Chalke
pointed out a few indentation problems in the 0002 patch I sent earlier.
Those are fixed in the latest patch.




Thanks,

--
Rushabh Lathia


--
Rushabh Lathia
Attachment

Re: backup manifests

From
Jeevan Chalke
Date:


On Tue, Dec 10, 2019 at 3:29 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:

Attaching another version of the 0002 patch, as my colleague Jeevan Chalke
pointed out a few indentation problems in the 0002 patch I sent earlier.
Those are fixed in the latest patch.

I had a look at the new patch and see no issues; it looks good to me.
Thanks for quickly addressing the review comments posted earlier.

However, here are the minor comments:

1.
@@ -122,6 +133,7 @@ static long long int total_checksum_failures;
 /* Do not verify checksums. */
 static bool noverify_checksums = false;
 
+
 /*
  * The contents of these directories are removed or recreated during server
  * start so they are not included in backups.  The directories themselves are


Please remove this unnecessary change.

Also, the patch needs an indentation (pgindent) run.

Thanks
--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Re: backup manifests

From
Suraj Kharage
Date:
Hi,

Please find attached a patch implementing the backup validator (the 0004
patch). It is based on Rushabh's latest patch for the backup manifest.

Some functions are required on the client side as well, so I have moved those
functions and some data structures to a common place so that they are
accessible to both (the 0003 patch).

My colleague Rajkumar Raghuwanshi has prepared a WIP patch (0005) for the TAP
test cases, which is also attached. As of now, test cases related to
tablespaces and the tar backup format are missing; I will continue working on
those and submit the complete patch.

With this mail, I have attached the complete patch stack for the backup
manifest and backup validation implementation.

Please let me know your thoughts.

On Fri, Dec 6, 2019 at 1:44 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Dec 5, 2019 at 11:22 AM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> Here is the whole stack of patches.

I committed 0001, as that's just refactoring and I think (hope) it's
uncontroversial. I think 0002-0005 need to be squashed together
(crediting all authors properly and in the appropriate order) as it's
quite hard to understand right now, and that Suraj's patch to validate
the backup should be included in the patch stack. It needs
documentation. Also, we need, either in that patch or a separate, TAP
tests that exercise this feature. Things we should try to check:

- Plain format backups can be verified against the manifest.
- Tar format backups can be verified against the manifest after
untarring (this might be a problem; not sure there's any guarantee
that we have a working "tar" command available).
- Verification succeeds for all available checksums algorithms and
also for no checksum algorithm (should still check which files are
present, and sizes).
- If we tamper with a backup by removing a file, adding a file, or
changing the size of a file, the modification is detected even without
checksums.
- If we tamper with a backup by changing the contents of a file but
not the size, the modification is detected if checksums are used.
- Everything above still works if there is user-defined tablespace
that contains a table.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




--
--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.
Attachment

Re: backup manifests

From
Robert Haas
Date:
On Tue, Dec 10, 2019 at 6:40 AM Suraj Kharage
<suraj.kharage@enterprisedb.com> wrote:
> Please find attached patch for backup validator implementation (0004 patch). This patch is based
> on Rushabh's latest patch for backup manifest.
>
> There are some functions required at client side as well, so I have moved those functions
> and some data structure at common place so that they can be accessible for both. (0003 patch).
>
> My colleague Rajkumar Raghuwanshi has prepared the WIP patch (0005) for tap test cases which
> is also attached. As of now, test cases related to the tablespace and tar backup  format are missing,
> will continue work on same and submit the complete patch.
>
> With this mail, I have attached the complete patch stack for backup manifest and backup
> validate implementation.
>
> Please let me know your thoughts on the same.

Well, for the second time on this thread, please don't take a bunch of
somebody else's code and post it in a patch that doesn't attribute
that person as one of the authors. For the second time on this thread,
the person is me, but don't borrow *anyone's* code without proper
attribution. It's really important!

On a related note, it's a very good idea to use git format-patch and
git rebase -i to maintain patch stacks like this. Rushabh seems to
have done that, but the files you're posting look like raw 'git diff'
output. Notice that this gives him a way to include authorship
information and a tentative commit message in each patch, but you
don't have any of that.

Also on a related note, part of the process of adapting existing code
to a new purpose is adapting the comments. You haven't done that:

+ * Search a result-set hash table for a row matching a given filename.
...
+ * Insert a row into a result-set hash table, provided no such row is already
...
+ * Most of the values
+ * that we're hashing are short integers formatted as text, so there
+ * shouldn't be much room for pathological input.

I think that what we should actually do here is try to use simplehash.
Right now, it won't work for frontend code, but I posted some patches
to try to address that issue:

https://www.postgresql.org/message-id/CA%2BTgmob8oyh02NrZW%3DxCScB%2B5GyJ-jVowE3%2BTWTUmPF%3DFsGWTA%40mail.gmail.com

That would have a few advantages. One, we wouldn't need to know the
number of elements in advance, because simplehash can grow
dynamically. Two, we could use the iteration interfaces to walk the
hash table.  Your solution to that is pgrhash_seq_search, but that's
actually not well-designed, because it's not a generic iterator
function but something that knows specifically about the 'touch' flag.
I incidentally suggest renaming 'touch' to 'matched;' 'touch' is not
bad, but I think 'matched' will be a little more recognizable.

Please run pgindent. If needed, first add locally defined types to
typedefs.list, so that things indent properly.

It's not a crazy idea to try to share some data structures and code
between the frontend and the backend here, but I think
src/common/backup.c and src/include/common/backup.h is a far too
generic name given what the code is actually doing. It's mostly about
checksums, not backup, and I think it should be named accordingly. I
suggest removing "manifestinfo" and renaming the rest to just talk
about checksums rather than manifests. That would make it logical to
reuse this for any other future code that needs a configurable
checksum type. Also, how about adding a function like:

extern bool parse_checksum_algorithm(char *name, ChecksumAlgorithm *algo);

...which would return true and set *algo if name is recognized, and
return false otherwise. That code could be used on both the client and
server sides of this patch, and by any future patches that want to
reuse this scaffolding.
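
Something along these lines, say (only a sketch; the enum values are whatever
the patch ends up defining):

    bool
    parse_checksum_algorithm(char *name, ChecksumAlgorithm *algo)
    {
        if (pg_strcasecmp(name, "SHA256") == 0)
            *algo = MC_SHA256;
        else if (pg_strcasecmp(name, "CRC32C") == 0)
            *algo = MC_CRC32C;
        else if (pg_strcasecmp(name, "NONE") == 0)
            *algo = MC_NONE;
        else
            return false;

        return true;
    }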

The file header for backup.h has the wrong filename (string.h). The
header format looks somewhat atypical compared to what we normally do,
too.

It's arguable, but I tend to think that it would be better to
hex-encode the CRC rather than printing it as an integer.  Maybe
hex_encode() is another thing that could be moved into the new
src/common file.
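
For example, hex-encoding the finalized CRC-32C value in the style of
hex_encode() might look roughly like this (just a sketch; "crc" here is
assumed to hold the finalized pg_crc32c value):

    static const char hextbl[] = "0123456789abcdef";
    char        crc_hex[9];
    int         i;

    for (i = 0; i < 4; i++)
    {
        uint8       b = (crc >> (24 - 8 * i)) & 0xFF;   /* most significant byte first */

        crc_hex[2 * i] = hextbl[b >> 4];
        crc_hex[2 * i + 1] = hextbl[b & 0xF];
    }
    crc_hex[8] = '\0';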

As I said before about Rushabh's patch set, it's very confusing that
we have so many patches here stacked up. Like, you have 0002 moving
stuff, and then 0003 moving it again. That's super-confusing. Please
try to structure the patch set so as to make it as easy to review as
possible.

Regarding the test case patch, error checks are important! Don't do
things like this:

+open my $modify_file_sha256, '>>', "$tempdir/backup_verify/postgresql.conf";
+print $modify_file_sha256 "port = 5555\n";
+close $modify_file_sha256;

If the open fails, then it and the print and the close are going to
silently do nothing. That's bad. I don't know exactly what the
customary error-checking is for things like this in TAP tests, but I
hope it's not like this, because this has a pretty fair chance of
looking like it's testing something that it isn't. Let's figure out
what the best practice in this area is and adhere to it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Suraj Kharage
Date:
Thanks, Robert for the review.

On Wed, Dec 11, 2019 at 1:10 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Dec 10, 2019 at 6:40 AM Suraj Kharage
<suraj.kharage@enterprisedb.com> wrote:
> Please find attached patch for backup validator implementation (0004 patch). This patch is based
> on Rushabh's latest patch for backup manifest.
>
> There are some functions required at client side as well, so I have moved those functions
> and some data structure at common place so that they can be accessible for both. (0003 patch).
>
> My colleague Rajkumar Raghuwanshi has prepared the WIP patch (0005) for tap test cases which
> is also attached. As of now, test cases related to the tablespace and tar backup  format are missing,
> will continue work on same and submit the complete patch.
>
> With this mail, I have attached the complete patch stack for backup manifest and backup
> validate implementation.
>
> Please let me know your thoughts on the same.

Well, for the second time on this thread, please don't take a bunch of
somebody else's code and post it in a patch that doesn't attribute
that person as one of the authors. For the second time on this thread,
the person is me, but don't borrow *anyone's* code without proper
attribution. It's really important!

On a related note, it's a very good idea to use git format-patch and
git rebase -i to maintain patch stacks like this. Rushabh seems to
have done that, but the files you're posting look like raw 'git diff'
output. Notice that this gives him a way to include authorship
information and a tentative commit message in each patch, but you
don't have any of that.
 
Sorry, I have corrected this in the attached v2 patch set.
 
Also on a related note, part of the process of adapting existing code
to a new purpose is adapting the comments. You haven't done that:

+ * Search a result-set hash table for a row matching a given filename.
...
+ * Insert a row into a result-set hash table, provided no such row is already
...
+ * Most of the values
+ * that we're hashing are short integers formatted as text, so there
+ * shouldn't be much room for pathological input.
Corrected in v2 patch.
 
I think that what we should actually do here is try to use simplehash.
Right now, it won't work for frontend code, but I posted some patches
to try to address that issue:

https://www.postgresql.org/message-id/CA%2BTgmob8oyh02NrZW%3DxCScB%2B5GyJ-jVowE3%2BTWTUmPF%3DFsGWTA%40mail.gmail.com

That would have a few advantages. One, we wouldn't need to know the
number of elements in advance, because simplehash can grow
dynamically. Two, we could use the iteration interfaces to walk the
hash table.  Your solution to that is pgrhash_seq_search, but that's
actually not well-designed, because it's not a generic iterator
function but something that knows specifically about the 'touch' flag.
I incidentally suggest renaming 'touch' to 'matched;' 'touch' is not
bad, but I think 'matched' will be a little more recognizable.
 
Thanks for the suggestion. Will try to implement the same and update accordingly. 
I am assuming that I need to build the patch based on the changes that you proposed on the mentioned thread.
 
Please run pgindent. If needed, first add locally defined types to
typedefs.list, so that things indent properly.

It's not a crazy idea to try to share some data structures and code
between the frontend and the backend here, but I think
src/common/backup.c and src/include/common/backup.h is a far too
generic name given what the code is actually doing. It's mostly about
checksums, not backup, and I think it should be named accordingly. I
suggest removing "manifestinfo" and renaming the rest to just talk
about checksums rather than manifests. That would make it logical to
reuse this for any other future code that needs a configurable
checksum type. Also, how about adding a function like:

extern bool parse_checksum_algorithm(char *name, ChecksumAlgorithm *algo);

...which would return true and set *algo if name is recognized, and
return false otherwise. That code could be used on both the client and
server sides of this patch, and by any future patches that want to
reuse this scaffolding.
 
Corrected the filename and implemented the function as suggested. 
 
The file header for backup.h has the wrong filename (string.h). The
header format looks somewhat atypical compared to what we normally do,
too.

My bad, corrected the header format as well.
 
 
It's arguable, but I tend to think that it would be better to
hex-encode the CRC rather than printing it as an integer.  Maybe
hex_encode() is another thing that could be moved into the new
src/common file.
 
We are already encoding the CRC checksum as well. Please let me know if I misunderstood anything.
Moved hex_encode into src/common.
 
As I said before about Rushabh's patch set, it's very confusing that
we have so many patches here stacked up. Like, you have 0002 moving
stuff, and then 0003 moving it again. That's super-confusing. Please
try to structure the patch set so as to make it as easy to review as
possible.

Sorry for the confusion. I have squashed the 0001 to 0003 patches into one patch.
 
Regarding the test case patch, error checks are important! Don't do
things like this:

+open my $modify_file_sha256, '>>', "$tempdir/backup_verify/postgresql.conf";
+print $modify_file_sha256 "port = 5555\n";
+close $modify_file_sha256;

If the open fails, then it and the print and the close are going to
silently do nothing. That's bad. I don't know exactly what the
customary error-checking is for things like this in TAP tests, but I
hope it's not like this, because this has a pretty fair chance of
looking like it's testing something that it isn't. Let's figure out
what the best practice in this area is and adhere to it.

Rajkumar has fixed this; please find the attached 0003 patch for the same.

Please find the attached v2 patch set.

TODO: will implement the simplehash as suggested.
 
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.
Attachment

Re: backup manifests

From
Suraj Kharage
Date:
Hi,
 
I think that what we should actually do here is try to use simplehash.
Right now, it won't work for frontend code, but I posted some patches
to try to address that issue:

https://www.postgresql.org/message-id/CA%2BTgmob8oyh02NrZW%3DxCScB%2B5GyJ-jVowE3%2BTWTUmPF%3DFsGWTA%40mail.gmail.com

That would have a few advantages. One, we wouldn't need to know the
number of elements in advance, because simplehash can grow
dynamically. Two, we could use the iteration interfaces to walk the
hash table.  Your solution to that is pgrhash_seq_search, but that's
actually not well-designed, because it's not a generic iterator
function but something that knows specifically about the 'touch' flag.
I incidentally suggest renaming 'touch' to 'matched;' 'touch' is not
bad, but I think 'matched' will be a little more recognizable.
 
Thanks for the suggestion. Will try to implement the same and update accordingly. 
I am assuming that I need to build the patch based on the changes that you proposed on the mentioned thread.
 

I have implemented simplehash in the backup validator patch, as Robert suggested. Please find the attached 0002 patch for the same.

Kindly review and let me know your thoughts.

I have also attached the remaining patches; 0001 and 0003 are the same as in v2, with only the patch version bumped.


--
--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.
Attachment

Re: backup manifests

From
Robert Haas
Date:
On Tue, Dec 17, 2019 at 12:54 AM Suraj Kharage
<suraj.kharage@enterprisedb.com> wrote:
> I have implemented the simplehash in backup validator patch as Robert suggested. Please find attached 0002 patch for
> the same.
>
> kindly review and let me know your thoughts.

+#define CHECKSUM_LENGTH 256

This seems wrong. Not all checksums are the same length, and none of
the ones we're using are 256 bytes in length, and if we've got to have
a constant someplace for the maximum checksum length, it should
probably be in the new header file, not here. But I don't think we
should need this in the first place; see comments below about how to
revise the parsing of the manifest file.

+    char        filetype[10];

A mysterious 10-byte field with no comments explaining what it
means... and the same magic number 10 appears in at least one other
place in the patch.

+typedef struct manifesthash_hash *hashtab;

This declares a new *type* called hashtab, not a variable called
hashtab. The new type is not used anywhere, but later, you have
several variables of the same type that have this name. Just remove
this: it's wrong and unused.

+static enum ChecksumAlgorithm checksum_type = MC_NONE;

Remove "enum". Not needed, because you have a typedef for it in the
header, and not per style.

+static  manifesthash_hash *create_manifest_hash(char manifest_path[MAXPGPATH]);

Whitespace is wrong. The whole patch needs a visit from pgindent with
a properly-updated typedefs.list.

Also, you will struggle to find anywhere else in the code base where a
character array is passed as a function argument. I don't know why this
isn't just char *.

+    if(verify_backup)

Whitespace wrong here, too.

+ * Read the backup_manifest file and generate the hash table, then scan data
+ * directroy and verify each file. Finally, iterate on hash table to find
+ * out missing files.

You've got a word spelled wrong here, but the bigger problem is that
this comment doesn't actually describe what this function is trying to
do. Instead, it describes how it does it. If it's necessary to explain
what steps the function takes in order to accomplish some goal, you
should comment individual bits of code in the function. The header
comment is a high-level overview, not a description of the algorithm.

It's also pretty unhelpful, here and elsewhere, to refer to "the hash
table" as if there were only one, and as if the reader were supposed
to know something about it when you haven't told them anything about
it.

+        if (!entry->matched)
+        {
+            pg_log_info("missing file: %s", entry->filename);
+        }
+

The braces here are not project style. We usually omit braces when
only a single line of code is present.

I think some work needs to be done to standardize and improve the
messages that get produced here.  You have:

1. missing file: %s
2. duplicate file present: %s
3. size changed for file: %s, original size: %d, current size: %zu
4. checksum difference for file: %s
5. extra file found: %s

I suggest:

1. file \"%s\" is present in manifest but missing from the backup
2. file \"%s\" has multiple manifest entries
(this one should probably be pg_log_error(), not pg_log_info(), as it
represents a corrupt-manifest problem)
3. file \"%s" has size %lu in manifest but size %lu in backup
4. file \"%s" has checksum %s in manifest but checksum %s in backup
5. file \"%s" is present in backup but not in manifest

Your patch actually doesn't compile on my system, because for the
third message above, it uses %zu to print the size. But %zu is for
size_t, not off_t. I went looking for other places in the code where
we print off_t; based on that, I think the right thing to do is to
print it using %lu and write (unsigned long) st.st_size.

+    char        file_checksum[256];
+    char        header[1024];

More arbitrary constants.

+    if (!file)
+    {
+        pg_log_error("could not open backup_manifest");

That's bad error reporting.  See e.g. readfile() in initdb.c.

+    if (fscanf(file, "%1023[^\n]\n", header) != 1)
+    {
+        pg_log_error("error while reading the header from backup_manifest");

That's also bad error reporting. It is only a slight step up from
"ERROR: error".

And we have another magic number (1023).

+    appendPQExpBufferStr(manifest, header);
+    appendPQExpBufferStr(manifest, "\n");
...
+        appendPQExpBuffer(manifest, "File\t%s\t%d\t%s\t%s\n", filename,
+                          filesize, mtime, checksum_with_type);

This whole thing seems completely crazy to me. Basically, you're
trying to use fscanf() to parse the file. But then, because fscanf()
doesn't give you the original bytes back, you're trying to reassemble
the data that you parsed to recover the original line, so that you can
stuff it in the buffer and eventually checksum it. However, that's
highly error-prone. You're basically duplicating server code, and thus
risking getting out of sync in the server code, to work around a
problem that is entirely self-inflicted, namely, deciding to use
fscanf().

What I would recommend is:

1. Use open(), read(), close() rather than the fopen() family of
functions. As we have discovered elsewhere, fread() doesn't promise to
set errno, so we can't necessarily get reliable error-reporting out of
it.

2. Before you start reading the file, create a buffer that's large
enough to hold the whole thing, by using fstat() to figure out how big
the file is. Read the whole file into that buffer.  If you're not able
to read the whole file -- i.e. open() or read() or close() fail --
then just error out and exit.

3. Now advance through the file line by line. Write a function that
knows how to search forward for the next \r or \n but with checks to
make sure it can't run off the end of the buffer, and use that to
locate the end of each line so that you can walk forward. As you walk
forward line by line, add the line you just processed to the checksum.
That way, you only need a single pass over the data. Also, you can
modify it in place.  More on that below.

4. As you examine each line, start by examining the first word. You'll
need a function that finds the first word by searching forward for a
tab character, but not beyond the end of the line. The first word of
the first line should be PostgreSQL-Backup-Manifest-Version and the
second word should be 1. Then on each subsequent line check whether
the first word is File or Manifest-Checksum or something else,
erroring out in the last case. If it's Manifest-Checksum, verify that
this is the last line of the file and that the checksum matches. If
it's File, break the line into fields so you can add it to the hash
table. You'll want a pointer to the filename and a pointer to the
checksum, and you'll want to parse the size as an integer. Instead of
allocating new memory for those fields, just overwrite the character
that follows the field with a \0. There must be one - either \t or \n
- so you shouldn't run off the end of the buffer.

If you do this, a bunch of the fixed-size buffers you have right now
go away. You don't need the variable filetype[10] any more, or
checksum_with_type[CHECKSUM_LENGTH], or checksum[CHECKSUM_LENGTH], or
the character arrays inside DataDirectoryFileInfo. Instead you can
just have pointers into the buffer that contains the file. And you
don't need this code to back up using fseek() and reread the lines,
either.
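
A minimal sketch of steps 1 and 2 above (read the whole manifest into memory,
erroring out on any failure); this is purely illustrative, not code from the
patch:

    int         fd;
    struct stat st;
    char       *buf;

    fd = open(manifest_path, O_RDONLY | PG_BINARY, 0);
    if (fd < 0)
    {
        pg_log_error("could not open file \"%s\": %m", manifest_path);
        exit(1);
    }
    if (fstat(fd, &st) != 0)
    {
        pg_log_error("could not stat file \"%s\": %m", manifest_path);
        exit(1);
    }
    buf = pg_malloc(st.st_size + 1);
    if (read(fd, buf, st.st_size) != st.st_size)
    {
        pg_log_error("could not read file \"%s\": %m", manifest_path);
        exit(1);
    }
    buf[st.st_size] = '\0';         /* simplifies the line-by-line walk */
    close(fd);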

Also read this article:

https://stackoverflow.com/questions/2430303/disadvantages-of-scanf

Note that the very first point in the article talks about the problem
of overrunning the buffer, which you certainly have in the current
code right here:

+        if (fscanf(file, "%s\t%s\t%d\t%23[^\t] %s\n", filetype, filename,

filetype is declared as char[10], but %s could read arbitrarily much data.

+        filename = (char*) pg_malloc(MAXPGPATH);

pg_malloc returns void *, so no cast is required.

+        if (strcmp(checksum_with_type, "-") == 0)
+        {
+            checksum_type = MC_NONE;
+        }
+        else
+        {
+            if (strncmp(checksum_with_type, "SHA256", 6) == 0)

Use parse_checksum_algorithm. Right now you've invented a "common"
function with 1 caller, but I explicitly suggested previously that you
put it in common so that you could reuse it.

+        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0 ||
+            strcmp(de->d_name, "pg_wal") == 0)
+            continue;

Ignoring pg_wal at the top level might be OK, but this will ignore a
pg_wal entry anywhere in the directory tree.

+    /* Skip backup manifest file. */
+    if (strcmp(de->d_name, "backup_manifest") == 0)
+        return;

Same problem.

+    filename = createPQExpBuffer();
+    if (!filename)
+    {
+        pg_log_error("out of memory");
+        exit(1);
+    }
+
+    appendPQExpBuffer(filename, "%s%s", relative_path, de->d_name);

Just use char filename[MAXPGPATH] and snprintf here, as you do
elsewhere. It will be simpler and save memory.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Suraj Kharage
Date:
Thank you for review comments.

On Thu, Dec 19, 2019 at 2:54 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Dec 17, 2019 at 12:54 AM Suraj Kharage
<suraj.kharage@enterprisedb.com> wrote:
> I have implemented the simplehash in backup validator patch as Robert suggested. Please find attached 0002 patch for the same.
>
> kindly review and let me know your thoughts.

+#define CHECKSUM_LENGTH 256

This seems wrong. Not all checksums are the same length, and none of
the ones we're using are 256 bytes in length, and if we've got to have
a constant someplace for the maximum checksum length, it should
probably be in the new header file, not here. But I don't think we
should need this in the first place; see comments below about how to
revise the parsing of the manifest file.

I agree. Removed.

+    char        filetype[10];

A mysterious 10-byte field with no comments explaining what it
means... and the same magic number 10 appears in at least one other
place in the patch.
 
with current logic, we don't need this anymore. 
I have removed the filetype from the structure as we are not doing any comparison anywhere.
 

+typedef struct manifesthash_hash *hashtab;

This declares a new *type* called hashtab, not a variable called
hashtab. The new type is not used anywhere, but later, you have
several variables of the same type that have this name. Just remove
this: it's wrong and unused.

 
corrected.
 
+static enum ChecksumAlgorithm checksum_type = MC_NONE;

Remove "enum". Not needed, because you have a typedef for it in the
header, and not per style.

corrected.
 
+static  manifesthash_hash *create_manifest_hash(char manifest_path[MAXPGPATH]);

Whitespace is wrong. The whole patch needs a visit from pgindent with
a properly-updated typedefs.list.

Also, you will struggle to find anywhere else in the code base where
pass a character array as a function argument. I don't know why this
isn't just char *.
 
Corrected.
 

+    if(verify_backup)

Whitespace wrong here, too.

 
Fixed
 

It's also pretty unhelpful, here and elsewhere, to refer to "the hash
table" as if there were only one, and as if the reader were supposed
to know something about it when you haven't told them anything about
it.

+        if (!entry->matched)
+        {
+            pg_log_info("missing file: %s", entry->filename);
+        }
+

The braces here are not project style. We usually omit braces when
only a single line of code is present.

fixed
 

I think some work needs to be done to standardize and improve the
messages that get produced here.  You have:

1. missing file: %s
2. duplicate file present: %s
3. size changed for file: %s, original size: %d, current size: %zu
4. checksum difference for file: %s
5. extra file found: %s

I suggest:

1. file \"%s\" is present in manifest but missing from the backup
2. file \"%s\" has multiple manifest entries
(this one should probably be pg_log_error(), not pg_log_info(), as it
represents a corrupt-manifest problem)
3. file \"%s" has size %lu in manifest but size %lu in backup
4. file \"%s" has checksum %s in manifest but checksum %s in backup
5. file \"%s" is present in backup but not in manifest

Corrected.
 

Your patch actually doesn't compile on my system, because for the
third message above, it uses %zu to print the size. But %zu is for
size_t, not off_t. I went looking for other places in the code where
we print off_t; based on that, I think the right thing to do is to
print it using %lu and write (unsigned long) st.st_size.

Corrected. 

+    char        file_checksum[256];
+    char        header[1024];

More arbitrary constants.
 

+    if (!file)
+    {
+        pg_log_error("could not open backup_manifest");

That's bad error reporting.  See e.g. readfile() in initdb.c.

Corrected.
 

+    if (fscanf(file, "%1023[^\n]\n", header) != 1)
+    {
+        pg_log_error("error while reading the header from backup_manifest");

That's also bad error reporting. It is only a slight step up from
"ERROR: error".

And we have another magic number (1023).

With current logic, we don't need this anymore.
 

+    appendPQExpBufferStr(manifest, header);
+    appendPQExpBufferStr(manifest, "\n");
...
+        appendPQExpBuffer(manifest, "File\t%s\t%d\t%s\t%s\n", filename,
+                          filesize, mtime, checksum_with_type);

This whole thing seems completely crazy to me. Basically, you're
trying to use fscanf() to parse the file. But then, because fscanf()
doesn't give you the original bytes back, you're trying to reassemble
the data that you parsed to recover the original line, so that you can
stuff it in the buffer and eventually checksum it. However, that's
highly error-prone. You're basically duplicating server code, and thus
risking getting out of sync in the server code, to work around a
problem that is entirely self-inflicted, namely, deciding to use
fscanf().

What I would recommend is:

1. Use open(), read(), close() rather than the fopen() family of
functions. As we have discovered elsewhere, fread() doesn't promise to
set errno, so we can't necessarily get reliable error-reporting out of
it.

2. Before you start reading the file, create a buffer that's large
enough to hold the whole thing, by using fstat() to figure out how big
the file is. Read the whole file into that buffer.  If you're not able
to read the whole file -- i.e. open() or read() or close() fail --
then just error out and exit.

3. Now advance through the file line by line. Write a function that
knows how to search forward for the next \r or \n but with checks to
make sure it can't run off the end of the buffer, and use that to
locate the end of each line so that you can walk forward. As you walk
forward line by line, add the line you just processed to the checksum.
That way, you only need a single pass over the data. Also, you can
modify it in place.  More on that below.

4. As you examine each line, start by examining the first word. You'll
need a function that finds the first word by searching forward for a
tab character, but not beyond the end of the line. The first word of
the first line should be PostgreSQL-Backup-Manifest-Version and the
second word should be 1. Then on each subsequent line check whether
the first word is File or Manifest-Checksum or something else,
erroring out in the last case. If it's Manifest-Checksum, verify that
this is the last line of the file and that the checksum matches. If
it's File, break the line into fields so you can add it to the hash
table. You'll want a pointer to the filename and a pointer to the
checksum, and you'll want to parse the size as an integer. Instead of
allocating new memory for those fields, just overwrite the character
that follows the field with a \0. There must be one - either \t or \n
- so you shouldn't run off the end of the buffer.

If you do this, a bunch of the fixed-size buffers you have right now
go away. You don't need the variable filetype[10] any more, or
checksum_with_type[CHECKSUM_LENGTH], or checksum[CHECKSUM_LENGTH], or
the character arrays inside DataDirectoryFileInfo. Instead you can
just have pointers into the buffer that contains the file. And you
don't need this code to back up using fseek() and reread the lines,
either.


Thanks for the suggestion. I tried to mimic your approach in the attached v4-0002 patch. 
Please let me know your thoughts on the same.

Also read this article:

https://stackoverflow.com/questions/2430303/disadvantages-of-scanf

Note that the very first point in the article talks about the problem
of overrunning the buffer, which you certainly have in the current
code right here:

+        if (fscanf(file, "%s\t%s\t%d\t%23[^\t] %s\n", filetype, filename,

filetype is declared as char[10], but %s could read arbitrarily much data.
 
now with this revised logic, we don't use this anymore.
 

+        filename = (char*) pg_malloc(MAXPGPATH);

pg_malloc returns void *, so no cast is required.


fixed.
 
+        if (strcmp(checksum_with_type, "-") == 0)
+        {
+            checksum_type = MC_NONE;
+        }
+        else
+        {
+            if (strncmp(checksum_with_type, "SHA256", 6) == 0)

Use parse_checksum_algorithm. Right now you've invented a "common"
function with 1 caller, but I explicitly suggested previously that you
put it in common so that you could reuse it.

While parsing the record, we get <checksum type>:<checksum> as a single string
for the checksum. parse_checksum_algorithm() uses pg_strcasecmp(), so we need
to pass it the exact algorithm name. With the current logic we cannot write a
'\0' into the middle of the line until we have parsed it completely, so we may
need to allocate another small buffer, copy only the checksum type into it,
and pass that to parse_checksum_algorithm(). I can't think of any other
solution; I might be missing something here, so please correct me if I am
wrong.


+        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0 ||
+            strcmp(de->d_name, "pg_wal") == 0)
+            continue;

Ignoring pg_wal at the top level might be OK, but this will ignore a
pg_wal entry anywhere in the directory tree.

+    /* Skip backup manifest file. */
+    if (strcmp(de->d_name, "backup_manifest") == 0)
+        return;

Same problem.

You are right. Added extra check for this.
 

+    filename = createPQExpBuffer();
+    if (!filename)
+    {
+        pg_log_error("out of memory");
+        exit(1);
+    }
+
+    appendPQExpBuffer(filename, "%s%s", relative_path, de->d_name);

Just use char filename[MAXPGPATH] and snprintf here, as you do
elsewhere. It will be simpler and save memory.
Fixed.
 
The TAP test case patch needs some modification; I will do that and submit it.

--
--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.
Attachment

Re: backup manifests

From
Suraj Kharage
Date:
Fixed some typos in the attached v5-0002 patch. Please consider this patch for review.

On Fri, Dec 20, 2019 at 6:54 PM Suraj Kharage <suraj.kharage@enterprisedb.com> wrote:
Thank you for review comments.

On Thu, Dec 19, 2019 at 2:54 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Dec 17, 2019 at 12:54 AM Suraj Kharage
<suraj.kharage@enterprisedb.com> wrote:
> I have implemented the simplehash in backup validator patch as Robert suggested. Please find attached 0002 patch for the same.
>
> kindly review and let me know your thoughts.

+#define CHECKSUM_LENGTH 256

This seems wrong. Not all checksums are the same length, and none of
the ones we're using are 256 bytes in length, and if we've got to have
a constant someplace for the maximum checksum length, it should
probably be in the new header file, not here. But I don't think we
should need this in the first place; see comments below about how to
revise the parsing of the manifest file.

I agree. Removed.

+    char        filetype[10];

A mysterious 10-byte field with no comments explaining what it
means... and the same magic number 10 appears in at least one other
place in the patch.
 
with current logic, we don't need this anymore. 
I have removed the filetype from the structure as we are not doing any comparison anywhere.
 

+typedef struct manifesthash_hash *hashtab;

This declares a new *type* called hashtab, not a variable called
hashtab. The new type is not used anywhere, but later, you have
several variables of the same type that have this name. Just remove
this: it's wrong and unused.

 
corrected.
 
+static enum ChecksumAlgorithm checksum_type = MC_NONE;

Remove "enum". Not needed, because you have a typedef for it in the
header, and not per style.

corrected.
 
+static  manifesthash_hash *create_manifest_hash(char manifest_path[MAXPGPATH]);

Whitespace is wrong. The whole patch needs a visit from pgindent with
a properly-updated typedefs.list.

Also, you will struggle to find anywhere else in the code base where
pass a character array as a function argument. I don't know why this
isn't just char *.
 
Corrected.
 

+    if(verify_backup)

Whitespace wrong here, too.

 
Fixed
 

It's also pretty unhelpful, here and elsewhere, to refer to "the hash
table" as if there were only one, and as if the reader were supposed
to know something about it when you haven't told them anything about
it.

+        if (!entry->matched)
+        {
+            pg_log_info("missing file: %s", entry->filename);
+        }
+

The braces here are not project style. We usually omit braces when
only a single line of code is present.

fixed
 

I think some work needs to be done to standardize and improve the
messages that get produced here.  You have:

1. missing file: %s
2. duplicate file present: %s
3. size changed for file: %s, original size: %d, current size: %zu
4. checksum difference for file: %s
5. extra file found: %s

I suggest:

1. file \"%s\" is present in manifest but missing from the backup
2. file \"%s\" has multiple manifest entries
(this one should probably be pg_log_error(), not pg_log_info(), as it
represents a corrupt-manifest problem)
3. file \"%s" has size %lu in manifest but size %lu in backup
4. file \"%s" has checksum %s in manifest but checksum %s in backup
5. file \"%s" is present in backup but not in manifest

Corrected.
 

Your patch actually doesn't compile on my system, because for the
third message above, it uses %zu to print the size. But %zu is for
size_t, not off_t. I went looking for other places in the code where
we print off_t; based on that, I think the right thing to do is to
print it using %lu and write (unsigned long) st.st_size.

Corrected. 

+    char        file_checksum[256];
+    char        header[1024];

More arbitrary constants.
 

+    if (!file)
+    {
+        pg_log_error("could not open backup_manifest");

That's bad error reporting.  See e.g. readfile() in initdb.c.

Corrected.
 

+    if (fscanf(file, "%1023[^\n]\n", header) != 1)
+    {
+        pg_log_error("error while reading the header from backup_manifest");

That's also bad error reporting. It is only a slight step up from
"ERROR: error".

And we have another magic number (1023).

With current logic, we don't need this anymore.
 

+    appendPQExpBufferStr(manifest, header);
+    appendPQExpBufferStr(manifest, "\n");
...
+        appendPQExpBuffer(manifest, "File\t%s\t%d\t%s\t%s\n", filename,
+                          filesize, mtime, checksum_with_type);

This whole thing seems completely crazy to me. Basically, you're
trying to use fscanf() to parse the file. But then, because fscanf()
doesn't give you the original bytes back, you're trying to reassemble
the data that you parsed to recover the original line, so that you can
stuff it in the buffer and eventually checksum it. However, that's
highly error-prone. You're basically duplicating server code, and thus
risking getting out of sync in the server code, to work around a
problem that is entirely self-inflicted, namely, deciding to use
fscanf().

What I would recommend is:

1. Use open(), read(), close() rather than the fopen() family of
functions. As we have discovered elsewhere, fread() doesn't promise to
set errno, so we can't necessarily get reliable error-reporting out of
it.

2. Before you start reading the file, create a buffer that's large
enough to hold the whole thing, by using fstat() to figure out how big
the file is. Read the whole file into that buffer.  If you're not able
to read the whole file -- i.e. open() or read() or close() fail --
then just error out and exit.

3. Now advance through the file line by line. Write a function that
knows how to search forward for the next \r or \n but with checks to
make sure it can't run off the end of the buffer, and use that to
locate the end of each line so that you can walk forward. As you walk
forward line by line, add the line you just processed to the checksum.
That way, you only need a single pass over the data. Also, you can
modify it in place.  More on that below.

4. As you examine each line, start by examining the first word. You'll
need a function that finds the first word by searching forward for a
tab character, but not beyond the end of the line. The first word of
the first line should be PostgreSQL-Backup-Manifest-Version and the
second word should be 1. Then on each subsequent line check whether
the first word is File or Manifest-Checksum or something else,
erroring out in the last case. If it's Manifest-Checksum, verify that
this is the last line of the file and that the checksum matches. If
it's File, break the line into fields so you can add it to the hash
table. You'll want a pointer to the filename and a pointer to the
checksum, and you'll want to parse the size as an integer. Instead of
allocating new memory for those fields, just overwrite the character
that follows the field with a \0. There must be one - either \t or \n
- so you shouldn't run off the end of the buffer.

If you do this, a bunch of the fixed-size buffers you have right now
go away. You don't need the variable filetype[10] any more, or
checksum_with_type[CHECKSUM_LENGTH], or checksum[CHECKSUM_LENGTH], or
the character arrays inside DataDirectoryFileInfo. Instead you can
just have pointers into the buffer that contains the file. And you
don't need this code to back up using fseek() and reread the lines,
either.


Thanks for the suggestion. I tried to mimic your approach in the attached v4-0002 patch. 
Please let me know your thoughts on the same.

Also read this article:

https://stackoverflow.com/questions/2430303/disadvantages-of-scanf

Note that the very first point in the article talks about the problem
of overrunning the buffer, which you certainly have in the current
code right here:

+        if (fscanf(file, "%s\t%s\t%d\t%23[^\t] %s\n", filetype, filename,

filetype is declared as char[10], but %s could read arbitrarily much data.
 
now with this revised logic, we don't use this anymore.
 

+        filename = (char*) pg_malloc(MAXPGPATH);

pg_malloc returns void *, so no cast is required.


fixed.
 
+        if (strcmp(checksum_with_type, "-") == 0)
+        {
+            checksum_type = MC_NONE;
+        }
+        else
+        {
+            if (strncmp(checksum_with_type, "SHA256", 6) == 0)

Use parse_checksum_algorithm. Right now you've invented a "common"
function with 1 caller, but I explicitly suggested previously that you
put it in common so that you could reuse it.

while parsing the record, we get <checktype>:<checksum> as a string for checksum. 
parse_checksum_algorithm uses pg_strcasecmp()  so we need to pass exact string to that function.
with current logic, we can't add '\0' in between the line unless we parse it completely. 
So we may need to allocate another small buffer and copy only checksum type in that and pass that to 
 parse_checksum_algorithm.  I don't think of any other solution apart from this. I might be missing something
here, please correct me if I am wrong.


+        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0 ||
+            strcmp(de->d_name, "pg_wal") == 0)
+            continue;

Ignoring pg_wal at the top level might be OK, but this will ignore a
pg_wal entry anywhere in the directory tree.

+    /* Skip backup manifest file. */
+    if (strcmp(de->d_name, "backup_manifest") == 0)
+        return;

Same problem.

You are right. Added extra check for this.
 

+    filename = createPQExpBuffer();
+    if (!filename)
+    {
+        pg_log_error("out of memory");
+        exit(1);
+    }
+
+    appendPQExpBuffer(filename, "%s%s", relative_path, de->d_name);

Just use char filename[MAXPGPATH] and snprintf here, as you do
elsewhere. It will be simpler and save memory.
Fixed.
 
TAP test case patch needs some modification, Will do that and submit.

--
--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.


--
--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.
Attachment

Re: backup manifests

From
Robert Haas
Date:
On Fri, Dec 20, 2019 at 8:24 AM Suraj Kharage
<suraj.kharage@enterprisedb.com> wrote:
> Thank you for review comments.

Thanks for the new version.

+      <term><option>--verify-backup </option></term>

Whitespace.

+struct manifesthash_hash *hashtab;

Uh, I had it in mind that you would nuke this line completely, not
just remove "typedef" from it. You shouldn't need a global variable
here.

+ if (buf == NULL)

pg_malloc seems to have an internal check such that it never returns
NULL. I don't see anything like this test in other callers.

The order of operations in create_manifest_hash() seems unusual:

+ fd = open(manifest_path, O_RDONLY, 0);
+ if (fstat(fd, &stat))
+ buf = pg_malloc(stat.st_size);
+ hashtab = manifesthash_create(1024, NULL);
...
+ entry = manifesthash_insert(hashtab, filename, &found);
...
+ close(fd);

I would have expected open-fstat-read-close to be consecutive, and the
manifesthash stuff all done afterwards. In fact, it seems like reading
the file could be a separate function.

+ if (strncmp(checksum, "SHA256", 6) == 0)

This isn't really right; it would give a false match if we had a
checksum algorithm with a name like SHA2560 or SHA256C or
SHA256ExceptWayBetter. The right thing to do is find the colon first,
and then probably overwrite it with '\0' so that you have a string
that you can pass to parse_checksum_algorithm().
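
For illustration, the shape of that might be something like the following
(a sketch only, with hypothetical variable names):

    char       *sep = strchr(checksum_with_type, ':');

    if (sep == NULL)
    {
        pg_log_error("invalid checksum entry \"%s\" in manifest",
                     checksum_with_type);
        exit(1);
    }
    *sep = '\0';                /* checksum_with_type is now just the name */
    if (!parse_checksum_algorithm(checksum_with_type, &checksum_type))
    {
        pg_log_error("unrecognized checksum algorithm \"%s\"",
                     checksum_with_type);
        exit(1);
    }
    checksum = sep + 1;         /* the digest itself follows the colon */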

+ /*
+ * we don't have checksum type in the header, so need to
+ * read through the first file enttry to find the checksum
+ * type for the manifest file and initilize the checksum
+ * for the manifest file itself.
+ */

This seems to be proceeding on the assumption that the checksum type
for the manifest itself will always be the same as the checksum type
for the first file in the manifest. I don't think that's the right
approach. I think the manifest should always have a SHA256 checksum,
regardless of what type of checksum is used for the individual files
within the manifest. Since the volume of data in the manifest is
presumably very small compared to the size of the database cluster
itself, I don't think there should be any performance problem there.

+ filesize = atol(size);

Using strtol() would allow for some error checking.
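
For instance (a sketch, with hypothetical variable names):

    char       *endptr;
    long        filesize;

    errno = 0;
    filesize = strtol(size, &endptr, 10);
    if (errno != 0 || endptr == size || *endptr != '\0' || filesize < 0)
    {
        pg_log_error("invalid file size \"%s\" in manifest", size);
        exit(1);
    }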

+ * Increase the checksum by its lable length so that we can
+ checksum = checksum + checksum_lable_length;

Spelling.

+ pg_log_error("invalid record found in \"%s\"", manifest_path);

Error message needs work.

+VerifyBackup(void)
+create_manifest_hash(char *manifest_path)
+nextLine(char *buf)

Your function names should be consistent with the surrounding style,
and with each other, as far as possible. Three different conventions
within the same patch and source file seems over the top.

Also keep in mind that you're not writing code in a vacuum. There's a
whole file of code here, and around that, a whole project.
scan_data_directory() is a good example of a function whose name is
clearly too generic. It's not a general-purpose function for scanning
the data directory; it's specifically a support function for verifying
a backup. Yet, the name gives no hint of this.

+verify_file(struct dirent *de, char fn[MAXPGPATH], struct stat st,
+ char relative_path[MAXPGPATH], manifesthash_hash *hashtab)

I think I commented on the use of char[] parameters in my previous review.

+ /* Skip backup manifest file. */
+ if (strcmp(de->d_name, "backup_manifest") == 0)
+ return;

Still looks like this will be skipped at any level of the directory
hierarchy, not just the top. And why are we skipping backup_manifest
here but pg_wal in scan_data_directory? That's a rhetorical question,
because I know the answer: verify_file() is only getting called for
files, so you can't use it to skip directories. But that's not a good
excuse for putting closely-related checks in different parts of the
code. It's just going to result in the checks being inconsistent and
each one having its own bugs that have to be fixed separately from the
other one, as here. Please try to reorganize this code so that it can
be done in a consistent way.

I think this is related to the way you're traversing the directory
tree, which somehow looks a bit awkward to me. At the top of
scan_data_directory(), you've got code that uses basedir and
subdirpath to construct path and relative_path. I was initially
surprised to see that this was the job of this function, rather than
the caller, but then I thought: well, as long as it makes life easy
for the caller, it's probably fine. However, I notice that the only
non-trivial caller is the scan_data_directory() itself, and it has to
go and construct newsubdirpath from subdirpath and the directory name.

It seems to me that this would get easier if you defined
scan_data_directory() -- or whatever we end up calling it -- to take
two pathname-related arguments:

- basepath, which would be $PGDATA and would never change as we
recurse down, so same as what you're now calling basedir
- pathsuffix, which would be an empty string at the top level and at
each recursive level we'd add a slash and then de->d_name.

So at the top of the function we wouldn't need an if statement,
because you could just do:

snprintf(path, MAXPGPATH, "%s%s", basedir, pathsuffix);

And when you recurse you wouldn't need an if statement either, because
you could just do:

snprintf(newpathsuffix, MAXPGPATH, "%s/%s", pathsuffix, de->d_name);

What I'd suggest is constructing newpathsuffix right after rejecting
"." and ".." entries, and then you can reject both pg_wal and
backup_manifest, at the top-level only, using symmetric and elegant
code:

if (strcmp(newpathsuffix, "/pg_wal") == 0 || strcmp(newpathsuffix,
"/backup_manifest") == 0)
    continue;
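
Putting those pieces together, the traversal skeleton being described might
look roughly like this (illustrative only; the function and variable names
are hypothetical):

    static void
    verify_backup_directory(const char *basepath, const char *pathsuffix,
                            manifesthash_hash *hashtab)
    {
        char        path[MAXPGPATH];
        DIR        *dir;
        struct dirent *de;

        snprintf(path, MAXPGPATH, "%s%s", basepath, pathsuffix);
        dir = opendir(path);
        if (dir == NULL)
        {
            pg_log_error("could not open directory \"%s\": %m", path);
            exit(1);
        }
        while ((de = readdir(dir)) != NULL)
        {
            char        newpathsuffix[MAXPGPATH];

            if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
                continue;

            snprintf(newpathsuffix, MAXPGPATH, "%s/%s", pathsuffix, de->d_name);

            /* skip pg_wal and backup_manifest, but only at the top level */
            if (strcmp(newpathsuffix, "/pg_wal") == 0 ||
                strcmp(newpathsuffix, "/backup_manifest") == 0)
                continue;

            /* stat the entry; recurse into directories, verify regular files */
        }
        closedir(dir);
    }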

+ record = manifesthash_lookup(hashtab, filename);;
+ if (record)
+ {
...long block...
+ }
+ else
+ pg_log_info("file \"%s\" is present in backup but not in manifest",
+ filename);

Try to structure the code in such a way that you minimize unnecessary
indentation. For example, in this case, you could instead write:

if (record == NULL)
{
    pg_log_info(...)
    return;
}

and the result would be that everything inside that long if-block is
now at the top level of the function and indented one level less. And
I think if you look at this function you'll see a way that you can
save a *second* level of indentation for much of that code. Please
check the rest of the patch for similar cases, too.

+static char *
+nextLine(char *buf)
+{
+ while (*buf != '\0' && *buf != '\n')
+ buf = buf + 1;
+
+ return buf + 1;
+}

I'm pretty sure that my previous review mentioned the importance of
protecting against buffer overruns here.

+static char *
+nextWord(char *line)
+{
+ while (*line != '\0' && *line != '\t' && *line != '\n')
+ line = line + 1;
+
+ return line + 1;
+}

Same problem here.

In both cases, ++ is more idiomatic.
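
For illustration, bounds-checked versions might look roughly like this (a
sketch only, not the patch's code; "end" points just past the end of the
buffer):

    static char *
    next_line(char *p, char *end)
    {
        while (p < end && *p != '\n')
            p++;
        return (p < end) ? p + 1 : end; /* never run past the buffer */
    }

    static char *
    next_word(char *p, char *end)
    {
        while (p < end && *p != '\t' && *p != '\n')
            p++;
        if (p < end)
            *p++ = '\0';                /* terminate the field in place */
        return p;
    }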

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Rushabh Lathia
Date:


On Fri, Dec 20, 2019 at 9:14 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Dec 20, 2019 at 8:24 AM Suraj Kharage
<suraj.kharage@enterprisedb.com> wrote:
> Thank you for review comments.

Thanks for the new version.

+      <term><option>--verify-backup </option></term>

Whitespace.

+struct manifesthash_hash *hashtab;

Uh, I had it in mind that you would nuke this line completely, not
just remove "typedef" from it. You shouldn't need a global variable
here.

+ if (buf == NULL)

pg_malloc seems to have an internal check such that it never returns
NULL. I don't see anything like this test in other callers.

The order of operations in create_manifest_hash() seems unusual:

+ fd = open(manifest_path, O_RDONLY, 0);
+ if (fstat(fd, &stat))
+ buf = pg_malloc(stat.st_size);
+ hashtab = manifesthash_create(1024, NULL);
...
+ entry = manifesthash_insert(hashtab, filename, &found);
...
+ close(fd);

I would have expected open-fstat-read-close to be consecutive, and the
manifesthash stuff all done afterwards. In fact, it seems like reading
the file could be a separate function.

+ if (strncmp(checksum, "SHA256", 6) == 0)

This isn't really right; it would give a false match if we had a
checksum algorithm with a name like SHA2560 or SHA256C or
SHA256ExceptWayBetter. The right thing to do is find the colon first,
and then probably overwrite it with '\0' so that you have a string
that you can pass to parse_checksum_algorithm().

+ /*
+ * we don't have checksum type in the header, so need to
+ * read through the first file enttry to find the checksum
+ * type for the manifest file and initilize the checksum
+ * for the manifest file itself.
+ */

This seems to be proceeding on the assumption that the checksum type
for the manifest itself will always be the same as the checksum type
for the first file in the manifest. I don't think that's the right
approach. I think the manifest should always have a SHA256 checksum,
regardless of what type of checksum is used for the individual files
within the manifest. Since the volume of data in the manifest is
presumably very small compared to the size of the database cluster
itself, I don't think there should be any performance problem there.

Agreed that performance won't be a problem, but this will be a bit confusing
for the user: at the start the user asks for a particular manifest checksum
algorithm (say CRC32C), and at the end the user finds a SHA256 checksum
string in the backup_manifest file.

Does this also mean that, irrespective of whether the user provided a
checksum option or not, we will always generate a checksum for the
backup_manifest file?


+ filesize = atol(size);

Using strtol() would allow for some error checking.

+ * Increase the checksum by its lable length so that we can
+ checksum = checksum + checksum_lable_length;

Spelling.

+ pg_log_error("invalid record found in \"%s\"", manifest_path);

Error message needs work.

+VerifyBackup(void)
+create_manifest_hash(char *manifest_path)
+nextLine(char *buf)

Your function names should be consistent with the surrounding style,
and with each other, as far as possible. Three different conventions
within the same patch and source file seems over the top.

Also keep in mind that you're not writing code in a vacuum. There's a
whole file of code here, and around that, a whole project.
scan_data_directory() is a good example of a function whose name is
clearly too generic. It's not a general-purpose function for scanning
the data directory; it's specifically a support function for verifying
a backup. Yet, the name gives no hint of this.

+verify_file(struct dirent *de, char fn[MAXPGPATH], struct stat st,
+ char relative_path[MAXPGPATH], manifesthash_hash *hashtab)

I think I commented on the use of char[] parameters in my previous review.

+ /* Skip backup manifest file. */
+ if (strcmp(de->d_name, "backup_manifest") == 0)
+ return;

Still looks like this will be skipped at any level of the directory
hierarchy, not just the top. And why are we skipping backup_manifest
here but pg_wal in scan_data_directory? That's a rhetorical question,
because I know the answer: verify_file() is only getting called for
files, so you can't use it to skip directories. But that's not a good
excuse for putting closely-related checks in different parts of the
code. It's just going to result in the checks being inconsistent and
each one having its own bugs that have to be fixed separately from the
other one, as here. Please try to reorganize this code so that it can
be done in a consistent way.

I think this is related to the way you're traversing the directory
tree, which somehow looks a bit awkward to me. At the top of
scan_data_directory(), you've got code that uses basedir and
subdirpath to construct path and relative_path. I was initially
surprised to see that this was the job of this function, rather than
the caller, but then I thought: well, as long as it makes life easy
for the caller, it's probably fine. However, I notice that the only
non-trivial caller is the scan_data_directory() itself, and it has to
go and construct newsubdirpath from subdirpath and the directory name.

It seems to me that this would get easier if you defined
scan_data_directory() -- or whatever we end up calling it -- to take
two pathname-related arguments:

- basepath, which would be $PGDATA and would never change as we
recurse down, so same as what you're now calling basedir
- pathsuffix, which would be an empty string at the top level and at
each recursive level we'd add a slash and then de->d_name.

So at the top of the function we wouldn't need an if statement,
because you could just do:

snprintf(path, MAXPGPATH, "%s%s", basedir, pathsuffix);

And when you recurse you wouldn't need an if statement either, because
you could just do:

snprintf(newpathsuffix, MAXPGPATH, "%s/%s", pathsuffix, de->d_name);

What I'd suggest is constructing newpathsuffix right after rejecting
"." and ".." entries, and then you can reject both pg_wal and
backup_manifest, at the top-level only, using symmetric and elegant
code:

if (strcmp(newpathsuffix, "/pg_wal") == 0 || strcmp(newpathsuffix,
"/backup_manifest") == 0)
    continue;
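
Putting those pieces together, the skeleton might look something like this (the
function name and the error-handling details are only illustrative, not a
prescription):

static void
verify_backup_directory(char *basepath, char *pathsuffix)
{
    char        path[MAXPGPATH];
    DIR        *dir;
    struct dirent *de;

    snprintf(path, MAXPGPATH, "%s%s", basepath, pathsuffix);

    dir = opendir(path);
    if (dir == NULL)
    {
        pg_log_error("could not open directory \"%s\": %m", path);
        return;
    }

    while ((de = readdir(dir)) != NULL)
    {
        char        newpathsuffix[MAXPGPATH];

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        snprintf(newpathsuffix, MAXPGPATH, "%s/%s", pathsuffix, de->d_name);

        /* Ignore backup_manifest and pg_wal, but only at the top level. */
        if (strcmp(newpathsuffix, "/pg_wal") == 0 ||
            strcmp(newpathsuffix, "/backup_manifest") == 0)
            continue;

        /* stat the entry here; recurse for directories, verify plain files */
    }

    closedir(dir);
}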

+ record = manifesthash_lookup(hashtab, filename);;
+ if (record)
+ {
...long block...
+ }
+ else
+ pg_log_info("file \"%s\" is present in backup but not in manifest",
+ filename);

Try to structure the code in such a way that you minimize unnecessary
indentation. For example, in this case, you could instead write:

if (record == NULL)
{
    pg_log_info(...)
    return;
}

and the result would be that everything inside that long if-block is
now at the top level of the function and indented one level less. And
I think if you look at this function you'll see a way that you can
save a *second* level of indentation for much of that code. Please
check the rest of the patch for similar cases, too.

+static char *
+nextLine(char *buf)
+{
+ while (*buf != '\0' && *buf != '\n')
+ buf = buf + 1;
+
+ return buf + 1;
+}

I'm pretty sure that my previous review mentioned the importance of
protecting against buffer overruns here.

+static char *
+nextWord(char *line)
+{
+ while (*line != '\0' && *line != '\t' && *line != '\n')
+ line = line + 1;
+
+ return line + 1;
+}

Same problem here.

In both cases, ++ is more idiomatic.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Rushabh Lathia

Re: backup manifests

From
Robert Haas
Date:
On Sun, Dec 22, 2019 at 8:32 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> Agreed that performance won't be a problem, but it will be a bit confusing
> to the user: at the start the user provides the manifest-checksum option (assume
> the user provided CRC32C), and at the end the user will find a SHA256
> checksum string in the backup_manifest file.

I don't think that's particularly confusing. The documentation should
say that this is the algorithm to be used for checksumming the files
which are backed up. The algorithm to be used for the manifest itself
is another matter. To me, it seems far MORE confusing if the algorithm
used for the manifest itself is magically inferred from the algorithm
used for one of the File lines therein.

> Does this also mean that, irrespective of whether the user provided a checksum
> option or not, we will always be generating the checksum for the backup_manifest file?

Yes, that is what I am proposing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Suraj Kharage
Date:
Thank you for review comments.

On Fri, Dec 20, 2019 at 9:14 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Dec 20, 2019 at 8:24 AM Suraj Kharage
<suraj.kharage@enterprisedb.com> wrote:
> Thank you for review comments.

Thanks for the new version.

+      <term><option>--verify-backup </option></term>

Whitespace.
Corrected.
 

+struct manifesthash_hash *hashtab;

Uh, I had it in mind that you would nuke this line completely, not
just remove "typedef" from it. You shouldn't need a global variable
here.
 
Removed.
 
+ if (buf == NULL)

pg_malloc seems to have an internal check such that it never returns
NULL. I don't see anything like this test in other callers.
 
Yeah, removed this check
 

The order of operations in create_manifest_hash() seems unusual:

+ fd = open(manifest_path, O_RDONLY, 0);
+ if (fstat(fd, &stat))
+ buf = pg_malloc(stat.st_size);
+ hashtab = manifesthash_create(1024, NULL);
...
+ entry = manifesthash_insert(hashtab, filename, &found);
...
+ close(fd);

I would have expected open-fstat-read-close to be consecutive, and the
manifesthash stuff all done afterwards. In fact, it seems like reading
the file could be a separate function.
 
Yes, I created a new function that reads the file and returns the buffer.
 

+ if (strncmp(checksum, "SHA256", 6) == 0)

This isn't really right; it would give a false match if we had a
checksum algorithm with a name like SHA2560 or SHA256C or
SHA256ExceptWayBetter. The right thing to do is find the colon first,
and then probably overwrite it with '\0' so that you have a string
that you can pass to parse_checksum_algorithm().
 
Corrected this check. The suggestion below allows us to put a '\0' within the line;
since SHA256 is used for the backup manifest itself, we can feed that
line to the checksum machinery early.
 

+ /*
+ * we don't have checksum type in the header, so need to
+ * read through the first file enttry to find the checksum
+ * type for the manifest file and initilize the checksum
+ * for the manifest file itself.
+ */

This seems to be proceeding on the assumption that the checksum type
for the manifest itself will always be the same as the checksum type
for the first file in the manifest. I don't think that's the right
approach. I think the manifest should always have a SHA256 checksum,
regardless of what type of checksum is used for the individual files
within the manifest. Since the volume of data in the manifest is
presumably very small compared to the size of the database cluster
itself, I don't think there should be any performance problem there.
Made the change in the backup manifest patch as well as in the backup validator patch. Thanks to Rushabh Lathia for the offline discussion and help.

To examine the first word of each line, I am using below check:
if (strncmp(line, "File", 4) == 0)
{
..
}
else if (strncmp(line, "Manifest-Checksum", 17) == 0)
{
..
}
else
    error
 
strncmp might not be right here, but we cannot put a '\0' in the middle of the line (to find out the first word)
before we recognize the line type.
All the lines except the last one (where we have the manifest checksum) are fed to the checksum machinery to calculate the manifest checksum,
so update_checksum() should be called after recognizing the type, i.e. if it is a File type record. Do you see any issues with this?

+ filesize = atol(size);

Using strtol() would allow for some error checking.
corrected.
 

+ * Increase the checksum by its lable length so that we can
+ checksum = checksum + checksum_lable_length;

Spelling.
corrected.
 

+ pg_log_error("invalid record found in \"%s\"", manifest_path);

Error message needs work.

+VerifyBackup(void)
+create_manifest_hash(char *manifest_path)
+nextLine(char *buf)

Your function names should be consistent with the surrounding style,
and with each other, as far as possible. Three different conventions
within the same patch and source file seems over the top.

Also keep in mind that you're not writing code in a vacuum. There's a
whole file of code here, and around that, a whole project.
scan_data_directory() is a good example of a function whose name is
clearly too generic. It's not a general-purpose function for scanning
the data directory; it's specifically a support function for verifying
a backup. Yet, the name gives no hint of this.

+verify_file(struct dirent *de, char fn[MAXPGPATH], struct stat st,
+ char relative_path[MAXPGPATH], manifesthash_hash *hashtab)

I think I commented on the use of char[] parameters in my previous review.

+ /* Skip backup manifest file. */
+ if (strcmp(de->d_name, "backup_manifest") == 0)
+ return;

Still looks like this will be skipped at any level of the directory
hierarchy, not just the top. And why are we skipping backup_manifest
here but pg_wal in scan_data_directory? That's a rhetorical question,
because I know the answer: verify_file() is only getting called for
files, so you can't use it to skip directories. But that's not a good
excuse for putting closely-related checks in different parts of the
code. It's just going to result in the checks being inconsistent and
each one having its own bugs that have to be fixed separately from the
other one, as here. Please try to reorganize this code so that it can
be done in a consistent way.

I think this is related to the way you're traversing the directory
tree, which somehow looks a bit awkward to me. At the top of
scan_data_directory(), you've got code that uses basedir and
subdirpath to construct path and relative_path. I was initially
surprised to see that this was the job of this function, rather than
the caller, but then I thought: well, as long as it makes life easy
for the caller, it's probably fine. However, I notice that the only
non-trivial caller is the scan_data_directory() itself, and it has to
go and construct newsubdirpath from subdirpath and the directory name.

It seems to me that this would get easier if you defined
scan_data_directory() -- or whatever we end up calling it -- to take
two pathname-related arguments:

- basepath, which would be $PGDATA and would never change as we
recurse down, so same as what you're now calling basedir
- pathsuffix, which would be an empty string at the top level and at
each recursive level we'd add a slash and then de->d_name.

So at the top of the function we wouldn't need an if statement,
because you could just do:

snprintf(path, MAXPGPATH, "%s%s", basedir, pathsuffix);

And when you recurse you wouldn't need an if statement either, because
you could just do:

snprintf(newpathsuffix, MAXPGPATH, "%s/%s", pathsuffix, de->d_name);

What I'd suggest is constructing newpathsuffix right after rejecting
"." and ".." entries, and then you can reject both pg_wal and
backup_manifest, at the top-level only, using symmetric and elegant
code:

if (strcmp(newpathsuffix, "/pg_wal") == 0 || strcmp(newpathsuffix,
"/backup_manifest") == 0)
    continue;

Thanks for the suggestion. Corrected as per the above inputs. 
 
+ record = manifesthash_lookup(hashtab, filename);;
+ if (record)
+ {
...long block...
+ }
+ else
+ pg_log_info("file \"%s\" is present in backup but not in manifest",
+ filename);

Try to structure the code in such a way that you minimize unnecessary
indentation. For example, in this case, you could instead write:

if (record == NULL)
{
    pg_log_info(...)
    return;
}

and the result would be that everything inside that long if-block is
now at the top level of the function and indented one level less. And
I think if you look at this function you'll see a way that you can
save a *second* level of indentation for much of that code. Please
check the rest of the patch for similar cases, too.
 
Makes sense. Corrected.
 

+static char *
+nextLine(char *buf)
+{
+ while (*buf != '\0' && *buf != '\n')
+ buf = buf + 1;
+
+ return buf + 1;
+}

I'm pretty sure that my previous review mentioned the importance of
protecting against buffer overruns here.

+static char *
+nextWord(char *line)
+{
+ while (*line != '\0' && *line != '\t' && *line != '\n')
+ line = line + 1;
+
+ return line + 1;
+}

Same problem here.

In both cases, ++ is more idiomatic.
I have added a check for EOF, but not sure whether that would be right here.
Do we need to check the length of buffer as well?

Rajkumar has changed the TAP test case patch as per the revised error messages.
Please find attached patch stack incorporated the above comments.

--
--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.
Attachment

Re: backup manifests

From
Robert Haas
Date:
On Tue, Dec 24, 2019 at 5:42 AM Suraj Kharage
<suraj.kharage@enterprisedb.com> wrote:
> Made the change in the backup manifest patch as well as in the backup validator patch. Thanks to Rushabh Lathia for
> the offline discussion and help.
 
>
> To examine the first word of each line, I am using below check:
> if (strncmp(line, "File", 4) == 0)
> {
> ..
> }
> else if (strncmp(line, "Manifest-Checksum", 17) == 0)
> {
> ..
> }
> else
>     error
>
> strncmp might not be right here, but we cannot put a '\0' in the middle of the line (to find out the first word)
> before we recognize the line type.
> All the lines except the last one (where we have the manifest checksum) are fed to the checksum machinery to calculate
> the manifest checksum,
> so update_checksum() should be called after recognizing the type, i.e. if it is a File type record. Do you see any
> issues with this?
 

I see the problem, but I don't think your solution is right, because
the first test would pass if the line said FiletMignon rather than
just File, which we certainly don't want. You've got to write the test
so that you're checking against the whole first word, not just some
prefix of it. There are several possible ways to accomplish that, but
this isn't one of them.
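
For example, one way to do it without writing into the buffer first (just a sketch):

size_t      wordlen = strcspn(line, "\t\n");

if (wordlen == 4 && strncmp(line, "File", 4) == 0)
{
    /* handle a File record */
}
else if (wordlen == 17 && strncmp(line, "Manifest-Checksum", 17) == 0)
{
    /* handle the manifest checksum record */
}
else
{
    /* report an unrecognized keyword, ideally with the line number */
}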

>> + pg_log_error("invalid record found in \"%s\"", manifest_path);
>>
>> Error message needs work.

Looks better now, but you have a message that says "invalid checksums
type \"%s\" found in \"%s\"". This is wrong because checksums would
need to be singular in this context (checksum). Also, I think it could
be better phrased as "manifest file \"%s\" specifies unknown checksum
algorithm \"%s\" at line %d".

>> Your function names should be consistent with the surrounding style,
>> and with each other, as far as possible. Three different conventions
>> within the same patch and source file seems over the top.

This appears to be fixed.

>> Also keep in mind that you're not writing code in a vacuum. There's a
>> whole file of code here, and around that, a whole project.
>> scan_data_directory() is a good example of a function whose name is
>> clearly too generic. It's not a general-purpose function for scanning
>> the data directory; it's specifically a support function for verifying
>> a backup. Yet, the name gives no hint of this.

But this appears not to be fixed.

>> if (strcmp(newpathsuffix, "/pg_wal") == 0 || strcmp(newpathsuffix,
>> "/backup_manifest") == 0)
>>     continue;
>
> Thanks for the suggestion. Corrected as per the above inputs.

You need a comment here, like "Ignore the possible presence of a
backup_manifest file and/or a pg_wal directory in the backup being
verified." and then maybe another sentence explaining why that's the
right thing to do.

+             * The forth parameter to VerifyFile() will pass the relative path
+             * of file to match exactly with the filename present in manifest.

I don't know what this comment is trying to tell me, which might be
something you want to try to fix. However, I'm pretty sure it's
supposed to say "fourth" not "forth".

>> and the result would be that everything inside that long if-block is
>> now at the top level of the function and indented one level less. And
>> I think if you look at this function you'll see a way that you can
>> save a *second* level of indentation for much of that code. Please
>> check the rest of the patch for similar cases, too.
>
> Makes sense. Corrected.

I don't agree. A large chunk of VerifyFile() is still subject to a
quite unnecessary level of indentation.

> I have added a check for EOF, but not sure whether that would be right here.
> Do we need to check the length of buffer as well?

That's really, really not right. EOF is not a character that can
appear in the buffer. It's chosen on purpose to be a value that never
matches any actual character when both the character and the EOF value
are regarded as values of type 'int'. That guarantee doesn't apply
here though because you're dealing with values of type 'char'. So what
this code is doing is searching for an impossible value using
incorrect logic, which has very little to do with the actual need
here, which is to avoid running off the end of the buffer. To see what
the problem is, try creating a file with no terminating newline, like
this:

echo -n this file has no terminating newline >> some-file

I doubt it will be very hard to make this patch crash horribly. Even
if you can't, it seems pretty clear that the logic isn't right.

I don't really know what the \0 tests in NextLine() and NextWord()
think they're doing either. If there's a \0 in the buffer before you
add one, it was in the original input data, and pretending like that
marks a word or line boundary seems like a fairly arbitrary choice.

What I suggest is:

(1) Allocate one byte more than the file size for the buffer that's
going to hold the file, so that if you write a \0 just after the last
byte of the file, you don't overrun the allocated buffer.

(2) Compute char *endptr = buf + len.

(3) Pass endptr to NextLine and NextWord and write the loop condition
something like while (*buf != '\n' && buf < endptr).
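
In other words, something like this (sketch):

static char *
NextLine(char *buf, char *endptr)
{
    while (buf < endptr && *buf != '\n')
        buf++;

    return buf + 1;
}

static char *
NextWord(char *line, char *endptr)
{
    while (line < endptr && *line != '\t' && *line != '\n')
        line++;

    return line + 1;
}

Callers still need to stop once the returned pointer reaches endptr.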

Other notes:

- The error handling in ReadFileIntoBuffer() does not seem to consider
the case of a short read. If you look through the source tree, you can
find examples of how we normally handle that.

- Putting string_hash_sdbm() into encode.c seems like a surprising
choice. What does this have to do with encoding anything? And why is
it going into src/common at all if it's only intended for frontend
use?

- It seems like whether or not any problems were found while verifying
the manifest ought to affect the exit status of pg_basebackup. I'm not
exactly sure what exit codes ought to be used, but you could look for
similar precedents. Document this, too.

- As much as possible let's have errors in the manifest file report
the line number, and let's also try to make them more specific, e.g.
instead of "invalid manifest record found in \"%s\"", perhaps
"manifest file \"%s\" contains invalid keyword \"%s\" at line %d".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Tels
Date:
Moin,

sorry for the very late reply. There was a discussion about the specific 
format of the backup manifests, and maybe that was already discussed and 
I just overlooked it:

1) Why invent your own format, and not just use a machine-readable 
format that already exists? It doesn't have to be full blown XML, or 
even JSON; something as simple as YAML would already be better. That way
not everyone has to write their own parser. Or maybe it is already YAML
and just the different keywords were under discussion?

2) It would be very wise to add a version number to the format. That
will make extending it later much easier and avoid the "we need to
add X, but that breaks compatibility with all software out there"
situations that often arise a few years down the line.

Best regards,

and a happy New Year 2020

Tels



Re: backup manifests

From
David Fetter
Date:
On Tue, Dec 31, 2019 at 01:30:01PM +0100, Tels wrote:
> Moin,
> 
> sorry for the very late reply. There was a discussion about the specific
> format of the backup manifests, and maybe that was already discussed and I
> just overlooked it:
> 
> 1) Why invent your own format, and not just use a machine-readable format
> that already exists? It doesn't have to be full blown XML, or even JSON;
> something as simple as YAML would already be better. That way not everyone has
> to write their own parser. Or maybe it is already YAML and just the
> different keywords were under discussion?

YAML is extremely fragile and error-prone. It's also a superset of
JSON, so I don't understand what you mean by "as simple as."

-1 from me on YAML

That said, I agree that there's no reason to come up with a bespoke
format and parser when JSON is already available in every PostgreSQL
installation.  Imposing a structure atop that includes a version
number, as you suggest, seems pretty straightforward, and should be
done.

Would it make sense to include some kind of capability description in
the format along with the version number?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: backup manifests

From
David Steele
Date:
On 12/31/19 10:43 AM, David Fetter wrote:
> On Tue, Dec 31, 2019 at 01:30:01PM +0100, Tels wrote:
>> Moin,
>>
>> sorry for the very late reply. There was a discussion about the specific
>> format of the backup manifests, and maybe that was already discussed and I
>> just overlooked it:
>>
>> 1) Why invent your own format, and not just use a machine-readable format
>> that already exists? It doesn't have to be full blown XML, or even JSON;
>> something as simple as YAML would already be better. That way not everyone has
>> to write their own parser. Or maybe it is already YAML and just the
>> different keywords were under discussion?
> 
> YAML is extremely fragile and error-prone. It's also a superset of
> JSON, so I don't understand what you mean by "as simple as."
> 
> -1 from me on YAML

-1 from me as well.  YAML is easy to write but definitely non-trivial to 
read.

> That said, I agree that there's no reason to come up with a bespoke
> format and parser when JSON is already available in every PostgreSQL
> installation.  Imposing a structure atop that includes a version
> number, as you suggest, seems pretty straightforward, and should be
> done.

+1.  I continue to support a format that would be easily readable 
without writing a lot of code.

-- 
-David
david@pgmasters.net



Re: backup manifests

From
Robert Haas
Date:
On Tue, Dec 31, 2019 at 9:16 PM David Steele <david@pgmasters.net> wrote:
> > That said, I agree that there's no reason to come up with a bespoke
> > format and parser when JSON is already available in every PostgreSQL
> > installation.  Imposing a structure atop that includes a version
> > number, as you suggest, seems pretty straightforward, and should be
> > done.
>
> +1.  I continue to support a format that would be easily readable
> without writing a lot of code.

So, if someone can suggest to me how I could read JSON from a tool in
src/bin without writing a lot of code, I'm all ears. So far that's
been asserted but not been demonstrated to be possible. Getting the
JSON parser that we have in the backend to work from frontend doesn't
look all that straightforward, for reasons that I talked about in
http://postgr.es/m/CA+TgmobZrNYR-ATtfZiZ_k-W7tSPgvmYZmyiqumQig4R4fkzHw@mail.gmail.com

As to the suggestion that a version number be included, that's been
there in every version of the patch I've posted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
David Fetter
Date:
On Wed, Jan 01, 2020 at 01:43:40PM -0500, Robert Haas wrote:
> On Tue, Dec 31, 2019 at 9:16 PM David Steele <david@pgmasters.net> wrote:
> > > That said, I agree that there's no reason to come up with a bespoke
> > > format and parser when JSON is already available in every PostgreSQL
> > > installation.  Imposing a structure atop that includes a version
> > > number, as you suggest, seems pretty straightforward, and should be
> > > done.
> >
> > +1.  I continue to support a format that would be easily readable
> > without writing a lot of code.
> 
> So, if someone can suggest to me how I could read JSON from a tool in
> src/bin without writing a lot of code, I'm all ears. So far that's
> been asserted but not been demonstrated to be possible. Getting the
> JSON parser that we have in the backend to work from frontend doesn't
> look all that straightforward, for reasons that I talked about in
> http://postgr.es/m/CA+TgmobZrNYR-ATtfZiZ_k-W7tSPgvmYZmyiqumQig4R4fkzHw@mail.gmail.com

Maybe I'm missing something obvious, but wouldn't combining
pg_read_file() with a cast to JSONB fix this, as below?

shackle@[local]:5413/postgres(13devel)(892328) # SELECT jsonb_pretty(j::jsonb) FROM
pg_read_file('/home/shackle/advanced_comparison.json') AS t(j);
 
            jsonb_pretty            
════════════════════════════════════
 [                                 ↵
     {                             ↵
         "message": "hello world!",↵
         "severity": "[DEBUG]"     ↵
     },                            ↵
     {                             ↵
         "message": "boz",         ↵
         "severity": "[INFO]"      ↵
     },                            ↵
     {                             ↵
         "message": "foo",         ↵
         "severity": "[DEBUG]"     ↵
     },                            ↵
     {                             ↵
         "message": "null",        ↵
         "severity": "null"        ↵
     }                             ↵
 ]
(1 row)

Time: 3.050 ms

> As to the suggestion that a version number be included, that's been
> there in every version of the patch I've posted.

and thanks for that!

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: backup manifests

From
Tom Lane
Date:
David Fetter <david@fetter.org> writes:
> On Wed, Jan 01, 2020 at 01:43:40PM -0500, Robert Haas wrote:
>> So, if someone can suggest to me how I could read JSON from a tool in
>> src/bin without writing a lot of code, I'm all ears.

> Maybe I'm missing something obvious, but wouldn't combining
> pg_read_file() with a cast to JSONB fix this, as below?

Only if you're prepared to restrict the use of the tool to superusers
(or at least people with whatever privilege that function requires).

Admittedly, you can probably feed the data to the backend without
use of an intermediate file; but it still requires a working backend
connection, which might be a bit of a leap for backup-related tools.
I'm sure Robert was envisioning doing this processing inside the tool.

            regards, tom lane



Re: backup manifests

From
Robert Haas
Date:
On Wed, Jan 1, 2020 at 7:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Fetter <david@fetter.org> writes:
> > On Wed, Jan 01, 2020 at 01:43:40PM -0500, Robert Haas wrote:
> >> So, if someone can suggest to me how I could read JSON from a tool in
> >> src/bin without writing a lot of code, I'm all ears.
>
> > Maybe I'm missing something obvious, but wouldn't combining
> > pg_read_file() with a cast to JSONB fix this, as below?
>
> Only if you're prepared to restrict the use of the tool to superusers
> (or at least people with whatever privilege that function requires).
>
> Admittedly, you can probably feed the data to the backend without
> use of an intermediate file; but it still requires a working backend
> connection, which might be a bit of a leap for backup-related tools.
> I'm sure Robert was envisioning doing this processing inside the tool.

Yeah, exactly. I don't think verifying a backup should require a
running server, let alone a running server on the same machine where
the backup is stored and for which you have superuser privileges.
AFAICS, the only options to make that work with JSON are (1) introduce
a new hand-coded JSON parser designed for frontend operation, (2) add
a dependency on an external JSON parser that we can use from frontend
code, or (3) adapt the existing JSON parser used in the backend so
that it can also be used in the frontend.

I'd be willing to do (1) -- it wouldn't be the first time I've written
JSON parser for PostgreSQL -- but I think it will take an order of
magnitude more code than using a file with tab-separated columns as
I've proposed, and I assume that there will be complaints about having
two JSON parsers in core. I'd also be willing to do (2) if that's the
consensus, but I'd vote against such an approach if somebody else
proposed it because (a) I'm not aware of a widely-available library
upon which we could depend and (b) introducing such a dependency for a
minor feature like this seems fairly unpalatable to me, and it'd
probably still be more code than just using a tab-separated file.  I'd
be willing to do (3) if somebody could explain to me how to solve the
problems with porting that code to work on the frontend side, but the
only suggestion so far as to how to do that is to port memory
contexts, elog/report, and presumably encoding handling to work on the
frontend side. That seems to me to be an unreasonably large lift,
especially given that we have lots of other files that use ad-hoc
formats already, and if somebody ever gets around to converting all of
those to JSON, they can certainly convert this one at the same time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> AFAICS, the only options to make that work with JSON are (1) introduce
> a new hand-coded JSON parser designed for frontend operation, (2) add
> a dependency on an external JSON parser that we can use from frontend
> code, or (3) adapt the existing JSON parser used in the backend so
> that it can also be used in the frontend.
> ...  I'd
> be willing to do (3) if somebody could explain to me how to solve the
> problems with porting that code to work on the frontend side, but the
> only suggestion so far as to how to do that is to port memory
> contexts, elog/report, and presumably encoding handling to work on the
> frontend side. That seems to me to be an unreasonably large lift,

Yeah, agreed.  The only consideration that'd make that a remotely
sane idea is that if somebody did the work, there would be other
uses for it.  (One that comes to mind immediately is cleaning up
ecpg's miserably-maintained fork of the backend datetime code.)

But there's no denying that it would be a large amount of work
(if it's even feasible), and nobody has stepped up to volunteer.
It's not reasonable to hold up this particular feature waiting
for that to happen.

If a tab-delimited file can handle this requirement, that seems
like a sane choice to me.

            regards, tom lane



Re: backup manifests

From
David Fetter
Date:
On Wed, Jan 01, 2020 at 08:57:11PM -0500, Robert Haas wrote:
> On Wed, Jan 1, 2020 at 7:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > David Fetter <david@fetter.org> writes:
> > > On Wed, Jan 01, 2020 at 01:43:40PM -0500, Robert Haas wrote:
> > >> So, if someone can suggest to me how I could read JSON from a tool in
> > >> src/bin without writing a lot of code, I'm all ears.
> >
> > > Maybe I'm missing something obvious, but wouldn't combining
> > > pg_read_file() with a cast to JSONB fix this, as below?
> >
> > Only if you're prepared to restrict the use of the tool to superusers
> > (or at least people with whatever privilege that function requires).
> >
> > Admittedly, you can probably feed the data to the backend without
> > use of an intermediate file; but it still requires a working backend
> > connection, which might be a bit of a leap for backup-related tools.
> > I'm sure Robert was envisioning doing this processing inside the tool.
> 
> Yeah, exactly. I don't think verifying a backup should require a
> running server, let alone a running server on the same machine where
> the backup is stored and for which you have superuser privileges.

Thanks for clarifying the context.

> AFAICS, the only options to make that work with JSON are (1) introduce
> a new hand-coded JSON parser designed for frontend operation, (2) add
> a dependency on an external JSON parser that we can use from frontend
> code, or (3) adapt the existing JSON parser used in the backend so
> that it can also be used in the frontend.
> 
> I'd be willing to do (1) -- it wouldn't be the first time I've written
> a JSON parser for PostgreSQL -- but I think it will take an order of
> magnitude more code than using a file with tab-separated columns as
> I've proposed, and I assume that there will be complaints about having
> two JSON parsers in core. I'd also be willing to do (2) if that's the
> consensus, but I'd vote against such an approach if somebody else
> proposed it because (a) I'm not aware of a widely-available library
> upon which we could depend and

I believe jq has an excellent one that's available under a suitable
license.

Making jq a dependency seems like a separate discussion, though. At
the moment, we don't use git tools like submodule/subtree, and deciding
which (or whether) seems like a gigantic discussion all on its own.

> (b) introducing such a dependency for a minor feature like this
> seems fairly unpalatable to me, and it'd probably still be more code
> than just using a tab-separated file.  I'd be willing to do (3) if
> somebody could explain to me how to solve the problems with porting
> that code to work on the frontend side, but the only suggestion so
> far as to how to do that is to port memory contexts, elog/report,
> and presumably encoding handling to work on the frontend side.

This port has come up several times recently in different contexts.
How big a chunk of work would it be?  Just so we're clear, I'm not
suggesting that this port should gate this feature.

> That seems to me to be an unreasonably large lift, especially given
> that we have lots of other files that use ad-hoc formats already,
> and if somebody ever gets around to converting all of those to JSON,
> they can certainly convert this one at the same time.

Would that require some kind of file converter program, or just a
really loud notice in the release notes?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: backup manifests

From
Robert Haas
Date:
On Thu, Jan 2, 2020 at 1:03 PM David Fetter <david@fetter.org> wrote:
> I believe jq has an excellent one that's available under a suitable
> license.
>
> Making jq a dependency seems like a separate discussion, though. At
> the moment, we don't use git tools like submodule/subtree, and deciding
> which (or whether) seems like a gigantic discussion all on its own.

Yep. And it doesn't seem worth it for a relatively small feature like
this. If we already had it, it might be worth using for a relatively
small feature like this, but that's a different issue.

> > (b) introducing such a dependency for a minor feature like this
> > seems fairly unpalatable to me, and it'd probably still be more code
> > than just using a tab-separated file.  I'd be willing to do (3) if
> > somebody could explain to me how to solve the problems with porting
> > that code to work on the frontend side, but the only suggestion so
> > far as to how to do that is to port memory contexts, elog/report,
> > and presumably encoding handling to work on the frontend side.
>
> This port has come up several times recently in different contexts.
> How big a chunk of work would it be?  Just so we're clear, I'm not
> suggesting that this port should gate this feature.

I don't really know. It's more of a research project than a coding
project, at least initially, I think. For instance, psql has its own
non-local-transfer-of-control mechanism using sigsetjmp(). If you
wanted to introduce elog/ereport on the frontend, would you make psql
use it? Or just let psql continue to do what it does now and introduce
the new mechanism as an option for code going forward? Or try to make
the two mechanisms work together somehow? Will you start using the
same error codes that we use in the backend on the frontend side, and
if so, what will they do, given that what the backend does is just
embed them in a protocol message that any particular client may or may
not display? Similarly, should frontend errors support reporting a
hint, detail, statement, or query? Will it be confusing if backend and
frontend errors are too similar? If you make memory contexts available
in the frontend, what if any code will you adapt to use them? There's
a lot of stuff in src/bin. If you want the encoding machinery on the
front end, what will you use in place of the backend's idea of the
"database encoding"? What will you do about dependencies on Datum in
frontend code? Somebody would need to study all this stuff, come up
with a tentative set of decisions, write patches, get it all working,
and then quite possibly have the choices they made get second-guessed
by other people who have different ideas. If you come up with a really
good, clean proposal that doesn't provoke any major disagreements, you
might be able to get this done in a couple of months. If you can't
come up with something people think is good, or if you're the only one who
thinks what you come up with is good, it might take years.

It seems to me that in a perfect world a lot of the code we have in
the backend that is usefully reusable in other contexts would be
structured so that it doesn't have random dependencies on backend-only
machinery like memory contexts and elog/ereport. For example, if you
write a function that returns an error message rather than throwing an
error, then you can arrange to call that from either frontend or
backend code and the caller can do whatever it wishes with that error
text. However, once you've written your code so that an error gets
thrown six layers down in the call stack, it's really hard to
rearrange that so that the error is returned, and if you are
populating not only the primary error message but error code, detail,
hint, etc. it's almost impractical to think that you can rearrange
things that way anyway. And generally you want to be populating those
things, as a best practice for backend code. So while in theory I kind
of like the idea of adapting the JSON parser we've already got to just
not depend so heavily on a backend environment, it's not really very
clear how to actually make that happen. At least not to me.

> > That seems to me to be an unreasonably large lift, especially given
> > that we have lots of other files that use ad-hoc formats already,
> > and if somebody ever gets around to converting all of those to JSON,
> > they can certainly convert this one at the same time.
>
> Would that require some kind of file converter program, or just a
> really loud notice in the release notes?

Maybe neither. I don't see why it wouldn't be possible to be
backward-compatible just by keeping the old code around and having it
parse as far as the version number. Then it could decide to continue
on with the old code or call the new code, depending.
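
For instance, sketching it with made-up function names:

version = ParseManifestVersion(buf, endptr);   /* reads only the leading version field */
if (version == 1)
    ParseManifestV1(buf, endptr);              /* today's tab-separated format */
else if (version == 2)
    ParseManifestV2(buf, endptr);              /* some hypothetical future format */
else
    pg_log_error("unsupported manifest version %d", version);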

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Suraj Kharage
Date:
Thank you for review comments.

On Mon, Dec 30, 2019 at 11:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Dec 24, 2019 at 5:42 AM Suraj Kharage
<suraj.kharage@enterprisedb.com> wrote:
> To examine the first word of each line, I am using below check:
> if (strncmp(line, "File", 4) == 0)
> {
> ..
> }
> else if (strncmp(line, "Manifest-Checksum", 17) == 0)
> {
> ..
> }
> else
>     error
>
> strncmp might not be right here, but we cannot put a '\0' in the middle of the line (to find out the first word)
> before we recognize the line type.
> All the lines except the last one (where we have the manifest checksum) are fed to the checksum machinery to calculate the manifest checksum,
> so update_checksum() should be called after recognizing the type, i.e. if it is a File type record. Do you see any issues with this?

I see the problem, but I don't think your solution is right, because
the first test would pass if the line said FiletMignon rather than
just File, which we certainly don't want. You've got to write the test
so that you're checking against the whole first word, not just some
prefix of it. There are several possible ways to accomplish that, but
this isn't one of them.
 
Yeah. Fixed in the attached patch.
 

>> + pg_log_error("invalid record found in \"%s\"", manifest_path);
>>
>> Error message needs work.

Looks better now, but you have a message that says "invalid checksums
type \"%s\" found in \"%s\"". This is wrong because checksums would
need to be singular in this context (checksum). Also, I think it could
be better phrased as "manifest file \"%s\" specifies unknown checksum
algorithm \"%s\" at line %d".
 
Corrected.
 

>> Your function names should be consistent with the surrounding style,
>> and with each other, as far as possible. Three different conventions
>> within the same patch and source file seems over the top.

This appears to be fixed.

>> Also keep in mind that you're not writing code in a vacuum. There's a
>> whole file of code here, and around that, a whole project.
>> scan_data_directory() is a good example of a function whose name is
>> clearly too generic. It's not a general-purpose function for scanning
>> the data directory; it's specifically a support function for verifying
>> a backup. Yet, the name gives no hint of this.

But this appears not to be fixed.

I have changed this function name to "VerifyDir"; likewise, we have sendDir and sendFile in basebackup.c.
 

>> if (strcmp(newpathsuffix, "/pg_wal") == 0 || strcmp(newpathsuffix,
>> "/backup_manifest") == 0)
>>     continue;
>
> Thanks for the suggestion. Corrected as per the above inputs.

You need a comment here, like "Ignore the possible presence of a
backup_manifest file and/or a pg_wal directory in the backup being
verified." and then maybe another sentence explaining why that's the
right thing to do.
 
Corrected.
 

+             * The forth parameter to VerifyFile() will pass the relative path
+             * of file to match exactly with the filename present in manifest.

I don't know what this comment is trying to tell me, which might be
something you want to try to fix. However, I'm pretty sure it's
supposed to say "fourth" not "forth".

I have changed the fourth parameter of VerifyFile(), so my comment there is no longer valid.
 

>> and the result would be that everything inside that long if-block is
>> now at the top level of the function and indented one level less. And
>> I think if you look at this function you'll see a way that you can
>> save a *second* level of indentation for much of that code. Please
>> check the rest of the patch for similar cases, too.
>
> Makes sense. Corrected.

I don't agree. A large chunk of VerifyFile() is still subject to a
quite unnecessary level of indentation.
 
Yeah, corrected.
 

> I have added a check for EOF, but not sure whether that would be right here.
> Do we need to check the length of buffer as well?

That's really, really not right. EOF is not a character that can
appear in the buffer. It's chosen on purpose to be a value that never
matches any actual character when both the character and the EOF value
are regarded as values of type 'int'. That guarantee doesn't apply
here though because you're dealing with values of type 'char'. So what
this code is doing is searching for an impossible value using
incorrect logic, which has very little to do with the actual need
here, which is to avoid running off the end of the buffer. To see what
the problem is, try creating a file with no terminating newline, like
this:

echo -n this file has no terminating newline >> some-file

I doubt it will be very hard to make this patch crash horribly. Even
if you can't, it seems pretty clear that the logic isn't right.

I don't really know what the \0 tests in NextLine() and NextWord()
think they're doing either. If there's a \0 in the buffer before you
add one, it was in the original input data, and pretending like that
marks a word or line boundary seems like a fairly arbitrary choice.

What I suggest is:

(1) Allocate one byte more than the file size for the buffer that's
going to hold the file, so that if you write a \0 just after the last
byte of the file, you don't overrun the allocated buffer.

(2) Compute char *endptr = buf + len.

(3) Pass endptr to NextLine and NextWord and write the loop condition
something like while (*buf != '\n' && buf < endptr).
 
Thanks for the suggestion. Corrected as per above suggestion.
 

Other notes:

- The error handling in ReadFileIntoBuffer() does not seem to consider
the case of a short read. If you look through the source tree, you can
find examples of how we normally handle that.
 
yeah, corrected.
 

- Putting string_hash_sdbm() into encode.c seems like a surprising
choice. What does this have to do with encoding anything? And why is
it going into src/common at all if it's only intended for frontend
use?
I thought this function could be used in the backend as well (likewise we are using it in simplehash), so I kept it in src/common.
After your comment, I have moved this to pg_basebackup.c.
I think this can be kept in a common place, but not in "src/common/encode.c"; thoughts?
 

- It seems like whether or not any problems were found while verifying
the manifest ought to affect the exit status of pg_basebackup. I'm not
exactly sure what exit codes ought to be used, but you could look for
similar precedents. Document this, too.
I might not be getting this completely correct, but as per my observation, if any error occurs, pg_basebackup terminates with exit(1),
whereas in the normal case (without an error), the main function returns 0. The "help" and "version" options terminate normally with exit(0).
So in our case, exit(0) would be appropriate. Please correct me if I misunderstood anything.
 

- As much as possible let's have errors in the manifest file report
the line number, and let's also try to make them more specific, e.g.
instead of "invalid manifest record found in \"%s\"", perhaps
"manifest file \"%s\" contains invalid keyword \"%s\" at line %d".
yeah, added line number at possible places.

I have also fixed a few comments given by Jeevan Chalke offlist.

Please find attached v7 patches and let me know your comments.
--
--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.
Attachment

Re: backup manifests

From
Stephen Frost
Date:
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > AFAICS, the only options to make that work with JSON are (1) introduce
> > a new hand-coded JSON parser designed for frontend operation, (2) add
> > a dependency on an external JSON parser that we can use from frontend
> > code, or (3) adapt the existing JSON parser used in the backend so
> > that it can also be used in the frontend.
> > ...  I'd
> > be willing to do (3) if somebody could explain to me how to solve the
> > problems with porting that code to work on the frontend side, but the
> > only suggestion so far as to how to do that is to port memory
> > contexts, elog/report, and presumably encoding handling to work on the
> > frontend side. That seems to me to be an unreasonably large lift,
>
> Yeah, agreed.  The only consideration that'd make that a remotely
> sane idea is that if somebody did the work, there would be other
> uses for it.  (One that comes to mind immediately is cleaning up
> ecpg's miserably-maintained fork of the backend datetime code.)
>
> But there's no denying that it would be a large amount of work
> (if it's even feasible), and nobody has stepped up to volunteer.
> It's not reasonable to hold up this particular feature waiting
> for that to happen.

Sure, it'd be work, and for "adding a simple backup manifest", maybe too
much to be worth considering ... but that's not what is going on here,
is it?  Are we really *just* going to add a backup manifest to
pg_basebackup and call it done?  That's not what I understood the goal
here to be but rather to start doing a lot of other things with
pg_basebackup beyond just having a manifest and if you think just a bit
farther down the path, I think you start to realize that you're going to
need this base set of capabilities to get to a point where pg_basebackup
(or whatever it ends up being called) is able to have the kind of
capabilities that exist in other PG backup software already.

I'm sure I don't need to say where to find it, but I can point you to a
pretty good example of a similar effort, and we didn't start with "build
a manifest into a custom format" as the first thing implemented, but
rather a great deal of work was first put into building out things like
logging, memory management/contexts, error handling/try-catch, having a
string type, a variant type, etc.

In some ways, it's kind of impressive what we've got in our front-ends
tools even though we don't have these things, really, and certainly not
all in one nice library that they all use...  but at the same time, I
think that lack has also held those tools back, pg_basebackup among
them.

Anyway, off my high horse, I'll just say I agree w/ David and David wrt
using JSON for this over hacking together yet another format.  We didn't
do that as thoroughly as we should have (we've got a JSON parser and all
that, and use JSON quite a bit, but the actual manifest format is a mix
of ini-style and JSON, because it's got more in it than just a list of
files, something that I suspect will also end up being true of this down
the road and for good reasons, and we started with the ini format and
discovered it sucked and then started embedding JSON in it...), and
we've come to realize that was a bad idea, and intend to fix it in our
next manifest major version bump.  Would be unfortunate to see PG making
that same mistake.

Thanks,

Stephen

Attachment

Re: backup manifests

From
Robert Haas
Date:
On Fri, Jan 3, 2020 at 11:44 AM Stephen Frost <sfrost@snowman.net> wrote:
> Sure, it'd be work, and for "adding a simple backup manifest", maybe too
> much to be worth considering ... but that's not what is going on here,
> is it?  Are we really *just* going to add a backup manifest to
> pg_basebackup and call it done?  That's not what I understood the goal
> here to be but rather to start doing a lot of other things with
> pg_basebackup beyond just having a manifest and if you think just a bit
> farther down the path, I think you start to realize that you're going to
> need this base set of capabilities to get to a point where pg_basebackup
> (or whatever it ends up being called) is able to have the kind of
> capabilities that exist in other PG backup software already.

I have no development plans for pg_basebackup that require extending
the format of the manifest file in any significant way, and am not
aware that anyone else has such plans either. If you are aware of
something I'm not, or if anyone else is, it would be helpful to know
about it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Stephen Frost
Date:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Fri, Jan 3, 2020 at 11:44 AM Stephen Frost <sfrost@snowman.net> wrote:
> > Sure, it'd be work, and for "adding a simple backup manifest", maybe too
> > much to be worth considering ... but that's not what is going on here,
> > is it?  Are we really *just* going to add a backup manifest to
> > pg_basebackup and call it done?  That's not what I understood the goal
> > here to be but rather to start doing a lot of other things with
> > pg_basebackup beyond just having a manifest and if you think just a bit
> > farther down the path, I think you start to realize that you're going to
> > need this base set of capabilities to get to a point where pg_basebackup
> > (or whatever it ends up being called) is able to have the kind of
> > capabilities that exist in other PG backup software already.
>
> I have no development plans for pg_basebackup that require extending
> the format of the manifest file in any significant way, and am not
> aware that anyone else has such plans either. If you are aware of
> something I'm not, or if anyone else is, it would be helpful to know
> about it.

You're certainly intending to do *something* with the manifest, and
while I appreciate that you feel you've come up with a complete use-case
that this simple manifest will be sufficient for, I frankly doubt
that'll actually be the case.  Not long ago it wasn't completely clear
that a manifest at *all* was even going to be necessary for the specific
use-case you had in mind (I'll admit I wasn't 100% sure myself at the
time either), but now that we're down the road of having one, I can't
agree with the blanket assumption that we're never going to want to
extend it, or even that it won't be necessary to add to it before this
particular use-case is fully addressed.

And the same goes for the other things that were discussed up-thread
regarding memory context and error handling and such.

I'm happy to outline the other things that one *might* want to include
in a manifest, if that would be helpful, but I'll also say that I'm not
planning to hack on adding that to pg_basebackup in the next month or
two.  Once we've actually got a manifest, if it's in an extendable
format, I could certainly see people wanting to do more with it though.

Thanks,

Stephen

Attachment

Re: backup manifests

From
Robert Haas
Date:
On Fri, Jan 3, 2020 at 12:01 PM Stephen Frost <sfrost@snowman.net> wrote:
> You're certainly intending to do *something* with the manifest, and
> while I appreciate that you feel you've come up with a complete use-case
> that this simple manifest will be sufficient for, I frankly doubt
> that'll actually be the case.  Not long ago it wasn't completely clear
> that a manifest at *all* was even going to be necessary for the specific
> use-case you had in mind (I'll admit I wasn't 100% sure myself at the
> time either), but now that we're down the road of having one, I can't
> agree with the blanket assumption that we're never going to want to
> extend it, or even that it won't be necessary to add to it before this
> particular use-case is fully addressed.
>
> And the same goes for the other things that were discussed up-thread
> regarding memory context and error handling and such.

Well, I don't know how to make you happy here. It looks to me like
insisting on a JSON-format manifest will likely mean that this doesn't
get into PG13 or PG14 or probably PG15, because a port of all that
machinery to work in frontend code will be neither simple nor quick.
If you want this to happen for this release, you've got to be willing
to settle for something that can be implemented in the time we have.

I'm not sure whether what you and David are arguing boils down to
thinking that I'm wrong when I say that doing that is hard, or whether
you know it's hard but you just don't care because you'd rather see
the feature go nowhere than use a format other than JSON. I don't see
much difference between the latter position and a desire to block the
feature permanently. And if it's the former then you have yet to make
any suggestions for how to get it done with reasonable effort.

> I'm happy to outline the other things that one *might* want to include
> in a manifest, if that would be helpful, but I'll also say that I'm not
> planning to hack on adding that to pg_basebackup in the next month or
> two.  Once we've actually got a manifest, if it's in an extendable
> format, I could certainly see people wanting to do more with it though.

Well, as I say, it's got a version number, so somebody can always come
along with something better. I really think this is a red herring,
though. If somebody wants to track additional data about a backup,
there's no rule that they have to include it in the backup manifest. A
backup management solution might want to track things like who
initiated the backup, or for what purpose it was taken, or the IP
address of the machine where it was taken, or the backup system's own
identifier, but any of that stuff could (and probably should) be
stored in a file managed by that tool rather than in the server's own
manifest.  As to the per-file information, I believe that David and I
discussed that and the list of fields that I had seemed relatively OK,
and I believe I added at least one (mtime) per his suggestion. Of
course, it's a tab-separated file; more fields could easily be added
at the end, separated by tabs. Or, you could modify the file so that
after each "File" line you had another line with supplementary
information about that file, beginning with some other word. Or, you
could convert the whole file to JSON for v2 of the manifest, if,
contrary to my belief, that's a fairly simple thing to do. There are
probably other approaches as well. This file format has already had
considerably more thought about forward-compatibility than
pg_hba.conf, which has been retrofitted multiple times without
breaking the world.
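
Just to make that concrete, a line-oriented manifest along the lines I'm
proposing looks roughly like this -- the values and field order here are
invented for illustration, and the patch is authoritative on the exact
keywords and fields:

PostgreSQL-Backup-Manifest-Version 1
File	base/1/1259	16384	2020-01-03 12:00:00 GMT	sha256:9a0b...
File	base/1/1249	8192	2020-01-03 12:00:01 GMT	sha256:7c1d...
Manifest-Checksum	sha256:55aa...

A hypothetical v2 could append more tab-separated fields to each "File"
line, or interleave lines beginning with some other keyword, say:

File	base/1/1259	16384	2020-01-03 12:00:00 GMT	sha256:9a0b...
Owner	base/1/1259	postgres

and a parser that keys off the first word of each line can simply skip
anything it doesn't understand.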

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Stephen Frost
Date:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Fri, Jan 3, 2020 at 12:01 PM Stephen Frost <sfrost@snowman.net> wrote:
> > You're certainly intending to do *something* with the manifest, and
> > while I appreciate that you feel you've come up with a complete use-case
> > that this simple manifest will be sufficient for, I frankly doubt
> > that'll actually be the case.  Not long ago it wasn't completely clear
> > that a manifest at *all* was even going to be necessary for the specific
> > use-case you had in mind (I'll admit I wasn't 100% sure myself at the
> > time either), but now that we're down the road of having one, I can't
> > agree with the blanket assumption that we're never going to want to
> > extend it, or even that it won't be necessary to add to it before this
> > particular use-case is fully addressed.
> >
> > And the same goes for the other things that were discussed up-thread
> > regarding memory context and error handling and such.
>
> Well, I don't know how to make you happy here.

I suppose I should admit that, first off, I don't feel you're required
to make me happy, and I don't think it's necessary to make me happy to
get this feature into PG.

Since you expressed that interest though, I'll go out on a limb and say
that what would make me *really* happy would be to think about where the
project should be taking pg_basebackup, what we should be working on
*today* to address the concerns we hear about from our users, and to
consider the best way to implement solutions to what they're actively
asking for a core backup solution to be providing.  I get that maybe
that isn't how the world works and that sometimes we have people who
write our paychecks wanting us to work on something else, and yes, I'm
sure there are some users who are asking for this specific thing but I
certainly don't think it's a common ask of pg_basebackup or what users
feel is missing from the backup options we offer in core; we had users
on this list specifically saying they *wouldn't* use this feature
(referring to the differential backup stuff, of course), in fact,
because of the things which are missing, which is pretty darn rare.

That's what would make *me* happy.  Even some comments about how to
*get* there while also working towards these features would be likely
to make me happy.  Instead, I feel like we're being told that we need
this feature badly in v13 and we're going to cut bait and do whatever
is necessary to get us there.

> It looks to me like
> insisting on a JSON-format manifest will likely mean that this doesn't
> get into PG13 or PG14 or probably PG15, because a port of all that
> machinery to work in frontend code will be neither simple nor quick.

I certainly understand that these things take time, sometimes quite a
bit of it as the past 2 years have shown in this other little side
project, and that was hacking without having to go through the much
larger effort involved in getting things into PG core.  That doesn't
mean that kind of effort isn't worthwhile or that, because something is
a bunch of work, we shouldn't spend the time on it.  I do feel what
you're after here is a multi-year project, and I've said before that I
don't agree that this is a feature (the differential backup with
pg_basebackup thing) that makes any sense going into PG at this time,
but I'm also not trying to block this feature, just to share the
experience that we've gotten from working in this area for quite a
while and hopefully help guide the effort in PG away from pitfalls and
in a good direction long-term.

> If you want this to happen for this release, you've got to be willing
> to settle for something that can be implemented in the time we have.

I'm not sure what you're expecting here, but for my part, at least, I'm
not going to be terribly upset if this feature doesn't make this release
because there's an agreement and understanding that the current
direction isn't a good long-term solution.  Nor am I going to be
terribly upset about the time that's been spent on this particular
approach given that there's been no shortage of people commenting that
they'd rather see an extensible format, like JSON, and have been for
quite some time.

All that said- one thing we've done is to consider that *we* are the
ones who are writing the JSON, while also being the ones to read it- we
don't need the parsing side to understand and deal with *any* JSON that
might exist out there, just whatever it is the server creates/created.
It may be possible to use that to simplify the parser, or perhaps at
least to accept that if it ends up being given something else that it
might not perform as well with it.  I'm not sure how helpful that will
be to you, but I recall David finding it a helpful thought.

> I'm not sure whether what you and David are arguing boils down to
> thinking that I'm wrong when I say that doing that is hard, or whether
> you know it's hard but you just don't care because you'd rather see
> the feature go nowhere than use a format other than JSON. I don't see
> much difference between the latter position and a desire to block the
> feature permanently. And if it's the former then you have yet to make
> any suggestions for how to get it done with reasonable effort.

There seems to be a great deal of daylight between the two positions
you're proposing I might have (as I don't speak for David..).

I *do* think there's a lot of work that would need to be done here to
make this a good solution.  I'm *not* completely against other formats
besides JSON.  Even more so though, I am *not* arguing that this
feature should go 'nowhere', whether it uses JSON or not.

What I don't care for is having a hand-hacked inflexible format that's
going to require everyone down the road to implement their own parser
for it and bespoke code for every version of the custom format that
there ends up being, *including* PG core, to be clear.  Whatever utility
is going to be utilizing this manifest, it's going to need to support
older versions, just like pg_dump deals with older versions of custom
format dumps (though we still get people complaining about not being
able to use older tools with newer dumps- it'd be awful nice if we
could use JSON, or something, and then just *add* things that wouldn't
break older tools, except for the rare case where we don't have a
choice..).  Not to mention the debugging grief and such, since we can't
just use a tool like jq to check out what's going on.

As to the reference to pg_hba.conf- I don't think the packagers would
necessarily agree that there's been little grief around that, but even
so, a given pg_hba.conf is only going to be used with a given major
version and, sure, it might have to be updated to that newer major
version's format if we change the format and someone copies the old
version to the new version, but that's during a major version upgrade of
the server, and at least newer tools don't have to deal with the older
pg_hba.conf version.

Also, pg_hba.conf doesn't seem like a terribly good example in any case-
the last time the actual structure of that file was changed in a
breaking way was in 2002 when the 'user' column was added, and the
example pg_hba.conf from that commit works just fine with PG12, it
seems, based on some quick tests.  There have been other
backwards-incompatible changes, of course, the last being 6 years ago, I
think, when 'krb5' was removed.  I suppose there is some chance that you
might have a PG12-configured pg_hba.conf and you try copying that back
to a PG11 or PG10 server and it doesn't work, but that strikes me as far
less of an issue than trying to read a PG12 backup with a PG11 tool,
which we know people do because they complain on the lists about it with
pg_dump/pg_restore.

Thanks,

Stephen

Attachment

Re: backup manifests

From
Robert Haas
Date:
On Fri, Jan 3, 2020 at 2:35 PM Stephen Frost <sfrost@snowman.net> wrote:
> > Well, I don't know how to make you happy here.
>
> I suppose I should admit that, first off, I don't feel you're required
> to make me happy, and I don't think it's necessary to make me happy to
> get this feature into PG.

Fair enough. That is gracious of you, but I would like to try to make
you happy if it is possible to do so.

> Since you expressed that interest though, I'll go out on a limb and say
> that what would make me *really* happy would be to think about where the
> project should be taking pg_basebackup, what we should be working on
> *today* to address the concerns we hear about from our users, and to
> consider the best way to implement solutions to what they're actively
> asking for a core backup solution to be providing.  I get that maybe
> that isn't how the world works and that sometimes we have people who
> write our paychecks wanting us to work on something else, and yes, I'm
> sure there are some users who are asking for this specific thing but I
> certainly don't think it's a common ask of pg_basebackup or what users
> feel is missing from the backup options we offer in core; we had users
> on this list specifically saying they *wouldn't* use this feature
> (referring to the differential backup stuff, of course), in fact,
> because of the things which are missing, which is pretty darn rare.

Well, I mean, what you seem to be suggesting here is that somebody is
driving me with a stick to do something that I don't really like but
have to do because otherwise I won't be able to make rent, but that's
actually not the case. I genuinely believe that this is a good design,
and it's driven by me, not some shadowy conglomerate of EnterpriseDB
executives who are out to make PostgreSQL suck. If I'm wrong and the
design sucks, that's again not the fault of shadowy EnterpriseDB
executives; it's my fault. Incidentally, my boss is not very shadowy
anyhow; he's a super-nice guy, and a major reason why I work here. :-)

I don't think the issue here is that I haven't thought about what
users want, but that not everybody wants the same thing, and it
seems like the people with whom I interact want somewhat different
things than those with whom you interact. EnterpriseDB has an existing
tool that does parallel and block-level incremental backup, and I
started out with the goal of providing those same capabilities in
core. They are quite popular with EnterpriseDB customers, and I'd like
to make them more widely available and, as far as I can, improve on
them. From our previous discussion and from a (brief) look at
pgbackrest, I gather that the interests of your customers are somewhat
different. Apparently, block-level incremental backup isn't quite as
important to your customers, perhaps because you've already got
file-level incremental backup, but various other things like
encryption and backup verification are extremely important, and you've
got a set of ideas about what would be valuable in the future which
I'm sure is based on real input from your customers. I hope you pursue
those ideas, and I hope you do it in core rather than in a separate
piece of software, but that's up to you. Meanwhile, I think that if I
have somewhat different ideas about what I'd like to pursue, that
ought to be just fine. And I don't think it is unreasonable to hope
that you'll acknowledge my goals as legitimate even if you have
different ones.

I want to point out that my idea about how to do all of this has
shifted by a considerable amount based on the input that you and David
have provided. My original design didn't involve a backup manifest,
but now it does. That turned out to be necessary, but it was also
something you suggested, and something where I asked and took advice
on what ought to go into it. Likewise, you suggested that the process
of taking the backup should involve giving the client more control
rather than trying to do everything on the server side, and that is
now the design which I plan to pursue. You suggested that because it
would be more advantageous for out-of-core backup tools, such as
pgbackrest, and I acknowledge that as a benefit and I think we're
headed in that direction. I am not doing a single thing which, to my
knowledge, blocks anything that you might want to do with
pg_basebackup in the future. I have accepted as much of your input as
I believe that I can without killing the project off completely. To go
further, I'd have to either accept years of delay or abandon my
priorities entirely and pursue yours.

> That's what would make *me* happy.  Even some comments about how to
> *get* there while also working towards these features would be likely
> to make me happy.  Instead, I feel like we're being told that we need
> this feature badly in v13 and we're going to cut bait and do whatever
> is necessary to get us there.

This seems like a really unfair accusation given how much work I've
put into trying to satisfy you and David. If this patch, the parallel
full backup patch, and the incremental backup patch were all to get
committed to v13, an outcome which seems pretty unlikely to me at this
point, then you would have a very significant number of things that
you have requested in the course of the various discussions, and
AFAICS the only thing you'd have that you don't want is the need to
parse the manifest file using while (<>) { @a = split /\t/, $_ } rather
than $a = parse_json(join '', <>). You would, for example, have the
ability to request an individual file from the server rather than a
complete tarball. Maybe the command that requests a file would lack an
encryption option, something which IIUC you would like to have, but
that certainly does not leave you worse off. It is easier to add an
encryption option to a command which you already have than it is to
invent a whole new command -- or really several whole new commands,
since such a command is not really usable unless you also have
facilities to start and stop a backup through the replication
protocol.

All that being said, I continue to maintain that insisting on JSON is
not a reasonable request. It is not easy to parse JSON, or a subset of
JSON. The amount of code required to write even a stripped-down JSON
parser is far more than the amount required to split a file on tabs,
and the existing code we have for the backend cannot be easily (or
even with moderate effort) adapted to work in the frontend. On the
other hand, the code that pgbackrest would need to parse the manifest
file format I've proposed could have easily been written in less time
than you've spent arguing about it. Heck, if it helps, I'll offer to
write that patch myself (I could be a pgbackrest contributor!). I
don't want this effort to suck because something gets rushed through
too quickly, but I also don't want it to get derailed because of what
I view as a relatively minor detail. It is not always right to take
the easier road, but it is also not always wrong. I have no illusions
that what is being proposed here is perfect, but lots of features
started out imperfect and got better over time -- RLS and parallel
query come to mind, among others -- and we often learn, from the
experience of shipping something, which parts of the feature are most
in need of improvement.
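
To put the "split a file on tabs" claim in perspective, here's a minimal
sketch of the kind of frontend code I mean -- standalone C with made-up
field positions, not the code from the validator patch:

#include <stdio.h>
#include <string.h>

#define MAX_FIELDS 8

/* Split one manifest line in place on tabs; returns the field count. */
static int
split_on_tabs(char *line, char *fields[], int max_fields)
{
	int		nfields = 0;
	char   *p = line;

	line[strcspn(line, "\n")] = '\0';	/* strip trailing newline */

	while (nfields < max_fields)
	{
		fields[nfields++] = p;
		p = strchr(p, '\t');
		if (p == NULL)
			break;
		*p++ = '\0';
	}
	return nfields;
}

int
main(void)
{
	char	line[4096];
	char   *fields[MAX_FIELDS];

	while (fgets(line, sizeof(line), stdin))
	{
		int		n = split_on_tabs(line, fields, MAX_FIELDS);

		/* e.g. a line of the form "File<tab>name<tab>size<tab>..." */
		if (n >= 3 && strcmp(fields[0], "File") == 0)
			printf("file %s, size %s\n", fields[1], fields[2]);
	}
	return 0;
}

That's obviously not a complete validator, but the parsing piece really
is about that much code.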

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Stephen Frost
Date:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Fri, Jan 3, 2020 at 2:35 PM Stephen Frost <sfrost@snowman.net> wrote:
> > > Well, I don't know how to make you happy here.
> >
> > I suppose I should admit that, first off, I don't feel you're required
> > to make me happy, and I don't think it's necessary to make me happy to
> > get this feature into PG.
>
> Fair enough. That is gracious of you, but I would like to try to make
> you happy if it is possible to do so.

I certainly appreciate that, but I don't know that it is possible to do
so while approaching this in the order that you are, which I tried to
point out previously.

> > Since you expressed that interest though, I'll go out on a limb and say
> > that what would make me *really* happy would be to think about where the
> > project should be taking pg_basebackup, what we should be working on
> > *today* to address the concerns we hear about from our users, and to
> > consider the best way to implement solutions to what they're actively
> > asking for a core backup solution to be providing.  I get that maybe
> > that isn't how the world works and that sometimes we have people who
> > write our paychecks wanting us to work on something else, and yes, I'm
> > sure there are some users who are asking for this specific thing but I
> > certainly don't think it's a common ask of pg_basebackup or what users
> > feel is missing from the backup options we offer in core; we had users
> > on this list specifically saying they *wouldn't* use this feature
> > (referring to the differential backup stuff, of course), in fact,
> > because of the things which are missing, which is pretty darn rare.
>
> Well, I mean, what you seem to be suggesting here is that somebody is
> driving me with a stick to do something that I don't really like but
> have to do because otherwise I won't be able to make rent, but that's
> actually not the case. I genuinely believe that this is a good design,
> and it's driven by me, not some shadowy conglomerate of EnterpriseDB
> executives who are out to make PostgreSQL suck. If I'm wrong and the
> design sucks, that's again not the fault of shadowy EnterpriseDB
> executives; it's my fault. Incidentally, my boss is not very shadowy
> anyhow; he's a super-nice guy, and a major reason why I work here. :-)

Then I just have to disagree, really vehemently, that having a
block-level incremental backup solution without solid dependency
handling between incremental and full backups, solid WAL management and
archiving, expiration handling for incremental/full backups and WAL, and
the manifest that this thread has been about, is a good design.

Ultimately, what this calls for is some kind of 'repository' which
you've stressed you don't think is a good idea for pg_basebackup to ever
deal with and I just can't disagree more with that.  I could perhaps
agree that it isn't appropriate for the specific tool "pg_basebackup" to
work with a repo because of the goal of that particular tool, but in
that case, I don't think pg_basebackup should be the tool to provide a
block-level incremental backup solution, it should continue to be a tool
to provide a simple and easy way to take a one-time, complete, snapshot
of a running PG system over the replication protocol- and adding support
for parallel backups, or encrypted backups, or similar things would be
completely in-line and appropriate for such a tool, and I'm not against
those features being added to pg_basebackup even in advance of anything
like support for a repo or dependency handling.

> I don't think the issue here is that I haven't thought about what
> users want, but that not everybody wants the same thing, and it
> seems like the people with whom I interact want somewhat different
> things than those with whom you interact. EnterpriseDB has an existing
> tool that does parallel and block-level incremental backup, and I
> started out with the goal of providing those same capabilities in
> core. They are quite popular with EnterpriseDB customers, and I'd like
> to make them more widely available and, as far as I can, improve on
> them. From our previous discussion and from a (brief) look at
> pgbackrest, I gather that the interests of your customers are somewhat
> different. Apparently, block-level incremental backup isn't quite as
> important to your customers, perhaps because you've already got
> file-level incremental backup, but various other things like
> encryption and backup verification are extremely important, and you've
> got a set of ideas about what would be valuable in the future which
> I'm sure is based on real input from your customers. I hope you pursue
> those ideas, and I hope you do it in core rather than in a separate
> piece of software, but that's up to you. Meanwhile, I think that if I
> have somewhat different ideas about what I'd like to pursue, that
> ought to be just fine. And I don't think it is unreasonable to hope
> that you'll acknowledge my goals as legitimate even if you have
> different ones.

I'm all for block-level incremental backup, in general (though I've got
concerns about it from a correctness standpoint..  I certainly think
it's going to be difficult to get right and probably finicky, but
hopefully your experience with BART has let you identify where the
dragons lie and it'll be interesting to see what that code looks like
and if the approach used can be leveraged in other tools), but I am
concerned about how we're getting there.

> I want to point out that my idea about how to do all of this has
> shifted by a considerable amount based on the input that you and David
> have provided. My original design didn't involve a backup manifest,
> but now it does. That turned out to be necessary, but it was also
> something you suggested, and something where I asked and took advice
> on what ought to go into it. Likewise, you suggested that the process
> of taking the backup should involve giving the client more control
> rather than trying to do everything on the server side, and that is
> now the design which I plan to pursue. You suggested that because it
> would be more advantageous for out-of-core backup tools, such as
> pgbackrest, and I acknowledge that as a benefit and I think we're
> headed in that direction. I am not doing a single thing which, to my
> knowledge, blocks anything that you might want to do with
> pg_basebackup in the future. I have accepted as much of your input as
> I believe that I can without killing the project off completely. To go
> further, I'd have to either accept years of delay or abandon my
> priorities entirely and pursue yours.

While I'm hopeful that the parallel backup pieces will be useful to
out-of-core backup tools, I've been increasingly less confident that
it'll end up being very useful to pgbackrest, as much as I would like it
to be.  Perhaps after it's in place we might be able to work on it to
make it useful, but we'd need to push all the features like encryption
and options for compression and such into the backend, in a way that
works for pgbackrest, to be able to leverage it, and I'm not sure that
would get much support or that it could be done in a way that doesn't
end up causing problems for pg_basebackup, which clearly wouldn't be
acceptable.  Further, if we can't leverage the PG backup protocol that
you're building here, it seems pretty darn unlikely we'd have much use
for the manifest that's built as part of that.

I'm probably going to lose what credibility I have in criticizing what
you're doing with pg_basebackup here, but I started off saying you don't
have to make me happy and this is part of why- I really don't think
there's much that you're doing with pg_basebackup that is ultimately
going to impact what plans I have for the future, for pretty much
anything.  I haven't got any real specific plans around pg_basebackup,
though, point-in-fact, if you put in a bunch of code that shows how to
get PG and pg_basebackup to do block-level incremental backups in a safe
and trusted way, that would actually be *really* useful to the
pgbackrest project because we could then lift that logic out of
pg_basebackup and leverage it.  If I wanted to be entirely selfish, I'd
be pushing you to get block-level incremental backup into pg_basebackup
as quickly as possible so that we could have such an example of "how to
do it in a way that, if it breaks, the PG community will figure out what
went wrong and fix it".  If you look at other things we've done, such as
not backing up unlogged tables, that's exactly the approach we've used:
introduce the feature into pg_basebackup *first*, make sure the
community agrees that it's a valid approach and will deal with any
issues with it (and will take pains to avoid *breaking* it in future
versions..), and only *then* introduce it into pgbackrest by using the
same approach.  Those other features were well in-line with what makes
sense for pg_basebackup too though.

We haven't done that though, and I haven't been pushing in that
direction, not because I think it's a bad feature or that I want to
block something going into pg_basebackup or whatever, but because I
think it's actually going to cause more problems for users than it
solves because some users will want to use it (though not all, as we've
seen on this list, as there's at least some users out there who are as
scared of the idea of having *just* this in pg_basebackup without the
other things I talk about above as I am) and then they're going to try
and hack together all those other things they need around WAL management
and archiving and expiration and they're likely to get it wrong- perhaps
in obvious ways, perhaps in relatively subtle ways, but either way,
they'll end up with backups that aren't valid that they only discover
when they're in an emergency.  Again, perhaps selfish me would say "oh
good, then they'll call me and pay me lots to fix it for them", but it
certainly wouldn't look good for the community- even if all of the
documentation and everything we put out there says that the way they
were doing it had this subtle issue or whatever (considering our docs
still promote a really bad, imv anyway, archive command kinda makes this
likely, if you ask me anyway..), and it wouldn't be good for the user.

> > That's what would make *me* happy.  Even some comments about how to
> > *get* there while also working towards these features would be likely
> > to make me happy.  Instead, I feel like we're being told that we need
> > this feature badly in v13 and we're going to cut bait and do whatever
> > is necessary to get us there.
>
> This seems like a really unfair accusation given how much work I've
> put into trying to satisfy you and David. If this patch, the parallel
> full backup patch, and the incremental backup patch were all to get
> committed to v13, an outcome which seems pretty unlikely to me at this
> point, then you would have a very significant number of things that
> you have requested in the course of the various discussions, and
> AFAICS the only thing you'd have that you don't want is the need to
> parse the manifest file using while (<>) { @a = split /\t/, $_ } rather
> than $a = parse_json(join '', <>). You would, for example, have the
> ability to request an individual file from the server rather than a
> complete tarball. Maybe the command that requests a file would lack an
> encryption option, something which IIUC you would like to have, but
> that certainly does not leave you worse off. It is easier to add an
> encryption option to a command which you already have than it is to
> invent a whole new command -- or really several whole new commands,
> since such a command is not really usable unless you also have
> facilities to start and stop a backup through the replication
> protocol.

No, the manifest format is definitely not the only issue that I have
with this- but as it relates to the thread about building a manifest, my
complaint really is isolated to the format and just forward thinking
about how the format you're advocating for will mean custom code for who
knows how many different tools.  While I appreciate the offer to write
all the bespoke code for every version of the manifest for pgbackrest,
I'm really not thrilled about the idea of having to have that extra code
and having to then maintain it.  Yes, when you compare the single format
of the manifest and the code required for it against a JSON parser, if
we only ever have this one format then it'd win in terms of code, but I
don't believe it'll end up being one format, instead we're going to end
up with multiple formats, each of which will have some additional code
for dealing with parsing it, and that's going to add up.  That's also
going to, as I said before, make it almost certain that we can't use
older tools with newer backups.  These are issues that we've thought
about and worried about over the years of pgbackrest and with that
experience we've come down on the side that a JSON-based format would be
an altogether better design.  That's why we're advocating for it, not
because it requires more code or so that it delays the efforts here, but
because we've been there, we've used other formats, we've dealt with
user complaints when we do break things, this is all history for us
that's helped us learn- for PG, it looks like the future with a static
format, and I get that the future is hard to predict and pg_basebackup
isn't pgbackrest and yeah, I could be completely wrong because I don't
actually have a crystal ball, but this starting point sure looks really
familiar.

Thanks,

Stephen

Attachment

Re: backup manifests

From
David Steele
Date:
Hi Robert,

On 1/7/20 6:33 PM, Stephen Frost wrote:

 > These are issues that we've thought
 > about and worried about over the years of pgbackrest and with that
 > experience we've come down on the side that a JSON-based format would be
 > an altogether better design.  That's why we're advocating for it, not
 > because it requires more code or so that it delays the efforts here, but
 > because we've been there, we've used other formats, we've dealt with
 > user complaints when we do break things, this is all history for us
 > that's helped us learn- for PG, it looks like the future with a static
 > format, and I get that the future is hard to predict and pg_basebackup
 > isn't pgbackrest and yeah, I could be completely wrong because I don't
 > actually have a crystal ball, but this starting point sure looks really
 > familiar.

For example, have you considered what will happen if you have a file in 
the cluster with a tab in the name?  This is perfectly valid in Posix 
filesystems, at least.  You may already be escaping tabs but the simple 
code snippet you provided earlier isn't going to work so well either 
way.  It gets complicated quickly.

I know users should not be creating weird files in PGDATA, but it's 
amazing how often this sort of thing pops up.  We currently have an open 
issue because = in file names breaks our file format.  Tab is surely 
less common but it's amazing what users will do.

Another fun one is 03849840 which fixes the handling of \ characters in 
the code which checksums the manifest.  The file is not fully JSON but 
the checksums are and that was initially missed in the C migration.  The 
bug never got released but it easily could have been.
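
Just to illustrate the sort of thing that creeps in, here is a rough
sketch -- not code from pgBackRest or from the patch -- of the escaping a
tab-separated format ends up needing once file names can contain tabs,
newlines, or backslashes:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of name with tab, newline, and \ escaped. */
static char *
escape_field(const char *name)
{
	char   *out = malloc(strlen(name) * 2 + 1);	/* worst case doubles */
	char   *p = out;

	if (out == NULL)
		return NULL;

	for (; *name; name++)
	{
		switch (*name)
		{
			case '\t':
				*p++ = '\\';
				*p++ = 't';
				break;
			case '\n':
				*p++ = '\\';
				*p++ = 'n';
				break;
			case '\\':
				*p++ = '\\';
				*p++ = '\\';
				break;
			default:
				*p++ = *name;
		}
	}
	*p = '\0';
	return out;
}

int
main(void)
{
	char   *escaped = escape_field("weird\tfile\\name");

	printf("%s\n", escaped);	/* prints: weird\tfile\\name */
	free(escaped);
	return 0;
}

And of course every consumer of the manifest then has to get the
unescaping right too, in every language it happens to be written in.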

In short, using a quick-and-dirty homegrown format seemed great at first 
but has caused many headaches.  Because we don't change the repo format 
across releases we are kind of stuck with past sins until we create a 
new repo format and write update/compatibility code.  Users are 
understandably concerned if new versions of the software won't work with 
their repo, some of which contain years of backups (really).

This doesn't even get into the work everyone else will need to do to 
read a custom format.  I do appreciate your offer of contributing parser 
code to pgBackRest, but honestly I'd rather it were not necessary. 
Though of course I'd still love to see a contribution of some sort from you!

Hard experience tells me that using a standard format where all these 
issues have been worked out is the way to go.

There are a few MIT-licensed JSON projects that are implemented in a 
single file.  cJSON is very capable while JSMN is very minimal. Is it 
possible that one of those (or something like it) would be acceptable? 
It looks like the one requirement we have is that the JSON can be 
streamed rather than just building up one big blob?  Even with that 
requirement there are a few tricks that can be used.  JSON nests rather 
nicely after all so the individual file records can be transmitted 
independently of the overall file format.
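
For example -- field names invented here, not a schema proposal -- each
file record is its own small object and can be rendered and sent as soon
as it is known:

{
  "files": [
    {"path": "base/1/1259", "size": 16384,
     "mtime": "2020-01-03 12:00:00 GMT", "checksum": "sha256:..."},
    {"path": "base/1/1249", "size": 8192,
     "mtime": "2020-01-03 12:00:01 GMT", "checksum": "sha256:..."}
  ]
}

A later version can add keys to the per-file objects without breaking an
older reader that only looks for the keys it knows about.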

Your first question may be why didn't pgBackRest use one of those 
parsers?  The answer is that JSON parsing/rendering is pretty trivial. 
Memory management and a (datum-like) type system are the hard parts and 
pgBackRest already had those.

Would it be acceptable to bring in JSON code with a compatible license 
to use in libcommon?  If so I'm willing to help adapt that code for use 
in Postgres.  It's possible that the pgBackRest code could be adapted 
similarly, but it might make more sense to start from one of these 
general purpose parsers.

Thoughts?

-- 
-David
david@pgmasters.net



Re: backup manifests

From
Robert Haas
Date:
On Thu, Jan 9, 2020 at 8:19 PM David Steele <david@pgmasters.net> wrote:
> For example, have you considered what will happen if you have a file in
> the cluster with a tab in the name?  This is perfectly valid in Posix
> filesystems, at least.

Yeah, there's code for that in the patch I posted. I don't think the
validator patch deals with it, but that's fixable.

> You may already be escaping tabs but the simple
> code snippet you provided earlier isn't going to work so well either
> way.  It gets complicated quickly.

Sure, but obviously neither of those code snippets were intended to be
used straight out of the box. Even after you parse the manifest as
JSON, you would still - if you really want to validate it - check that
you have the keys and values you expect, that the individual field
values are sensible, etc. I still stand by my earlier contention that,
as things stand today, you can parse an ad-hoc format in less code
than a JSON format. If we had a JSON parser available on the front
end, I think it'd be roughly comparable, but maybe the JSON format
would come out a bit ahead. Not sure.

> There are a few MIT-licensed JSON projects that are implemented in a
> single file.  cJSON is very capable while JSMN is very minimal. Is it
> possible that one of those (or something like it) would be acceptable?
> It looks like the one requirement we have is that the JSON can be
> streamed rather than just building up one big blob?  Even with that
> requirement there are a few tricks that can be used.  JSON nests rather
> nicely after all so the individual file records can be transmitted
> independently of the overall file format.

I haven't really looked at these. I would have expected that including
a second JSON parser in core would provoke significant opposition.
Generally, people dislike having more than one piece of code to do the
same thing. I would also expect that depending on an external package
would provoke significant opposition. If we suck the code into core,
then we have to keep it up to date with the upstream, which is a
significant maintenance burden - look at all the time Tom has spent on
snowball, regex, and time zone code over the years. If we don't suck
the code into core but depend on it, then every developer needs to
have that package installed on their operating system, and every
packager has to make sure that it is being built for their OS so that
PostgreSQL can depend on it. Perhaps JSON is so popular today that
imposing such a requirement would provoke only a groundswell of
support, but based on past precedent I would assume that if I
committed a patch of this sort the chances that I'd have to revert it
would be about 99.9%. Optional dependencies for optional features are
usually pretty well-tolerated when they're clearly necessary: e.g. you
can't really do JIT without depending on something like LLVM, but the
bar for a mandatory dependency has historically been quite high.

> Would it be acceptable to bring in JSON code with a compatible license
> to use in libcommon?  If so I'm willing to help adapt that code for use
> in Postgres.  It's possible that the pgBackRest code could be adapted
> similarly, but it might make more sense to start from one of these
> general purpose parsers.

For the reasons above, I expect this approach would be rejected, by
Tom and by others.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> ... I would also expect that depending on an external package
> would provoke significant opposition. If we suck the code into core,
> then we have to keep it up to date with the upstream, which is a
> significant maintenance burden - look at all the time Tom has spent on
> snowball, regex, and time zone code over the years.

Also worth noting is that we have a seriously bad track record about
choosing external packages to depend on.  The regex code has no upstream
maintainer anymore (well, the Tcl guys seem to think that *we* are
upstream for that now), and snowball is next door to moribund.
With C not being a particularly hip language to develop in anymore,
it wouldn't surprise me in the least for any C-code JSON parser
we might pick to go dead pretty soon.

Between that problem and the likelihood that we'd need to make
significant code changes anyway to meet our own coding style etc
expectations, I think really we'd have to assume that we're going
to fork and maintain our own copy of any code we pick.

Now, if it's a small enough chunk of code (and really, how complex
is JSON parsing anyway) maybe that doesn't matter.  But I tend to
agree with Robert's position that it's a big ask for this patch
to introduce a frontend JSON parser.

            regards, tom lane



Re: backup manifests

From
David Fetter
Date:
On Tue, Jan 14, 2020 at 12:53:04PM -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > ... I would also expect that depending on an external package
> > would provoke significant opposition. If we suck the code into core,
> > then we have to keep it up to date with the upstream, which is a
> > significant maintenance burden - look at all the time Tom has spent on
> > snowball, regex, and time zone code over the years.
> 
> Also worth noting is that we have a seriously bad track record about
> choosing external packages to depend on.  The regex code has no upstream
> maintainer anymore (well, the Tcl guys seem to think that *we* are
> upstream for that now), and snowball is next door to moribund.
> With C not being a particularly hip language to develop in anymore,
> it wouldn't surprise me in the least for any C-code JSON parser
> we might pick to go dead pretty soon.

Given jq's extreme popularity and compatible license, I'd nominate that.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: backup manifests

From
Stephen Frost
Date:
Greetings,

* David Fetter (david@fetter.org) wrote:
> On Tue, Jan 14, 2020 at 12:53:04PM -0500, Tom Lane wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> > > ... I would also expect that depending on an external package
> > > would provoke significant opposition. If we suck the code into core,
> > > then we have to keep it up to date with the upstream, which is a
> > > significant maintenance burden - look at all the time Tom has spent on
> > > snowball, regex, and time zone code over the years.
> >
> > Also worth noting is that we have a seriously bad track record about
> > choosing external packages to depend on.  The regex code has no upstream
> > maintainer anymore (well, the Tcl guys seem to think that *we* are
> > upstream for that now), and snowball is next door to moribund.
> > With C not being a particularly hip language to develop in anymore,
> > it wouldn't surprise me in the least for any C-code JSON parser
> > we might pick to go dead pretty soon.
>
> Given jq's extreme popularity and compatible license, I'd nominate that.

I don't think that really changes Tom's concerns here about having an
"upstream" for this.

For my part, I don't really agree with the whole "we don't want two
different JSON parsers" when we've got two of a bunch of stuff between
the frontend and the backend, particularly since I don't really think
it'll end up being *that* much code.

My thought, which I had expressed to David (though he obviously didn't
entirely agree with me since he suggested the other options), was to
adapt the pgBackRest JSON parser, which isn't really all that much code.

Frustratingly, that code has got some internal pgBackRest dependency on
things like the memory context system (which looks, unsurprisingly, an
awful lot like what is in PG backend), the error handling and logging
systems (which are different from PG because they're quite intentionally
segregated from each other- something PG would benefit from, imv..), and
Variadics (known in the PG backend as Datums, and quite similar to
them..).

Even so, David's offered to adjust the code to use the frontend's memory
management (*cough* malloc()..), and error handling/logging, and he had
some idea for Variadics (or maybe just pulling the backend's Datum
system in..?  He could answer better), and basically write a frontend
JSON parser for PG without too much code, no external dependencies, and
to make sure it answers this requirement, and I've agreed that he can
spend some time on that instead of pgBackRest to get us through this, if
everyone else is agreeable to the idea.  Obviously this isn't intended
to box anyone in- if there turns out even after the code's been written
to be some fatal issue with using it, so be it, but we're offering to
help.

Thanks,

Stephen

Attachment

Re: backup manifests

From
David Fetter
Date:
On Tue, Jan 14, 2020 at 03:35:40PM -0500, Stephen Frost wrote:
> Greetings,
> 
> * David Fetter (david@fetter.org) wrote:
> > On Tue, Jan 14, 2020 at 12:53:04PM -0500, Tom Lane wrote:
> > > Robert Haas <robertmhaas@gmail.com> writes:
> > > > ... I would also expect that depending on an external package
> > > > would provoke significant opposition. If we suck the code into core,
> > > > then we have to keep it up to date with the upstream, which is a
> > > > significant maintenance burden - look at all the time Tom has spent on
> > > > snowball, regex, and time zone code over the years.
> > > 
> > > Also worth noting is that we have a seriously bad track record about
> > > choosing external packages to depend on.  The regex code has no upstream
> > > maintainer anymore (well, the Tcl guys seem to think that *we* are
> > > upstream for that now), and snowball is next door to moribund.
> > > With C not being a particularly hip language to develop in anymore,
> > > it wouldn't surprise me in the least for any C-code JSON parser
> > > we might pick to go dead pretty soon.
> > 
> > Given jq's extreme popularity and compatible license, I'd nominate that.
> 
> I don't think that really changes Tom's concerns here about having an
> "upstream" for this.
> 
> For my part, I don't really agree with the whole "we don't want two
> different JSON parsers" when we've got two of a bunch of stuff between
> the frontend and the backend, particularly since I don't really think
> it'll end up being *that* much code.
> 
> My thought, which I had expressed to David (though he obviously didn't
> entirely agree with me since he suggested the other options), was to
> adapt the pgBackRest JSON parser, which isn't really all that much code.
> 
> Frustratingly, that code has got some internal pgBackRest dependency on
> things like the memory context system (which looks, unsurprisingly, an
> awful lot like what is in PG backend), the error handling and logging
> systems (which are different from PG because they're quite intentionally
> segregated from each other- something PG would benefit from, imv..), and
> Variadics (known in the PG backend as Datums, and quite similar to
> them..).

It might be more fun to put in that infrastructure and have it gate
the manifest feature than to have two vastly different parsers to
contend with. I get that putting off the backup manifests isn't an
awesome prospect, but neither is rushing them in and getting them
wrong in ways we'll still be regretting a decade hence.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: backup manifests

From
David Steele
Date:
Hi Stephen,

On 1/14/20 1:35 PM, Stephen Frost wrote:
> 
> My thought, which I had expressed to David (though he obviously didn't
> entirely agree with me since he suggested the other options), was to
> adapt the pgBackRest JSON parser, which isn't really all that much code.

It's not that I didn't agree, it's just that the pgBackRest code does 
use mem contexts, the type system, etc.  After looking at some other 
solutions with similar amounts of code I thought they might be more 
acceptable.  At least it seemed like a good idea to throw it out there.

> Even so, David's offered to adjust the code to use the frontend's memory
> management (*cough* malloc()..), and error handling/logging, and he had
> some idea for Variadics (or maybe just pulling the backend's Datum
> system in..?  He could answer better), and basically write a frontend
> JSON parser for PG without too much code, no external dependencies, and
> to make sure it answers this requirement, and I've agreed that he can
> spend some time on that instead of pgBackRest to get us through this, if
> everyone else is agreeable to the idea.  

To keep it simple I think we are left with callbacks or a somewhat 
static "what's the next datum" kind of approach.  I think the latter 
could get us through a release or two while we make improvements.
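
By "what's the next datum" I mean something along these lines -- the
names here are made up for illustration, not taken from pgBackRest or
any existing Postgres code:

#include <stddef.h>

typedef enum JsonToken
{
	JSON_TOKEN_OBJECT_START,
	JSON_TOKEN_OBJECT_END,
	JSON_TOKEN_ARRAY_START,
	JSON_TOKEN_ARRAY_END,
	JSON_TOKEN_KEY,
	JSON_TOKEN_STRING,
	JSON_TOKEN_NUMBER,
	JSON_TOKEN_TRUE,
	JSON_TOKEN_FALSE,
	JSON_TOKEN_NULL,
	JSON_TOKEN_EOF
} JsonToken;

typedef struct JsonRead
{
	const char *buf;		/* manifest contents */
	size_t		pos;		/* current position in buf */
} JsonRead;

/* Return the next token; copy any key/string/number text into value. */
extern JsonToken json_read_next(JsonRead *read, char *value, size_t valsize);

The caller just loops over json_read_next() and reacts to the tokens it
cares about, which keeps the state handling on the caller's side simple.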

> Obviously this isn't intended
> to box anyone in- if there turns out even after the code's been written
> to be some fatal issue with using it, so be it, but we're offering to
> help.

I'm happy to work up a prototype unless the consensus is that we 
absolutely don't want a second JSON parser in core.

Regards,
-- 
-David
david@pgmasters.net



Re: backup manifests

From
Tom Lane
Date:
David Steele <david@pgmasters.net> writes:
> I'm happy to work up a prototype unless the consensus is that we 
> absolutely don't want a second JSON parser in core.

How much code are we talking about?  If the answer is "a few hundred
lines", it's a lot easier to swallow than if it's "a few thousand".

            regards, tom lane



Re: backup manifests

From
David Steele
Date:
Hi Tom,

On 1/14/20 9:47 PM, Tom Lane wrote:
> David Steele <david@pgmasters.net> writes:
>> I'm happy to work up a prototype unless the consensus is that we
>> absolutely don't want a second JSON parser in core.
> 
> How much code are we talking about?  If the answer is "a few hundred
> lines", it's a lot easier to swallow than if it's "a few thousand".

It's currently about a thousand lines but we have a lot of functions to 
convert to/from specific types.  I imagine the line count would be 
similar using one of the approaches I discussed above.

Current source attached for reference.

Regards,
-- 
-David
david@pgmasters.net

Attachment

Re: backup manifests

From
Bruce Momjian
Date:
On Tue, Jan 14, 2020 at 12:53:04PM -0500, Tom Lane wrote:
> Also worth noting is that we have a seriously bad track record about
> choosing external packages to depend on.  The regex code has no upstream
> maintainer anymore (well, the Tcl guys seem to think that *we* are
> upstream for that now), and snowball is next door to moribund.
> With C not being a particularly hip language to develop in anymore,
> it wouldn't surprise me in the least for any C-code JSON parser
> we might pick to go dead pretty soon.
> 
> Between that problem and the likelihood that we'd need to make
> significant code changes anyway to meet our own coding style etc
> expectations, I think really we'd have to assume that we're going
> to fork and maintain our own copy of any code we pick.
> 
> Now, if it's a small enough chunk of code (and really, how complex
> is JSON parsing anyway) maybe that doesn't matter.  But I tend to
> agree with Robert's position that it's a big ask for this patch
> to introduce a frontend JSON parser.

I know we have talked about our experience in maintaining external code:

*  TCL regex
*  Snowball
*  Timezone handling

However, the regex code is complex, and the Snowball and timezone code
is improved as they add new languages and time zones.  I don't see JSON
parsing as complex or likely to change much, so it might be acceptable
to include it in our frontend code.

As far as using tab-delimited data goes, I know this usage was compared to
postgresql.conf and pg_hba.conf, which don't change much.  However,
those files are not usually written, and do not contain user data, while
the backup file might contain user-specified paths if they are not just
relative to the PGDATA directory, and that would make escaping a
requirement.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: backup manifests

From
Robert Haas
Date:
On Fri, Jan 3, 2020 at 6:11 PM Suraj Kharage
<suraj.kharage@enterprisedb.com> wrote:
> Thank you for review comments.

Here's a new patch set for this feature.

0001 adds checksum helper functions, similar to what Suraj had
incorporated into my original patch but separated out into a separate
patch and with some different aesthetic decisions. I also decided to
support all of the SHA variants that PG knows about as options and
added a function to parse a checksum algorithm name, along the lines I
suggested previously.
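
To give a feel for the shape of that helper -- the identifier names here
are illustrative and the patch is what's authoritative -- the name parser
is essentially just a small lookup:

#include <stdbool.h>
#include <string.h>

typedef enum pg_checksum_type
{
	CHECKSUM_TYPE_NONE,
	CHECKSUM_TYPE_SHA224,
	CHECKSUM_TYPE_SHA256,
	CHECKSUM_TYPE_SHA384,
	CHECKSUM_TYPE_SHA512
} pg_checksum_type;

/* Map a user-supplied algorithm name onto a checksum type; false if unknown. */
static bool
parse_checksum_algorithm(const char *name, pg_checksum_type *type)
{
	if (strcmp(name, "none") == 0)
		*type = CHECKSUM_TYPE_NONE;
	else if (strcmp(name, "sha224") == 0)
		*type = CHECKSUM_TYPE_SHA224;
	else if (strcmp(name, "sha256") == 0)
		*type = CHECKSUM_TYPE_SHA256;
	else if (strcmp(name, "sha384") == 0)
		*type = CHECKSUM_TYPE_SHA384;
	else if (strcmp(name, "sha512") == 0)
		*type = CHECKSUM_TYPE_SHA512;
	else
		return false;
	return true;
}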

0002 teaches the server to generate a backup manifest using the format
I originally proposed. This is similar to the patch I posted
previously, but it spools the manifest to disk as it's being
generated, so that we don't run the server out of memory or fail when
hitting the 1GB allocation limit.

0003 adds a new utility, pg_validatebackup, to validate a backup
against a manifest. Suraj tried to incorporate this into
pg_basebackup, which I initially thought might be OK but eventually
decided wasn't good, partly because this really wants to take some
command-line options entirely unrelated to the options accepted by
pg_basebackup. I tried to improve the error checking and the order in
which various things are done, too. This is a basically a complete
rewrite as compared with Suraj's version.

0004 modifies the server to generate a backup manifest in JSON format
rather than my originally proposed format. This allows for some
comparison of the code doing it one way vs. the other. Assuming we
stick with JSON, I will squash this with 0002 at some point.

0005 is very much a work-in-progress and proof-of-concept to modify
the backup validator to understand the JSON format. It doesn't
validate the manifest checksum at this point; it just prints it out.
The error handling needs work. It has other problems, and bugs.
Although I'm still not very happy about the idea of using JSON here,
I'm pretty happy with the basic approach this patch takes. It
demonstrates that the JSON parser can be used for non-trivial things
in frontend code, and I'd say the code even looks reasonably clean -
with the exception of small details like being buggy and
under-commented.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: backup manifests

From
tushar
Date:
On 2/27/20 9:22 PM, Robert Haas wrote:
> Here's a new patch set for this feature.

Thanks Robert.  After applying all the 5 patches (v8-00*) against PG v13 
(commit id -afb5465e0cfce7637066eaaaeecab30b0f23fbe3) ,

There are a few issues/observations:

1) Getting a segmentation fault error if we try pg_validatebackup against 
a valid backup_manifest file but the data directory path is WRONG

[centos@tushar-ldap-docker bin]$ ./pg_basebackup -D bk 
--manifest-checksums=sha224

[centos@tushar-ldap-docker bin]$ cp bk/backup_manifest /tmp/.

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup -m 
/tmp/backup_manifest    random_directory/
pg_validatebackup: * manifest_checksum = 
f0460cd6aa13cf0c5e35426a41af940a9231e6425cd65115a19778b7abfdaef9
pg_validatebackup: error: could not open directory "random_directory": 
No such file or directory
Segmentation fault

2) When using the '-R' option at the time of creating the base backup

[centos@tushar-ldap-docker bin]$ ./pg_basebackup -D bar -R
[centos@tushar-ldap-docker bin]$ ./pg_validatebackup  bar
pg_validatebackup: * manifest_checksum = 
a195d3a3a82a41200c9ac92c12d764d23c810e7e91b31c44a7d04f67ce012edc
pg_validatebackup: error: "standby.signal" is present on disk but not in 
the manifest
pg_validatebackup: error: "postgresql.auto.conf" has size 286 on disk 
but size 88 in the manifest
[centos@tushar-ldap-docker bin]$

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: backup manifests

From
tushar
Date:
On 3/3/20 4:04 PM, tushar wrote:
> Thanks Robert.  After applying all the 5 patches (v8-00*) against PG 
> v13 (commit id -afb5465e0cfce7637066eaaaeecab30b0f23fbe3) , 

There is a scenario where pg_validatebackup does not throw an error if 
some files are deleted from the pg_wal/ folder, but later, at the time of 
restoring, we get an error

[centos@tushar-ldap-docker bin]$ ./pg_basebackup  -D test1

[centos@tushar-ldap-docker bin]$ ls test1/pg_wal/
000000010000000000000010  archive_status

[centos@tushar-ldap-docker bin]$ rm -rf test1/pg_wal/*

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup test1
pg_validatebackup: * manifest_checksum = 
88f1ed995c83e86252466a2c88b3e660a69cfc76c169991134b101c4f16c9df7
pg_validatebackup: backup successfully verified

[centos@tushar-ldap-docker bin]$ ./pg_ctl -D test1 start -o '-p 3333'
waiting for server to start....2020-03-02 20:05:22.732 IST [21441] LOG:  
starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc 
(GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2020-03-02 20:05:22.733 IST [21441] LOG:  listening on IPv6 address 
"::1", port 3333
2020-03-02 20:05:22.733 IST [21441] LOG:  listening on IPv4 address 
"127.0.0.1", port 3333
2020-03-02 20:05:22.736 IST [21441] LOG:  listening on Unix socket 
"/tmp/.s.PGSQL.3333"
2020-03-02 20:05:22.739 IST [21442] LOG:  database system was 
interrupted; last known up at 2020-03-02 20:04:35 IST
2020-03-02 20:05:22.739 IST [21442] LOG:  creating missing WAL directory 
"pg_wal/archive_status"
2020-03-02 20:05:22.886 IST [21442] LOG:  invalid checkpoint record
2020-03-02 20:05:22.886 IST [21442] FATAL:  could not locate required 
checkpoint record
2020-03-02 20:05:22.886 IST [21442] HINT:  If you are restoring from a 
backup, touch 
"/home/centos/pg13_bk_mani/edb/edbpsql/bin/test1/recovery.signal" and 
add required recovery options.
     If you are not restoring from a backup, try removing the file 
"/home/centos/pg13_bk_mani/edb/edbpsql/bin/test1/backup_label".
     Be careful: removing 
"/home/centos/pg13_bk_mani/edb/edbpsql/bin/test1/backup_label" will 
result in a corrupt cluster if restoring from a backup.
2020-03-02 20:05:22.886 IST [21441] LOG:  startup process (PID 21442) 
exited with exit code 1
2020-03-02 20:05:22.886 IST [21441] LOG:  aborting startup due to 
startup process failure
2020-03-02 20:05:22.889 IST [21441] LOG:  database system is shut down
  stopped waiting
pg_ctl: could not start server
Examine the log output.
[centos@tushar-ldap-docker bin]$

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: backup manifests

From
tushar
Date:
Hi,
Another observation: if I change the ownership of a file under the 
global/ directory, i.e.

[root@tushar-ldap-docker global]# chown enterprisedb 2396

and run the pg_validatebackup command, i am getting this message -

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup gggg
pg_validatebackup: * manifest_checksum = 
e8cb007bcc9c0deab6eff51cd8d9d9af6af35b86e02f3055e60e70e56737e877
pg_validatebackup: error: could not open file "global/2396": Permission 
denied
*** Error in `./pg_validatebackup': double free or corruption (!prev): 
0x0000000001850ba0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81679)[0x7fa2248e3679]
./pg_validatebackup[0x401f4c]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fa224884505]
./pg_validatebackup[0x402049]
======= Memory map: ========
00400000-00415000 r-xp 00000000 fd:03 4044545 
/home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
00614000-00615000 r--p 00014000 fd:03 4044545 
/home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
00615000-00616000 rw-p 00015000 fd:03 4044545 
/home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
017f3000-01878000 rw-p 00000000 00:00 0                                  
[heap]
7fa218000000-7fa218021000 rw-p 00000000 00:00 0
7fa218021000-7fa21c000000 ---p 00000000 00:00 0
7fa21e122000-7fa21e137000 r-xp 00000000 fd:03 141697                     
/usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fa21e137000-7fa21e336000 ---p 00015000 fd:03 141697                     
/usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fa21e336000-7fa21e337000 r--p 00014000 fd:03 141697                     
/usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fa21e337000-7fa21e338000 rw-p 00015000 fd:03 141697                     
/usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fa21e338000-7fa224862000 r--p 00000000 fd:03 266442                     
/usr/lib/locale/locale-archive
7fa224862000-7fa224a25000 r-xp 00000000 fd:03 134456                     
/usr/lib64/libc-2.17.so
7fa224a25000-7fa224c25000 ---p 001c3000 fd:03 134456                     
/usr/lib64/libc-2.17.so
7fa224c25000-7fa224c29000 r--p 001c3000 fd:03 134456                     
/usr/lib64/libc-2.17.so
7fa224c29000-7fa224c2b000 rw-p 001c7000 fd:03 134456                     
/usr/lib64/libc-2.17.so
7fa224c2b000-7fa224c30000 rw-p 00000000 00:00 0
7fa224c30000-7fa224c47000 r-xp 00000000 fd:03 134485                     
/usr/lib64/libpthread-2.17.so
7fa224c47000-7fa224e46000 ---p 00017000 fd:03 134485                     
/usr/lib64/libpthread-2.17.so
7fa224e46000-7fa224e47000 r--p 00016000 fd:03 134485                     
/usr/lib64/libpthread-2.17.so
7fa224e47000-7fa224e48000 rw-p 00017000 fd:03 134485                     
/usr/lib64/libpthread-2.17.so
7fa224e48000-7fa224e4c000 rw-p 00000000 00:00 0
7fa224e4c000-7fa224e90000 r-xp 00000000 fd:03 4044478 
/home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
7fa224e90000-7fa225090000 ---p 00044000 fd:03 4044478 
/home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
7fa225090000-7fa225093000 r--p 00044000 fd:03 4044478 
/home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
7fa225093000-7fa225094000 rw-p 00047000 fd:03 4044478 
/home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
7fa225094000-7fa2250b6000 r-xp 00000000 fd:03 130333                     
/usr/lib64/ld-2.17.so
7fa22527d000-7fa2252a2000 rw-p 00000000 00:00 0
7fa2252b3000-7fa2252b5000 rw-p 00000000 00:00 0
7fa2252b5000-7fa2252b6000 r--p 00021000 fd:03 130333                     
/usr/lib64/ld-2.17.so
7fa2252b6000-7fa2252b7000 rw-p 00022000 fd:03 130333                     
/usr/lib64/ld-2.17.so
7fa2252b7000-7fa2252b8000 rw-p 00000000 00:00 0
7ffdf354f000-7ffdf3570000 rw-p 00000000 00:00 0                          
[stack]
7ffdf3572000-7ffdf3574000 r-xp 00000000 00:00 0                          
[vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  
[vsyscall]
Aborted
[centos@tushar-ldap-docker bin]$


I get the expected error message, but it is accompanied by the "*** Error 
in `./pg_validatebackup': double free or corruption (!prev): 
0x0000000001850ba0 ***" message.

Is this expected?

regards,

On 3/3/20 8:19 PM, tushar wrote:
> On 3/3/20 4:04 PM, tushar wrote:
>> Thanks Robert.  After applying all the 5 patches (v8-00*) against PG 
>> v13 (commit id -afb5465e0cfce7637066eaaaeecab30b0f23fbe3) , 
>
> There is a scenario where pg_validatebackup is not throwing an error 
> if some file deleted from pg_wal/ folder and  but later at the time of 
> restoring - we are getting an error
>
> [centos@tushar-ldap-docker bin]$ ./pg_basebackup  -D test1
>
> [centos@tushar-ldap-docker bin]$ ls test1/pg_wal/
> 000000010000000000000010  archive_status
>
> [centos@tushar-ldap-docker bin]$ rm -rf test1/pg_wal/*
>
> [centos@tushar-ldap-docker bin]$ ./pg_validatebackup test1
> pg_validatebackup: * manifest_checksum = 
> 88f1ed995c83e86252466a2c88b3e660a69cfc76c169991134b101c4f16c9df7
> pg_validatebackup: backup successfully verified
>
> [centos@tushar-ldap-docker bin]$ ./pg_ctl -D test1 start -o '-p 3333'
> waiting for server to start....2020-03-02 20:05:22.732 IST [21441] 
> LOG:  starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by 
> gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
> 2020-03-02 20:05:22.733 IST [21441] LOG:  listening on IPv6 address 
> "::1", port 3333
> 2020-03-02 20:05:22.733 IST [21441] LOG:  listening on IPv4 address 
> "127.0.0.1", port 3333
> 2020-03-02 20:05:22.736 IST [21441] LOG:  listening on Unix socket 
> "/tmp/.s.PGSQL.3333"
> 2020-03-02 20:05:22.739 IST [21442] LOG:  database system was 
> interrupted; last known up at 2020-03-02 20:04:35 IST
> 2020-03-02 20:05:22.739 IST [21442] LOG:  creating missing WAL 
> directory "pg_wal/archive_status"
> 2020-03-02 20:05:22.886 IST [21442] LOG:  invalid checkpoint record
> 2020-03-02 20:05:22.886 IST [21442] FATAL:  could not locate required 
> checkpoint record
> 2020-03-02 20:05:22.886 IST [21442] HINT:  If you are restoring from a 
> backup, touch 
> "/home/centos/pg13_bk_mani/edb/edbpsql/bin/test1/recovery.signal" and 
> add required recovery options.
>     If you are not restoring from a backup, try removing the file 
> "/home/centos/pg13_bk_mani/edb/edbpsql/bin/test1/backup_label".
>     Be careful: removing 
> "/home/centos/pg13_bk_mani/edb/edbpsql/bin/test1/backup_label" will 
> result in a corrupt cluster if restoring from a backup.
> 2020-03-02 20:05:22.886 IST [21441] LOG:  startup process (PID 21442) 
> exited with exit code 1
> 2020-03-02 20:05:22.886 IST [21441] LOG:  aborting startup due to 
> startup process failure
> 2020-03-02 20:05:22.889 IST [21441] LOG:  database system is shut down
>  stopped waiting
> pg_ctl: could not start server
> Examine the log output.
> [centos@tushar-ldap-docker bin]$
>

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: backup manifests

From
tushar
Date:
Another scenario: if we modify the "Manifest-Checksum" value in the 
backup_manifest file, we do not get an error

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data/
pg_validatebackup: * manifest_checksum = 
28d082921650d0ae881de8ceb122c8d2af5f449f51ecfb446827f7f49f91f65d
pg_validatebackup: backup successfully verified

open backup_manifest file and replace

"Manifest-Checksum": 
"8d082921650d0ae881de8ceb122c8d2af5f449f51ecfb446827f7f49f91f65d"}
with
"Manifest-Checksum": "Hello World"}

rerun the pg_validatebackup

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data/
pg_validatebackup: * manifest_checksum = Hello World
pg_validatebackup: backup successfully verified

regards,

On 3/4/20 3:26 PM, tushar wrote:
> Hi,
> Another observation , if i change the ownership of a file which is 
> under global/ directory
> i.e
>
> [root@tushar-ldap-docker global]# chown enterprisedb 2396
>
> and run the pg_validatebackup command, i am getting this message -
>
> [centos@tushar-ldap-docker bin]$ ./pg_validatebackup gggg
> pg_validatebackup: * manifest_checksum = 
> e8cb007bcc9c0deab6eff51cd8d9d9af6af35b86e02f3055e60e70e56737e877
> pg_validatebackup: error: could not open file "global/2396": 
> Permission denied
> *** Error in `./pg_validatebackup': double free or corruption (!prev): 
> 0x0000000001850ba0 ***
> ======= Backtrace: =========
> /lib64/libc.so.6(+0x81679)[0x7fa2248e3679]
> ./pg_validatebackup[0x401f4c]
> /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fa224884505]
> ./pg_validatebackup[0x402049]
> ======= Memory map: ========
> 00400000-00415000 r-xp 00000000 fd:03 4044545 
> /home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
> 00614000-00615000 r--p 00014000 fd:03 4044545 
> /home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
> 00615000-00616000 rw-p 00015000 fd:03 4044545 
> /home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
> 017f3000-01878000 rw-p 00000000 00:00 
> 0                                  [heap]
> 7fa218000000-7fa218021000 rw-p 00000000 00:00 0
> 7fa218021000-7fa21c000000 ---p 00000000 00:00 0
> 7fa21e122000-7fa21e137000 r-xp 00000000 fd:03 
> 141697                     /usr/lib64/libgcc_s-4.8.5-20150702.so.1
> 7fa21e137000-7fa21e336000 ---p 00015000 fd:03 
> 141697                     /usr/lib64/libgcc_s-4.8.5-20150702.so.1
> 7fa21e336000-7fa21e337000 r--p 00014000 fd:03 
> 141697                     /usr/lib64/libgcc_s-4.8.5-20150702.so.1
> 7fa21e337000-7fa21e338000 rw-p 00015000 fd:03 
> 141697                     /usr/lib64/libgcc_s-4.8.5-20150702.so.1
> 7fa21e338000-7fa224862000 r--p 00000000 fd:03 
> 266442                     /usr/lib/locale/locale-archive
> 7fa224862000-7fa224a25000 r-xp 00000000 fd:03 
> 134456                     /usr/lib64/libc-2.17.so
> 7fa224a25000-7fa224c25000 ---p 001c3000 fd:03 
> 134456                     /usr/lib64/libc-2.17.so
> 7fa224c25000-7fa224c29000 r--p 001c3000 fd:03 
> 134456                     /usr/lib64/libc-2.17.so
> 7fa224c29000-7fa224c2b000 rw-p 001c7000 fd:03 
> 134456                     /usr/lib64/libc-2.17.so
> 7fa224c2b000-7fa224c30000 rw-p 00000000 00:00 0
> 7fa224c30000-7fa224c47000 r-xp 00000000 fd:03 
> 134485                     /usr/lib64/libpthread-2.17.so
> 7fa224c47000-7fa224e46000 ---p 00017000 fd:03 
> 134485                     /usr/lib64/libpthread-2.17.so
> 7fa224e46000-7fa224e47000 r--p 00016000 fd:03 
> 134485                     /usr/lib64/libpthread-2.17.so
> 7fa224e47000-7fa224e48000 rw-p 00017000 fd:03 
> 134485                     /usr/lib64/libpthread-2.17.so
> 7fa224e48000-7fa224e4c000 rw-p 00000000 00:00 0
> 7fa224e4c000-7fa224e90000 r-xp 00000000 fd:03 4044478 
> /home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
> 7fa224e90000-7fa225090000 ---p 00044000 fd:03 4044478 
> /home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
> 7fa225090000-7fa225093000 r--p 00044000 fd:03 4044478 
> /home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
> 7fa225093000-7fa225094000 rw-p 00047000 fd:03 4044478 
> /home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
> 7fa225094000-7fa2250b6000 r-xp 00000000 fd:03 
> 130333                     /usr/lib64/ld-2.17.so
> 7fa22527d000-7fa2252a2000 rw-p 00000000 00:00 0
> 7fa2252b3000-7fa2252b5000 rw-p 00000000 00:00 0
> 7fa2252b5000-7fa2252b6000 r--p 00021000 fd:03 
> 130333                     /usr/lib64/ld-2.17.so
> 7fa2252b6000-7fa2252b7000 rw-p 00022000 fd:03 
> 130333                     /usr/lib64/ld-2.17.so
> 7fa2252b7000-7fa2252b8000 rw-p 00000000 00:00 0
> 7ffdf354f000-7ffdf3570000 rw-p 00000000 00:00 
> 0                          [stack]
> 7ffdf3572000-7ffdf3574000 r-xp 00000000 00:00 
> 0                          [vdso]
> ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 
> 0                  [vsyscall]
> Aborted
> [centos@tushar-ldap-docker bin]$
>
>
> I am getting the error message but along with "*** Error in 
> `./pg_validatebackup': double free or corruption (!prev): 
> 0x0000000001850ba0 ***"  messages
>
> Is this expected ?
>
> regards,
>
> On 3/3/20 8:19 PM, tushar wrote:
>> On 3/3/20 4:04 PM, tushar wrote:
>>> Thanks Robert.  After applying all the 5 patches (v8-00*) against PG 
>>> v13 (commit id -afb5465e0cfce7637066eaaaeecab30b0f23fbe3) , 
>>
>> There is a scenario where pg_validatebackup is not throwing an error 
>> if some file deleted from pg_wal/ folder and  but later at the time 
>> of restoring - we are getting an error
>>
>> [centos@tushar-ldap-docker bin]$ ./pg_basebackup  -D test1
>>
>> [centos@tushar-ldap-docker bin]$ ls test1/pg_wal/
>> 000000010000000000000010  archive_status
>>
>> [centos@tushar-ldap-docker bin]$ rm -rf test1/pg_wal/*
>>
>> [centos@tushar-ldap-docker bin]$ ./pg_validatebackup test1
>> pg_validatebackup: * manifest_checksum = 
>> 88f1ed995c83e86252466a2c88b3e660a69cfc76c169991134b101c4f16c9df7
>> pg_validatebackup: backup successfully verified
>>
>> [centos@tushar-ldap-docker bin]$ ./pg_ctl -D test1 start -o '-p 3333'
>> waiting for server to start....2020-03-02 20:05:22.732 IST [21441] 
>> LOG:  starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by 
>> gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
>> 2020-03-02 20:05:22.733 IST [21441] LOG:  listening on IPv6 address 
>> "::1", port 3333
>> 2020-03-02 20:05:22.733 IST [21441] LOG:  listening on IPv4 address 
>> "127.0.0.1", port 3333
>> 2020-03-02 20:05:22.736 IST [21441] LOG:  listening on Unix socket 
>> "/tmp/.s.PGSQL.3333"
>> 2020-03-02 20:05:22.739 IST [21442] LOG:  database system was 
>> interrupted; last known up at 2020-03-02 20:04:35 IST
>> 2020-03-02 20:05:22.739 IST [21442] LOG:  creating missing WAL 
>> directory "pg_wal/archive_status"
>> 2020-03-02 20:05:22.886 IST [21442] LOG:  invalid checkpoint record
>> 2020-03-02 20:05:22.886 IST [21442] FATAL:  could not locate required 
>> checkpoint record
>> 2020-03-02 20:05:22.886 IST [21442] HINT:  If you are restoring from 
>> a backup, touch 
>> "/home/centos/pg13_bk_mani/edb/edbpsql/bin/test1/recovery.signal" and 
>> add required recovery options.
>>     If you are not restoring from a backup, try removing the file 
>> "/home/centos/pg13_bk_mani/edb/edbpsql/bin/test1/backup_label".
>>     Be careful: removing 
>> "/home/centos/pg13_bk_mani/edb/edbpsql/bin/test1/backup_label" will 
>> result in a corrupt cluster if restoring from a backup.
>> 2020-03-02 20:05:22.886 IST [21441] LOG:  startup process (PID 21442) 
>> exited with exit code 1
>> 2020-03-02 20:05:22.886 IST [21441] LOG:  aborting startup due to 
>> startup process failure
>> 2020-03-02 20:05:22.889 IST [21441] LOG:  database system is shut down
>>  stopped waiting
>> pg_ctl: could not start server
>> Examine the log output.
>> [centos@tushar-ldap-docker bin]$
>>
>

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: backup manifests

From
tushar
Date:
Hi,

There is a scenario in which I add something inside the tablespace directory and get an error like this:

pg_validatebackup: * manifest_checksum = 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
pg_validatebackup: error: "pg_tblspc/16385/PG_13_202002271/test" is present on disk but not in the manifest

but if I remove the 'PG_13_202002271' directory then there is no error

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data
pg_validatebackup: * manifest_checksum = 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
pg_validatebackup: backup successfully verified

Steps to reproduce -
--connect to the psql terminal and create a tablespace
postgres=# \! mkdir /tmp/my_tblspc
postgres=# create tablespace tbs location '/tmp/my_tblspc';
CREATE TABLESPACE
postgres=# \q

--run pg_basebackup
[centos@tushar-ldap-docker bin]$ ./pg_basebackup -D data_dir   -T /tmp/my_tblspc/=/tmp/new_my_tblspc
[centos@tushar-ldap-docker bin]$
[centos@tushar-ldap-docker bin]$ ls /tmp/new_my_tblspc/
PG_13_202002271

--create a new file under PG_13_* folder
[centos@tushar-ldap-docker bin]$ touch  /tmp/new_my_tblspc/PG_13_202002271/test
[centos@tushar-ldap-docker bin]$

--run pg_validatebackup; getting an error, which looks expected
[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data_dir/
pg_validatebackup: * manifest_checksum = 3951308eab576906ebdb002ff00ca313b2c1862592168c1f5f7ecf051ac07907
pg_validatebackup: error: "pg_tblspc/16386/PG_13_202002271/test" is present on disk but not in the manifest
[centos@tushar-ldap-docker bin]$

--remove the added file
[centos@tushar-ldap-docker bin]$ rm -rf   /tmp/new_my_tblspc/PG_13_202002271/test

--run pg_validatebackup , working fine
[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data_dir/
pg_validatebackup: * manifest_checksum = 3951308eab576906ebdb002ff00ca313b2c1862592168c1f5f7ecf051ac07907
pg_validatebackup: backup successfully verified
[centos@tushar-ldap-docker bin]$

--remove the folder PG_13*
[centos@tushar-ldap-docker bin]$ rm -rf   /tmp/new_my_tblspc/PG_13_202002271/
[centos@tushar-ldap-docker bin]$
[centos@tushar-ldap-docker bin]$ ls /tmp/new_my_tblspc/

--run pg_validatebackup; no error reported?
[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data_dir/
pg_validatebackup: * manifest_checksum = 3951308eab576906ebdb002ff00ca313b2c1862592168c1f5f7ecf051ac07907
pg_validatebackup: backup successfully verified
[centos@tushar-ldap-docker bin]$

Start the server -

[centos@tushar-ldap-docker bin]$ ./pg_ctl -D data_dir/ start -o '-p 9033'
waiting for server to start....2020-03-04 19:18:54.839 IST [13097] LOG:  starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2020-03-04 19:18:54.840 IST [13097] LOG:  listening on IPv6 address "::1", port 9033
2020-03-04 19:18:54.840 IST [13097] LOG:  listening on IPv4 address "127.0.0.1", port 9033
2020-03-04 19:18:54.842 IST [13097] LOG:  listening on Unix socket "/tmp/.s.PGSQL.9033"
2020-03-04 19:18:54.843 IST [13097] LOG:  could not open directory "pg_tblspc/16386/PG_13_202002271": No such file or directory
2020-03-04 19:18:54.845 IST [13098] LOG:  database system was interrupted; last known up at 2020-03-04 19:14:50 IST
2020-03-04 19:18:54.937 IST [13098] LOG:  could not open directory "pg_tblspc/16386/PG_13_202002271": No such file or directory
2020-03-04 19:18:54.939 IST [13098] LOG:  could not open directory "pg_tblspc/16386/PG_13_202002271": No such file or directory
2020-03-04 19:18:54.939 IST [13098] LOG:  redo starts at 0/18000028
2020-03-04 19:18:54.939 IST [13098] LOG:  consistent recovery state reached at 0/18000100
2020-03-04 19:18:54.939 IST [13098] LOG:  redo done at 0/18000100
2020-03-04 19:18:54.941 IST [13098] LOG:  could not open directory "pg_tblspc/16386/PG_13_202002271": No such file or directory
2020-03-04 19:18:54.984 IST [13097] LOG:  database system is ready to accept connections
 done
server started
[centos@tushar-ldap-docker bin]$

regards,

On 3/4/20 3:51 PM, tushar wrote:
Another scenario, in which if we modify Manifest-Checksum" value from backup_manifest file , we are not getting an error

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data/
pg_validatebackup: * manifest_checksum = 28d082921650d0ae881de8ceb122c8d2af5f449f51ecfb446827f7f49f91f65d
pg_validatebackup: backup successfully verified

open backup_manifest file and replace

"Manifest-Checksum": "8d082921650d0ae881de8ceb122c8d2af5f449f51ecfb446827f7f49f91f65d"}
with
"Manifest-Checksum": "Hello World"}

rerun the pg_validatebackup

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data/
pg_validatebackup: * manifest_checksum = Hello World
pg_validatebackup: backup successfully verified

regards,

On 3/4/20 3:26 PM, tushar wrote:
Hi,
Another observation , if i change the ownership of a file which is under global/ directory
i.e

[root@tushar-ldap-docker global]# chown enterprisedb 2396

and run the pg_validatebackup command, i am getting this message -

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup gggg
pg_validatebackup: * manifest_checksum = e8cb007bcc9c0deab6eff51cd8d9d9af6af35b86e02f3055e60e70e56737e877
pg_validatebackup: error: could not open file "global/2396": Permission denied
*** Error in `./pg_validatebackup': double free or corruption (!prev): 0x0000000001850ba0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81679)[0x7fa2248e3679]
./pg_validatebackup[0x401f4c]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fa224884505]
./pg_validatebackup[0x402049]
======= Memory map: ========
00400000-00415000 r-xp 00000000 fd:03 4044545 /home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
00614000-00615000 r--p 00014000 fd:03 4044545 /home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
00615000-00616000 rw-p 00015000 fd:03 4044545 /home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
017f3000-01878000 rw-p 00000000 00:00 0                                  [heap]
7fa218000000-7fa218021000 rw-p 00000000 00:00 0
7fa218021000-7fa21c000000 ---p 00000000 00:00 0
7fa21e122000-7fa21e137000 r-xp 00000000 fd:03 141697                     /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fa21e137000-7fa21e336000 ---p 00015000 fd:03 141697                     /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fa21e336000-7fa21e337000 r--p 00014000 fd:03 141697                     /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fa21e337000-7fa21e338000 rw-p 00015000 fd:03 141697                     /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fa21e338000-7fa224862000 r--p 00000000 fd:03 266442                     /usr/lib/locale/locale-archive
7fa224862000-7fa224a25000 r-xp 00000000 fd:03 134456                     /usr/lib64/libc-2.17.so
7fa224a25000-7fa224c25000 ---p 001c3000 fd:03 134456                     /usr/lib64/libc-2.17.so
7fa224c25000-7fa224c29000 r--p 001c3000 fd:03 134456                     /usr/lib64/libc-2.17.so
7fa224c29000-7fa224c2b000 rw-p 001c7000 fd:03 134456                     /usr/lib64/libc-2.17.so
7fa224c2b000-7fa224c30000 rw-p 00000000 00:00 0
7fa224c30000-7fa224c47000 r-xp 00000000 fd:03 134485                     /usr/lib64/libpthread-2.17.so
7fa224c47000-7fa224e46000 ---p 00017000 fd:03 134485                     /usr/lib64/libpthread-2.17.so
7fa224e46000-7fa224e47000 r--p 00016000 fd:03 134485                     /usr/lib64/libpthread-2.17.so
7fa224e47000-7fa224e48000 rw-p 00017000 fd:03 134485                     /usr/lib64/libpthread-2.17.so
7fa224e48000-7fa224e4c000 rw-p 00000000 00:00 0
7fa224e4c000-7fa224e90000 r-xp 00000000 fd:03 4044478 /home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
7fa224e90000-7fa225090000 ---p 00044000 fd:03 4044478 /home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
7fa225090000-7fa225093000 r--p 00044000 fd:03 4044478 /home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
7fa225093000-7fa225094000 rw-p 00047000 fd:03 4044478 /home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
7fa225094000-7fa2250b6000 r-xp 00000000 fd:03 130333                     /usr/lib64/ld-2.17.so
7fa22527d000-7fa2252a2000 rw-p 00000000 00:00 0
7fa2252b3000-7fa2252b5000 rw-p 00000000 00:00 0
7fa2252b5000-7fa2252b6000 r--p 00021000 fd:03 130333                     /usr/lib64/ld-2.17.so
7fa2252b6000-7fa2252b7000 rw-p 00022000 fd:03 130333                     /usr/lib64/ld-2.17.so
7fa2252b7000-7fa2252b8000 rw-p 00000000 00:00 0
7ffdf354f000-7ffdf3570000 rw-p 00000000 00:00 0                          [stack]
7ffdf3572000-7ffdf3574000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
Aborted
[centos@tushar-ldap-docker bin]$


I am getting the error message but along with "*** Error in `./pg_validatebackup': double free or corruption (!prev): 0x0000000001850ba0 ***"  messages

Is this expected ?

regards,

On 3/3/20 8:19 PM, tushar wrote:
On 3/3/20 4:04 PM, tushar wrote:
Thanks Robert.  After applying all the 5 patches (v8-00*) against PG v13 (commit id -afb5465e0cfce7637066eaaaeecab30b0f23fbe3) ,

There is a scenario where pg_validatebackup is not throwing an error if some file deleted from pg_wal/ folder and  but later at the time of restoring - we are getting an error

[centos@tushar-ldap-docker bin]$ ./pg_basebackup  -D test1

[centos@tushar-ldap-docker bin]$ ls test1/pg_wal/
000000010000000000000010  archive_status

[centos@tushar-ldap-docker bin]$ rm -rf test1/pg_wal/*

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup test1
pg_validatebackup: * manifest_checksum = 88f1ed995c83e86252466a2c88b3e660a69cfc76c169991134b101c4f16c9df7
pg_validatebackup: backup successfully verified

[centos@tushar-ldap-docker bin]$ ./pg_ctl -D test1 start -o '-p 3333'
waiting for server to start....2020-03-02 20:05:22.732 IST [21441] LOG:  starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2020-03-02 20:05:22.733 IST [21441] LOG:  listening on IPv6 address "::1", port 3333
2020-03-02 20:05:22.733 IST [21441] LOG:  listening on IPv4 address "127.0.0.1", port 3333
2020-03-02 20:05:22.736 IST [21441] LOG:  listening on Unix socket "/tmp/.s.PGSQL.3333"
2020-03-02 20:05:22.739 IST [21442] LOG:  database system was interrupted; last known up at 2020-03-02 20:04:35 IST
2020-03-02 20:05:22.739 IST [21442] LOG:  creating missing WAL directory "pg_wal/archive_status"
2020-03-02 20:05:22.886 IST [21442] LOG:  invalid checkpoint record
2020-03-02 20:05:22.886 IST [21442] FATAL:  could not locate required checkpoint record
2020-03-02 20:05:22.886 IST [21442] HINT:  If you are restoring from a backup, touch "/home/centos/pg13_bk_mani/edb/edbpsql/bin/test1/recovery.signal" and add required recovery options.
    If you are not restoring from a backup, try removing the file "/home/centos/pg13_bk_mani/edb/edbpsql/bin/test1/backup_label".
    Be careful: removing "/home/centos/pg13_bk_mani/edb/edbpsql/bin/test1/backup_label" will result in a corrupt cluster if restoring from a backup.
2020-03-02 20:05:22.886 IST [21441] LOG:  startup process (PID 21442) exited with exit code 1
2020-03-02 20:05:22.886 IST [21441] LOG:  aborting startup due to startup process failure
2020-03-02 20:05:22.889 IST [21441] LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server
Examine the log output.
[centos@tushar-ldap-docker bin]$




-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

Re: backup manifests

From
Suraj Kharage
Date:


On Wed, Mar 4, 2020 at 3:51 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
Another scenario, in which if we modify Manifest-Checksum" value from
backup_manifest file , we are not getting an error

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data/
pg_validatebackup: * manifest_checksum =
28d082921650d0ae881de8ceb122c8d2af5f449f51ecfb446827f7f49f91f65d
pg_validatebackup: backup successfully verified

open backup_manifest file and replace

"Manifest-Checksum":
"8d082921650d0ae881de8ceb122c8d2af5f449f51ecfb446827f7f49f91f65d"}
with
"Manifest-Checksum": "Hello World"}

rerun the pg_validatebackup

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data/
pg_validatebackup: * manifest_checksum = Hello World
pg_validatebackup: backup successfully verified

regards,
 
Yeah, this handling is missing in the provided WIP patch. I believe Robert will consider fixing this in an upcoming version of the validator patch.
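
For context, the missing handling is conceptually simple: hash every byte of the manifest up to the final line carrying the "Manifest-Checksum" key and compare the result against the recorded value. A minimal C sketch of that idea, using a hypothetical compute_sha256() helper (illustrative only, not the actual patch code):

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: fills digest[] (32 bytes) with the SHA-256 of
 * buf[0..len).  Named for illustration only; the real patch has its own
 * checksum API.
 */
extern void compute_sha256(const char *buf, size_t len, unsigned char *digest);

/*
 * Check a manifest's self-checksum.  'buf' holds the whole manifest text,
 * 'covered_len' is the number of bytes before the final line that carries
 * the "Manifest-Checksum" key, and 'recorded' is the 32-byte digest decoded
 * from that line.
 */
bool
manifest_checksum_ok(const char *buf, size_t covered_len,
                     const unsigned char *recorded)
{
    unsigned char actual[32];

    compute_sha256(buf, covered_len, actual);
    return memcmp(actual, recorded, 32) == 0;
}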

--
--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.

Re: backup manifests

From
Suraj Kharage
Date:

On Wed, Mar 4, 2020 at 7:21 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
Hi,

There is a scenario in which i add something inside the pg_tablespace directory , i am getting an error like-

pg_validatebackup: * manifest_checksum = 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
pg_validatebackup: error: "pg_tblspc/16385/PG_13_202002271/test" is present on disk but not in the manifest

but if i remove 'PG_13_202002271 ' directory then there is no error

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data
pg_validatebackup: * manifest_checksum = 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
pg_validatebackup: backup successfully verified


This seems expected under the current design, since we don't log directory entries in backup_manifest. In your case, you have a tablespace with no objects (an empty tablespace), so backup_manifest has no entry for it; hence, when you remove that tablespace directory, the validator cannot detect it.

We can either document this or add entries for directories to the manifest. Robert may have a better idea on this.

--
--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.

Re: backup manifests

From
Rajkumar Raghuwanshi
Date:
Hi,

In a negative test scenario, if I change a file's size to -1 in backup_manifest, pg_validatebackup gives
an error with a seemingly random size number.

[edb@localhost bin]$ ./pg_basebackup -p 5551 -D /tmp/bold --manifest-checksum 'SHA256'
[edb@localhost bin]$ ./pg_validatebackup /tmp/bold
pg_validatebackup: backup successfully verified

--change a file size to -1 and generate new checksum.
[edb@localhost bin]$ vi /tmp/bold/backup_manifest
[edb@localhost bin]$ shasum -a256 /tmp/bold/backup_manifest
c3d7838cbbf991c6108f9c1ab78f673c20d8073114500f14da6ed07ede2dc44a  /tmp/bold/backup_manifest
[edb@localhost bin]$ vi /tmp/bold/backup_manifest

[edb@localhost bin]$ ./pg_validatebackup /tmp/bold
pg_validatebackup: error: "global/4183" has size 0 on disk but size 18446744073709551615 in the manifest

Thanks & Regards,
Rajkumar Raghuwanshi


On Thu, Mar 5, 2020 at 9:37 AM Suraj Kharage <suraj.kharage@enterprisedb.com> wrote:

On Wed, Mar 4, 2020 at 7:21 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
Hi,

There is a scenario in which i add something inside the pg_tablespace directory , i am getting an error like-

pg_validatebackup: * manifest_checksum = 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
pg_validatebackup: error: "pg_tblspc/16385/PG_13_202002271/test" is present on disk but not in the manifest

but if i remove 'PG_13_202002271 ' directory then there is no error

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data
pg_validatebackup: * manifest_checksum = 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
pg_validatebackup: backup successfully verified


This seems expected considering current design as we don't log the directory entries in backup_manifest. In your case, you have tablespace with no objects (empty tablespace) then backup_manifest does not have any entry for this hence when you remove this tablespace directory, validator could not detect it.

We can either document it or add the entry for directories in the manifest. Robert may have a better idea on this.

--
--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.

Re: backup manifests

From
tushar
Date:
Hi,

There is one scenario where I am somehow able to run pg_validatebackup successfully, but when I try to start the server, it fails

Steps to reproduce -
--create 2 base backup directory
[centos@tushar-ldap-docker bin]$ ./pg_basebackup -D db1
[centos@tushar-ldap-docker bin]$ ./pg_basebackup -D db2

--run pg_validatebackup, using the backup_manifest of the db1 directory against db2/; we get an error
[centos@tushar-ldap-docker bin]$ ./pg_validatebackup -m db1/backup_manifest db2/
pg_validatebackup: * manifest_checksum = 5b131aff4a4f86e2a53efd84b003a67b9f615decb0039f19033eefa6f43c1ede
pg_validatebackup: error: checksum mismatch for file "backup_label"
 
--copy the backup_label of db1 to the db2 folder
[centos@tushar-ldap-docker bin]$ cp db1/backup_label db2/.

--run pg_validatebackup .. working fine
[centos@tushar-ldap-docker bin]$ ./pg_validatebackup -m db1/backup_manifest db2/
pg_validatebackup: * manifest_checksum = 5b131aff4a4f86e2a53efd84b003a67b9f615decb0039f19033eefa6f43c1ede
pg_validatebackup: backup successfully verified
[centos@tushar-ldap-docker bin]$

--try to start the server
[centos@tushar-ldap-docker bin]$ ./pg_ctl -D db2 start -o '-p 7777'
waiting for server to start....2020-03-05 15:33:53.471 IST [24049] LOG:  starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2020-03-05 15:33:53.471 IST [24049] LOG:  listening on IPv6 address "::1", port 7777
2020-03-05 15:33:53.471 IST [24049] LOG:  listening on IPv4 address "127.0.0.1", port 7777
2020-03-05 15:33:53.473 IST [24049] LOG:  listening on Unix socket "/tmp/.s.PGSQL.7777"
2020-03-05 15:33:53.476 IST [24050] LOG:  database system was interrupted; last known up at 2020-03-05 15:32:51 IST
2020-03-05 15:33:53.573 IST [24050] LOG:  invalid checkpoint record
2020-03-05 15:33:53.573 IST [24050] FATAL:  could not locate required checkpoint record
2020-03-05 15:33:53.573 IST [24050] HINT:  If you are restoring from a backup, touch "/home/centos/pg13_bk_mani/edb/edbpsql/bin/db2/recovery.signal" and add required recovery options.
    If you are not restoring from a backup, try removing the file "/home/centos/pg13_bk_mani/edb/edbpsql/bin/db2/backup_label".
    Be careful: removing "/home/centos/pg13_bk_mani/edb/edbpsql/bin/db2/backup_label" will result in a corrupt cluster if restoring from a backup.
2020-03-05 15:33:53.574 IST [24049] LOG:  startup process (PID 24050) exited with exit code 1
2020-03-05 15:33:53.574 IST [24049] LOG:  aborting startup due to startup process failure
2020-03-05 15:33:53.575 IST [24049] LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server
Examine the log output.
[centos@tushar-ldap-docker bin]$

regards,


On 3/5/20 1:09 PM, Rajkumar Raghuwanshi wrote:
Hi,

In a negative test scenario, if I changed size to -1 in backup_manifest, pg_validatebackup giving
error with a random size number.

[edb@localhost bin]$ ./pg_basebackup -p 5551 -D /tmp/bold --manifest-checksum 'SHA256'
[edb@localhost bin]$ ./pg_validatebackup /tmp/bold
pg_validatebackup: backup successfully verified

--change a file size to -1 and generate new checksum.
[edb@localhost bin]$ vi /tmp/bold/backup_manifest
[edb@localhost bin]$ shasum -a256 /tmp/bold/backup_manifest
c3d7838cbbf991c6108f9c1ab78f673c20d8073114500f14da6ed07ede2dc44a  /tmp/bold/backup_manifest
[edb@localhost bin]$ vi /tmp/bold/backup_manifest

[edb@localhost bin]$ ./pg_validatebackup /tmp/bold
pg_validatebackup: error: "global/4183" has size 0 on disk but size 18446744073709551615 in the manifest

Thanks & Regards,
Rajkumar Raghuwanshi


On Thu, Mar 5, 2020 at 9:37 AM Suraj Kharage <suraj.kharage@enterprisedb.com> wrote:

On Wed, Mar 4, 2020 at 7:21 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
Hi,

There is a scenario in which i add something inside the pg_tablespace directory , i am getting an error like-

pg_validatebackup: * manifest_checksum = 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
pg_validatebackup: error: "pg_tblspc/16385/PG_13_202002271/test" is present on disk but not in the manifest

but if i remove 'PG_13_202002271 ' directory then there is no error

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data
pg_validatebackup: * manifest_checksum = 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
pg_validatebackup: backup successfully verified


This seems expected considering current design as we don't log the directory entries in backup_manifest. In your case, you have tablespace with no objects (empty tablespace) then backup_manifest does not have any entry for this hence when you remove this tablespace directory, validator could not detect it.

We can either document it or add the entry for directories in the manifest. Robert may have a better idea on this.

--
--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.


-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

Re: backup manifests

From
tushar
Date:
There is one small observation: if we use a trailing slash (/) with the -i option, we do not get the desired result

Steps to reproduce -
==============

[centos@tushar-ldap-docker bin]$ ./pg_basebackup -D test

[centos@tushar-ldap-docker bin]$ touch test/pg_notify/dummy_file

--working
[centos@tushar-ldap-docker bin]$ ./pg_validatebackup   --ignore=pg_notify  test
pg_validatebackup: * manifest_checksum = be9b72e1320c6c34c131533de19371a10dd5011940181724e43277f786026c7b
pg_validatebackup: backup successfully verified

--not working

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup   --ignore=pg_notify/  test
pg_validatebackup: * manifest_checksum = be9b72e1320c6c34c131533de19371a10dd5011940181724e43277f786026c7b
pg_validatebackup: error: "pg_notify/dummy_file" is present on disk but not in the manifest

regards,

On 3/5/20 3:40 PM, tushar wrote:
Hi,

There is one scenario  where  i somehow able to run pg_validatebackup successfully but when i tried to start the server , it is failing

Steps to reproduce -
--create 2 base backup directory
[centos@tushar-ldap-docker bin]$ ./pg_basebackup -D db1
[centos@tushar-ldap-docker bin]$ ./pg_basebackup -D db2

--run pg_validatebackup , use backup_manifest of db1 directory against  db2/  . Will get an error
[centos@tushar-ldap-docker bin]$ ./pg_validatebackup -m db1/backup_manifest db2/
pg_validatebackup: * manifest_checksum = 5b131aff4a4f86e2a53efd84b003a67b9f615decb0039f19033eefa6f43c1ede
pg_validatebackup: error: checksum mismatch for file "backup_label"
 
--copy the backup_label of db1 to the db2 folder
[centos@tushar-ldap-docker bin]$ cp db1/backup_label db2/.

--run pg_validatebackup .. working fine
[centos@tushar-ldap-docker bin]$ ./pg_validatebackup -m db1/backup_manifest db2/
pg_validatebackup: * manifest_checksum = 5b131aff4a4f86e2a53efd84b003a67b9f615decb0039f19033eefa6f43c1ede
pg_validatebackup: backup successfully verified
[centos@tushar-ldap-docker bin]$

--try to start the server
[centos@tushar-ldap-docker bin]$ ./pg_ctl -D db2 start -o '-p 7777'
waiting for server to start....2020-03-05 15:33:53.471 IST [24049] LOG:  starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2020-03-05 15:33:53.471 IST [24049] LOG:  listening on IPv6 address "::1", port 7777
2020-03-05 15:33:53.471 IST [24049] LOG:  listening on IPv4 address "127.0.0.1", port 7777
2020-03-05 15:33:53.473 IST [24049] LOG:  listening on Unix socket "/tmp/.s.PGSQL.7777"
2020-03-05 15:33:53.476 IST [24050] LOG:  database system was interrupted; last known up at 2020-03-05 15:32:51 IST
2020-03-05 15:33:53.573 IST [24050] LOG:  invalid checkpoint record
2020-03-05 15:33:53.573 IST [24050] FATAL:  could not locate required checkpoint record
2020-03-05 15:33:53.573 IST [24050] HINT:  If you are restoring from a backup, touch "/home/centos/pg13_bk_mani/edb/edbpsql/bin/db2/recovery.signal" and add required recovery options.
    If you are not restoring from a backup, try removing the file "/home/centos/pg13_bk_mani/edb/edbpsql/bin/db2/backup_label".
    Be careful: removing "/home/centos/pg13_bk_mani/edb/edbpsql/bin/db2/backup_label" will result in a corrupt cluster if restoring from a backup.
2020-03-05 15:33:53.574 IST [24049] LOG:  startup process (PID 24050) exited with exit code 1
2020-03-05 15:33:53.574 IST [24049] LOG:  aborting startup due to startup process failure
2020-03-05 15:33:53.575 IST [24049] LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server
Examine the log output.
[centos@tushar-ldap-docker bin]$

regards,


On 3/5/20 1:09 PM, Rajkumar Raghuwanshi wrote:
Hi,

In a negative test scenario, if I changed size to -1 in backup_manifest, pg_validatebackup giving
error with a random size number.

[edb@localhost bin]$ ./pg_basebackup -p 5551 -D /tmp/bold --manifest-checksum 'SHA256'
[edb@localhost bin]$ ./pg_validatebackup /tmp/bold
pg_validatebackup: backup successfully verified

--change a file size to -1 and generate new checksum.
[edb@localhost bin]$ vi /tmp/bold/backup_manifest
[edb@localhost bin]$ shasum -a256 /tmp/bold/backup_manifest
c3d7838cbbf991c6108f9c1ab78f673c20d8073114500f14da6ed07ede2dc44a  /tmp/bold/backup_manifest
[edb@localhost bin]$ vi /tmp/bold/backup_manifest

[edb@localhost bin]$ ./pg_validatebackup /tmp/bold
pg_validatebackup: error: "global/4183" has size 0 on disk but size 18446744073709551615 in the manifest

Thanks & Regards,
Rajkumar Raghuwanshi


On Thu, Mar 5, 2020 at 9:37 AM Suraj Kharage <suraj.kharage@enterprisedb.com> wrote:

On Wed, Mar 4, 2020 at 7:21 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
Hi,

There is a scenario in which i add something inside the pg_tablespace directory , i am getting an error like-

pg_validatebackup: * manifest_checksum = 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
pg_validatebackup: error: "pg_tblspc/16385/PG_13_202002271/test" is present on disk but not in the manifest

but if i remove 'PG_13_202002271 ' directory then there is no error

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data
pg_validatebackup: * manifest_checksum = 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
pg_validatebackup: backup successfully verified


This seems expected considering current design as we don't log the directory entries in backup_manifest. In your case, you have tablespace with no objects (empty tablespace) then backup_manifest does not have any entry for this hence when you remove this tablespace directory, validator could not detect it.

We can either document it or add the entry for directories in the manifest. Robert may have a better idea on this.

--
--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.


-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company


-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

Re: backup manifests

From
Robert Haas
Date:
On Thu, Mar 5, 2020 at 7:05 AM tushar <tushar.ahuja@enterprisedb.com> wrote:
> There is one small observation if we use slash (/) with option -i then not getting the desired result

Here's an updated patch set responding to many of the comments
received thus far. Since there are quite a few emails, let me
consolidate my comments and responses here.

Report: Segmentation fault if -m is used to point to a valid manifest,
but actual backup directory is nonexistent.
Response: Fixed; thanks for the report.

Report: pg_validatebackup doesn't complain about problems within the
pg_wal directory.
Response: That's out of scope. The WAL files are fetched separately
and are therefore not part of the manifest.

Report: Inaccessible file in data directory being validated leads to a
double free.
Response: Fixed; thanks for the report.

Report: Patch 0005 doesn't validate the manifest checksum.
Response: I know. I mentioned that when posting the previous patch
set. Fixed in this version, though.

Report: Removing an empty directory doesn't make backup validation
fail, even though it might cause problems for the server.
Response: That's a little unfortunate, but I'm not sure it's really
worth complicating the patch to deal with it. It's something of a
corner case.

Report: Negative file sizes in the backup manifest are interpreted as
large integers.
Response: That's also a little unfortunate, but I doubt it's worth
adding code to catch it, since any such manifest is corrupt. Also,
it's not like we're ignoring it; the error just isn't ideal.

Report: If I take the backup label from backup #1 and stick it into
otherwise-identical backup #2, validation succeeds but the server
won't start.
Response: That's because we can't validate the pg_wal directory. As
noted above, that's out of scope.

Report: Using --ignore with a slash-terminated pathname doesn't work
as expected.
Response: Fixed, thanks for the report.

Off-List Report: You forgot a PG_BINARY flag.
Response: Fixed. I thought I'd done this before but there were two
places and I'd only fixed one of them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: backup manifests

From
Suraj Kharage
Date:
Thanks, Robert.

1: Getting the below error while compiling the 0002 patch.

edb@localhost:postgres$ mi > mi.log
basebackup.c: In function ‘AddFileToManifest’:
basebackup.c:1052:6: error: ‘pathname’ undeclared (first use in this function)
      pathname);
      ^
basebackup.c:1052:6: note: each undeclared identifier is reported only once for each function it appears in
make[3]: *** [basebackup.o] Error 1
make[2]: *** [replication-recursive] Error 2
make[1]: *** [install-backend-recurse] Error 2
make: *** [install-src-recurse] Error 2


I can see you have renamed the filename argument of AddFileToManifest() to pathname, but those changes are part of 0003 (the validator patch).
I think the changes related to src/backend/replication/basebackup.c should not be in the validator patch (0003). We can move those changes to the backup manifest patch, either 0002 or 0004, for better readability of the patch set.

2:

#define KW_MANIFEST_VERSION "PostgreSQL-Backup-Manifest-Version"
#define KW_MANIFEST_FILE "File"
#define KW_MANIFEST_CHECKSUM "Manifest-Checksum"
#define KWL_MANIFEST_VERSION (sizeof(KW_MANIFEST_VERSION)-1)
#define KWL_MANIFEST_FILE (sizeof(KW_MANIFEST_FILE)-1)
#define KWL_MANIFEST_CHECKSUM (sizeof(KW_MANIFEST_CHECKSUM)-1)

#define FIELDS_PER_FILE_LINE 4

A few macros defined in the 0003 patch are not used anywhere in the 0005 patch. Either we can replace these with hard-coded values or remove them.


On Thu, Mar 5, 2020 at 10:25 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Mar 5, 2020 at 7:05 AM tushar <tushar.ahuja@enterprisedb.com> wrote:
> There is one small observation if we use slash (/) with option -i then not getting the desired result

Here's an updated patch set responding to many of the comments
received thus far. Since there are quite a few emails, let me
consolidate my comments and responses here.

Report: Segmentation fault if -m is used to point to a valid manifest,
but actual backup directory is nonexistent.
Response: Fixed; thanks for the report.

Report: pg_validatebackup doesn't complain about problems within the
pg_wal directory.
Response: That's out of scope. The WAL files are fetched separately
and are therefore not part of the manifest.

Report: Inaccessible file in data directory being validated leads to a
double free.
Response: Fixed; thanks for the report.

Report: Patch 0005 doesn't validate the manifest checksum.
Response: I know. I mentioned that when posting the previous patch
set. Fixed in this version, though.

Report: Removing an empty directory doesn't make backup validation
fail, even though it might cause problems for the server.
Response: That's a little unfortunate, but I'm not sure it's really
worth complicating the patch to deal with it. It's something of a
corner case.

Report: Negative file sizes in the backup manifest are interpreted as
large integers.
Response: That's also a little unfortunate, but I doubt it's worth
adding code to catch it, since any such manifest is corrupt. Also,
it's not like we're ignoring it; the error just isn't ideal.

Report: If I take the backup label from backup #1 and stick it into
otherwise-identical backup #2, validation succeeds but the server
won't start.
Response: That's because we can't validate the pg_wal directory. As
noted above, that's out of scope.

Report: Using --ignore with a slash-terminated pathname doesn't work
as expected.
Response: Fixed, thanks for the report.

Off-List Report: You forgot a PG_BINARY flag.
Response: Fixed. I thought I'd done this before but there were two
places and I'd only fixed one of them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.

Re: backup manifests

From
tushar
Date:
On 3/5/20 10:25 PM, Robert Haas wrote:
> Here's an updated patch set responding to many of the comments
> received thus far.
Thanks, Robert. There is a scenario: if the user provides the port of a v11 
server when creating a base backup with pg_basebackup (v13 with your 
patch applied) and the --manifest-checksums option, it leads to the error 
below

[centos@tushar-ldap-docker bin]$ ./pg_basebackup -R -p 9045 
--manifest-checksums=SHA224 -D dc1
pg_basebackup: error: could not initiate base backup: ERROR: syntax error
pg_basebackup: removing data directory "dc1"
[centos@tushar-ldap-docker bin]$

Steps to reproduce -
PG v11 is running
run pg_basebackup against it with the --manifest-checksums option

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: backup manifests

From
Robert Haas
Date:
On Mon, Mar 9, 2020 at 12:22 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
> On 3/5/20 10:25 PM, Robert Haas wrote:
> > Here's an updated patch set responding to many of the comments
> > received thus far.
> Thanks Robert. There is a scenario - if user provide port of v11 server
> at the time of  creating 'base backup'  against pg_basebackup(v13+ your
> patch applied)
> with option --manifest-checksums,will lead to  this  below error
>
> [centos@tushar-ldap-docker bin]$ ./pg_basebackup -R -p 9045
> --manifest-checksums=SHA224 -D dc1
> pg_basebackup: error: could not initiate base backup: ERROR: syntax error
> pg_basebackup: removing data directory "dc1"
> [centos@tushar-ldap-docker bin]$
>
> Steps to reproduce -
> PG v11 is running
> run pg_basebackup against that with option --manifest-checksums

Seems like expected behavior to me. We could consider providing a more
descriptive error message, but there's no way for it to work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Robert Haas
Date:
On Fri, Mar 6, 2020 at 3:58 AM Suraj Kharage
<suraj.kharage@enterprisedb.com> wrote:
> 1: Getting below error while compiling 0002 patch.
> 2:
>
> Few macros defined in 0003 patch not used anywhere in 0005 patch. Either we can replace these with hard-coded values
> or remove them.
 

Thanks. I hope that I have straightened those things out in the new
version which is attached. This version also includes some other
changes. The non-JSON code is now completely gone. Also, I've
refactored the code that parses the JSON manifest to make it
cleaner, and I've moved it out into a separate file. This might be
useful if anyone ends up wanting to reuse that code for some other
purpose, and I think it makes it easier to understand, too, since the
manifest parsing is now much better separated from the task of
actually validating the given directory against the manifest. I've
also added some tests, which are based in part on testing ideas from
Rajkumar Raghuwanshi and Mark Dilger, but this test code was written
by me. So now it's like this:

0001 - checksum helper functions. same as before.
0002 - patch the server to generate and send a manifest, and
pg_basebackup to receive it
0003 - add pg_validatebackup
0004 - TAP tests

Comments?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: backup manifests

From
tushar
Date:
On 3/9/20 10:46 PM, Robert Haas wrote:
> Seems like expected behavior to me. We could consider providing a more
> descriptive error message, but there's no way for it to work.

Right, the error message needs to be more user friendly.

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: backup manifests

From
tushar
Date:
On 3/12/20 8:16 PM, tushar wrote:
Seems like expected behavior to me. We could consider providing a more
descriptive error message, but there's no way for it to work.

Right , Error message need to be more user friendly .

One scenario which, I feel, should error out even if the -s option is specified:
create a base backup directory (./pg_basebackup -D data1)
connect as the root user and restrict permissions on the pg_hba.conf file (chmod 004 pg_hba.conf)

run pg_validatebackup - [centos@tushar-ldap-docker bin]$ ./pg_validatebackup data1
pg_validatebackup: error: could not open file "pg_hba.conf": Permission denied

run pg_validatebackup  with switch -s [centos@tushar-ldap-docker bin]$ ./pg_validatebackup data1 -s
pg_validatebackup: backup successfully verified

Here the file is not accessible, so I think it should throw an error (the same one as above) instead of blindly skipping it.

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

Re: backup manifests

From
Robert Haas
Date:
On Fri, Mar 13, 2020 at 9:53 AM tushar <tushar.ahuja@enterprisedb.com> wrote:
> run pg_validatebackup -
>
> [centos@tushar-ldap-docker bin]$ ./pg_validatebackup data1
> pg_validatebackup: error: could not open file "pg_hba.conf": Permission denied
>
> run pg_validatebackup  with switch -s
>
> [centos@tushar-ldap-docker bin]$ ./pg_validatebackup data1 -s
> pg_validatebackup: backup successfully verified
>
> here file is not accessible so i think - it should throw you an error ( the same above one) instead of   blindly
> skipping it.
 

I don't really want to do that. That would require it to open every
file even if it doesn't need to read the data in the files. I think in
most cases that would just slow it down for no real benefit. If you've
specified -s, you have to be OK with getting a less complete check for
problems.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Robert Haas
Date:
On Thu, Mar 12, 2020 at 10:47 AM tushar <tushar.ahuja@enterprisedb.com> wrote:
> On 3/9/20 10:46 PM, Robert Haas wrote:
> > Seems like expected behavior to me. We could consider providing a more
> > descriptive error message, but there's no way for it to work.
>
> Right , Error message need to be more user friendly .

OK. Done in the attached version, which also includes a few other changes:

- I expanded the regression tests. They now cover every line of code
in parse_manifest.c except for a few that I believe to be unreachable
(though I might be mistaken). Coverage for pg_validatebackup.c is also
improved, but it's not 100%; there are some cases that I don't know
how to hit outside of a kernel malfunction, and others that I only
know how to hit on non-Windows systems. For instance, it's easy to use
perl to make a file inaccessible on Linux with chmod(0, $filename),
but I gather that doesn't work on Windows. I'm going to spend a bit
more time looking at this, but I think it's already reasonably good.

- I fixed a couple of very minor bugs which I discovered by writing those tests.

- I added documentation, in part based on a draft Mark Dilger shared
with me off-list.

I don't think this is committable just yet, but I think it's getting
fairly close, so if anyone has major objections please speak up soon.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: backup manifests

From
Suraj Kharage
Date:
Thank you, Robert.

Getting the below warning while compiling v11-0003-pg_validatebackup-Validate-a-backup-against-the-.patch.

pg_validatebackup.c: In function ‘report_manifest_error’:
pg_validatebackup.c:356:2: warning: function might be possible candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
  pg_log_generic_v(PG_LOG_FATAL, fmt, ap);

 

To resolve this, can we use pg_attribute_printf(2, 3) in the function declarations, something like below? (Argument 2 is the format string and argument 3 is where the variadic arguments start.)
e.g:

diff --git a/src/bin/pg_validatebackup/parse_manifest.h b/src/bin/pg_validatebackup/parse_manifest.h
index b0b18a5..25d140f 100644
--- a/src/bin/pg_validatebackup/parse_manifest.h
+++ b/src/bin/pg_validatebackup/parse_manifest.h
@@ -25,7 +25,7 @@ typedef void (*json_manifest_perfile_callback)(JsonManifestParseContext *,
                                                                 size_t size, pg_checksum_type checksum_type,
                                                                 int checksum_length, uint8 *checksum_payload);
 typedef void (*json_manifest_error_callback)(JsonManifestParseContext *,
-                                                                char *fmt, ...);
+                                                                char *fmt,...) pg_attribute_printf(2, 3);
 
 struct JsonManifestParseContext
 {
diff --git a/src/bin/pg_validatebackup/pg_validatebackup.c b/src/bin/pg_validatebackup/pg_validatebackup.c
index 0e7299b..6ccbe59 100644
--- a/src/bin/pg_validatebackup/pg_validatebackup.c
+++ b/src/bin/pg_validatebackup/pg_validatebackup.c
@@ -95,7 +95,7 @@ static void record_manifest_details_for_file(JsonManifestParseContext *context,
                                                                                         int checksum_length,
                                                                                         uint8 *checksum_payload);
 static void report_manifest_error(JsonManifestParseContext *context,
-                                                                 char *fmt, ...);
+                                                                 char *fmt,...) pg_attribute_printf(2, 3);
 
 static void validate_backup_directory(validator_context *context,
                                                                          char *relpath, char *fullpath);


Typos:

0004 patch
unexpctedly => unexpectedly

0005 patch
bacup => backup

--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.

Re: backup manifests

From
Suraj Kharage
Date:
One more suggestion, recent commit (1933ae62) has added the PostgreSQL home page to --help output.

e.g:
PostgreSQL home page: <https://www.postgresql.org/>

We might need to consider this change for the pg_validatebackup binary.

--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.

Re: backup manifests

From
tushar
Date:
On 3/14/20 2:04 AM, Robert Haas wrote:
> OK. Done in the attached version

Thanks. Verified.

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: backup manifests

From
Robert Haas
Date:
On Mon, Mar 16, 2020 at 2:03 AM Suraj Kharage
<suraj.kharage@enterprisedb.com> wrote:
> One more suggestion, recent commit (1933ae62) has added the PostgreSQL home page to --help output.

Good catch. Fixed. I also attempted to address the compiler warning
you mentioned in your other email.

Also, I realized that the previous patch versions didn't handle the
hex-encoded path format that we need to use for non-UTF8 filenames,
and that there was no easy way to test that format. So, in this
version I added an option to force all pathnames to be encoded in that
format. I also made that option capable of suppressing the backup
manifest altogether. Other than that, this version is pretty much the
same as the last version, except for a few additional test cases which
I added to get the code coverage up even a little more. It would be
nice if someone could test whether the tests pass on Windows.
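
To make that encoding concrete, a tiny illustrative sketch follows; it
assumes the format is simply each byte of the path rendered as two hex
digits, which is how I read the patch (the helper name here is made up,
not the patch's code):

use strict;
use warnings;

# Hex-encode a pathname the way a manifest's encoded-path field is
# (as I understand it) built: each byte becomes two hex digits.
sub encode_path
{
    my ($path) = @_;
    return unpack('H*', $path);
}

# Prints 626173652f312f31323539
print encode_path('base/1/1259'), "\n";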

I have squashed the series down to just 2 commits, since that seems
like the way that this should probably be committed. Barring strong
objections and/or the end of the world, I plan to do that next week.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: backup manifests

From
Amit Kapila
Date:
On Sat, Mar 21, 2020 at 4:00 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Mar 16, 2020 at 2:03 AM Suraj Kharage
> <suraj.kharage@enterprisedb.com> wrote:
> > One more suggestion, recent commit (1933ae62) has added the PostgreSQL home page to --help output.
>
> Good catch. Fixed. I also attempted to address the compiler warning
> you mentioned in your other email.
>
> Also, I realized that the previous patch versions didn't handle the
> hex-encoded path format that we need to use for non-UTF8 filenames,
> and that there was no easy way to test that format. So, in this
> version I added an option to force all pathnames to be encoded in that
> format. I also made that option capable of suppressing the backup
> manifest altogether. Other than that, this version is pretty much the
> same as the last version, except for a few additional test cases which
> I added to get the code coverage up even a little more. It would be
> nice if someone could test whether the tests pass on Windows.
>

On my CentOS, the patch gives below compilation failure:
pg_validatebackup.c: In function ‘parse_manifest_file’:
pg_validatebackup.c:335:19: error: assignment left-hand side might be
a candidate for a format attribute [-Werror=suggest-attribute=format]
  context.error_cb = report_manifest_error;

I have tested it on Windows and found there are multiple failures.
The failures are as below:
Test Summary Report
---------------------------------------
t/002_algorithm.pl   (Wstat: 512 Tests: 5 Failed: 4)
  Failed tests:  2-5
  Non-zero exit status: 2
  Parse errors: Bad plan.  You planned 19 tests but ran 5.
t/003_corruption.pl  (Wstat: 256 Tests: 14 Failed: 7)
  Failed tests:  2, 4, 6, 8, 10, 12, 14
  Non-zero exit status: 1
  Parse errors: Bad plan.  You planned 44 tests but ran 14.
t/004_options.pl     (Wstat: 4352 Tests: 25 Failed: 17)
  Failed tests:  2, 4, 6-12, 14-17, 19-20, 22, 25
  Non-zero exit status: 17
t/005_bad_manifest.pl (Wstat: 1792 Tests: 44 Failed: 7)
  Failed tests:  18, 24, 26, 30, 32, 34, 36
  Non-zero exit status: 7
Files=6, Tests=109, 72 wallclock secs ( 0.05 usr +  0.01 sys =  0.06 CPU)
Result: FAIL

Failure Report
------------------------
t/002_algorithm.pl ..... 1/19
#   Failed test 'backup ok with algorithm "none"'
#   at t/002_algorithm.pl line 33.

#   Failed test 'backup manifest exists'
#   at t/002_algorithm.pl line 39.

t/002_algorithm.pl ..... 4/19 #   Failed test 'validate backup with
algorithm "none"'
#   at t/002_algorithm.pl line 53.

#   Failed test 'backup ok with algorithm "crc32c"'
#   at t/002_algorithm.pl line 33.
# Looks like you planned 19 tests but ran 5.
# Looks like you failed 4 tests of 5 run.
# Looks like your test exited with 2 just after 5.
t/002_algorithm.pl ..... Dubious, test returned 2 (wstat 512, 0x200)
Failed 18/19 subtests
t/003_corruption.pl .... 1/44
#   Failed test 'intact backup validated'
#   at t/003_corruption.pl line 110.

#   Failed test 'corrupt backup fails validation: extra_file: matches'
#   at t/003_corruption.pl line 117.
#                   'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
#     doesn't match '(?^:extra_file.*present on disk but not in the manifest)'
t/003_corruption.pl .... 5/44
#   Failed test 'intact backup validated'
#   at t/003_corruption.pl line 110.
t/003_corruption.pl .... 7/44
#   Failed test 'corrupt backup fails validation:
extra_tablespace_file: matches'
#   at t/003_corruption.pl line 117.
#                   'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
#     doesn't match '(?^:extra_ts_file.*present on disk but not in the
manifest)'
t/003_corruption.pl .... 9/44
#   Failed test 'intact backup validated'
#   at t/003_corruption.pl line 110.

#   Failed test 'corrupt backup fails validation: missing_file: matches'
#   at t/003_corruption.pl line 117.
#                   'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
#     doesn't match '(?^:pg_xact/0000.*present in the manifest but not on disk)'
t/003_corruption.pl .... 13/44
#   Failed test 'intact backup validated'
#   at t/003_corruption.pl line 110.
# Looks like you planned 44 tests but ran 14.
# Looks like you failed 7 tests of 14 run.
# Looks like your test exited with 1 just after 14.
t/003_corruption.pl .... Dubious, test returned 1 (wstat 256, 0x100)
Failed 37/44 subtests
t/004_options.pl ....... 1/25
#   Failed test '-q succeeds: exit code 0'
#   at t/004_options.pl line 25.

#   Failed test '-q succeeds: no stderr'
#   at t/004_options.pl line 27.
#          got: 'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
#     expected: ''

#   Failed test '-q checksum mismatch: matches'
#   at t/004_options.pl line 37.
#                   'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
#     doesn't match '(?^:checksum mismatch for file \"PG_VERSION\")'
t/004_options.pl ....... 7/25
#   Failed test '-s skips checksumming: exit code 0'
#   at t/004_options.pl line 43.

#   Failed test '-s skips checksumming: no stderr'
#   at t/004_options.pl line 43.
#          got: 'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
#     expected: ''

#   Failed test '-s skips checksumming: matches'
#   at t/004_options.pl line 43.
#                   ''
#     doesn't match '(?^:backup successfully verified)'

#   Failed test '-i ignores problem file: exit code 0'
#   at t/004_options.pl line 48.

#   Failed test '-i ignores problem file: no stderr'
#   at t/004_options.pl line 48.
#          got: 'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
#     expected: ''

#   Failed test '-i ignores problem file: matches'
#   at t/004_options.pl line 48.
#                   ''
#     doesn't match '(?^:backup successfully verified)'

#   Failed test '-i does not ignore all problems: matches'
#   at t/004_options.pl line 57.
#                   'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
#     doesn't match '(?^:pg_xact.*is present in the manifest but not on disk)'

#   Failed test 'multiple -i options work: exit code 0'
#   at t/004_options.pl line 62.

#   Failed test 'multiple -i options work: no stderr'
#   at t/004_options.pl line 62.
#          got: 'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
#     expected: ''

#   Failed test 'multiple -i options work: matches'
#   at t/004_options.pl line 62.
#                   ''
#     doesn't match '(?^:backup successfully verified)'

#   Failed test 'multiple problems: missing files reported'
#   at t/004_options.pl line 71.
#                   'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
#     doesn't match '(?^:pg_xact.*is present in the manifest but not on disk)'

#   Failed test 'multiple problems: checksum mismatch reported'
#   at t/004_options.pl line 73.
#                   'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
#     doesn't match '(?^:checksum mismatch for file \"PG_VERSION\")'

#   Failed test '-e reports 1 error: missing files reported'
#   at t/004_options.pl line 80.
#                   'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
#     doesn't match '(?^:pg_xact.*is present in the manifest but not on disk)'

#   Failed test 'nonexistent backup directory: matches'
#   at t/004_options.pl line 86.
#                   'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
#     doesn't match '(?^:could not open directory)'
# Looks like you failed 17 tests of 25.
t/004_options.pl ....... Dubious, test returned 17 (wstat 4352, 0x1100)
Failed 17/25 subtests
t/005_bad_manifest.pl .. 1/44
#   Failed test 'missing pathname: matches'
#   at t/005_bad_manifest.pl line 156.
#                   'pg_validatebackup: fatal: could not parse backup
manifest: missing size
# '
#     doesn't match '(?^:could not parse backup manifest: missing pathname)'

#   Failed test 'missing size: matches'
#   at t/005_bad_manifest.pl line 156.
#                   'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
#     doesn't match '(?^:could not parse backup manifest: missing size)'

#   Failed test 'file size is not an integer: matches'
#   at t/005_bad_manifest.pl line 156.
#                   'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
#     doesn't match '(?^:could not parse backup manifest: file size is
not an integer)'

#   Failed test 'duplicate pathname in backup manifest: matches'
#   at t/005_bad_manifest.pl line 156.
#                   'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
#     doesn't match '(?^:fatal: duplicate pathname in backup manifest)'
t/005_bad_manifest.pl .. 31/44
#   Failed test 'checksum without algorithm: matches'
#   at t/005_bad_manifest.pl line 156.
#                   'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
#     doesn't match '(?^:could not parse backup manifest: checksum
without algorithm)'

#   Failed test 'unrecognized checksum algorithm: matches'
#   at t/005_bad_manifest.pl line 156.
#                   'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
#     doesn't match '(?^:fatal: unrecognized checksum algorithm)'

#   Failed test 'invalid checksum for file: matches'
#   at t/005_bad_manifest.pl line 156.
#                   'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
#     doesn't match '(?^:fatal: invalid checksum for file)'
# Looks like you failed 7 tests of 44.
t/005_bad_manifest.pl .. Dubious, test returned 7 (wstat 1792, 0x700)
Failed 7/44 subtests

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: backup manifests

From
Amit Kapila
Date:
On Sat, Mar 21, 2020 at 5:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>
> On my CentOS, the patch gives below compilation failure:
> pg_validatebackup.c: In function ‘parse_manifest_file’:
> pg_validatebackup.c:335:19: error: assignment left-hand side might be
> a candidate for a format attribute [-Werror=suggest-attribute=format]
>   context.error_cb = report_manifest_error;
>
> I have tested it on Windows and found there are multiple failures.
> The failures are as below:
>

I have started to investigate the failures.

>
> Failure Report
> ------------------------
> t/002_algorithm.pl ..... 1/19
> #   Failed test 'backup ok with algorithm "none"'
> #   at t/002_algorithm.pl line 33.
>

I checked the log and it was giving this error:

/src/bin/pg_validatebackup/tmp_check/t_002_algorithm_master_data/backup/none
--manifest-checksum none --no-sync
\tmp_install\bin\pg_basebackup.EXE: illegal option -- manifest-checksum

It seems the option to be used should be --manifest-checksums.  The
attached patch fixes this problem for me.

> t/002_algorithm.pl ..... 4/19 #   Failed test 'validate backup with
> algorithm "none"'
> #   at t/002_algorithm.pl line 53.
>

The error message for the above failure is:
pg_validatebackup: fatal: could not parse backup manifest: both
pathname and encoded pathname

I don't know at this stage what could cause this?  Any pointers?

Attached are logs of failed runs (regression.tar.gz).

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: backup manifests

From
Robert Haas
Date:
On Mon, Mar 23, 2020 at 7:04 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> /src/bin/pg_validatebackup/tmp_check/t_002_algorithm_master_data/backup/none
> --manifest-checksum none --no-sync
> \tmp_install\bin\pg_basebackup.EXE: illegal option -- manifest-checksum
>
> It seems the option to be used should be --manifest-checksums.  The
> attached patch fixes this problem for me.

OK, incorporated that.

> > t/002_algorithm.pl ..... 4/19 #   Failed test 'validate backup with
> > algorithm "none"'
> > #   at t/002_algorithm.pl line 53.
> >
>
> The error message for the above failure is:
> pg_validatebackup: fatal: could not parse backup manifest: both
> pathname and encoded pathname
>
> I don't know at this stage what could cause this?  Any pointers?

I think I forgot an initializer. Try this version.

I also incorporated a fix previously proposed by Suraj for the
compiler warning you mentioned in the other email.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: backup manifests

From
Stephen Frost
Date:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> I think I forgot an initializer. Try this version.

Just took a quick look through this.  I'm pretty sure David wants to
look at it too.  Anyway, some comments below.

> diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
> index f139ba0231..d1ff53e8e8 100644
> --- a/doc/src/sgml/protocol.sgml
> +++ b/doc/src/sgml/protocol.sgml
> @@ -2466,7 +2466,7 @@ The commands accepted in replication mode are:
>    </varlistentry>
>
>    <varlistentry id="protocol-replication-base-backup" xreflabel="BASE_BACKUP">
> -    <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ] [ <literal>NOVERIFY_CHECKSUMS</literal> ]
> +    <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ] [ <literal>NOVERIFY_CHECKSUMS</literal> ] [ <literal>MANIFEST</literal> <replaceable>manifest_option</replaceable> ] [ <literal>MANIFEST_CHECKSUMS</literal> <replaceable>checksum_algorithm</replaceable> ]
>       <indexterm><primary>BASE_BACKUP</primary></indexterm>
>      </term>
>      <listitem>
> @@ -2576,6 +2576,37 @@ The commands accepted in replication mode are:
>           </para>
>          </listitem>
>         </varlistentry>
> +
> +       <varlistentry>
> +        <term><literal>MANIFEST</literal></term>
> +        <listitem>
> +         <para>
> +          When this option is specified with a value of <literal>yes</literal>
> +          or <literal>force-escape</literal>, a backup manifest is created
> +          and sent along with the backup. The latter value forces all filenames
> +          to be hex-encoded; otherwise, this type of encoding is performed only
> +          for files whose names are non-UTF8 octet sequences.
> +          <literal>force-escape</literal> is intended primarily for testing
> +          purposes, to be sure that clients which read the backup manifest
> +          can handle this case. For compatibility with previous releases,
> +          the default is <literal>MANIFEST 'no'</literal>.
> +         </para>
> +        </listitem>
> +       </varlistentry>
> +
> +       <varlistentry>
> +        <term><literal>MANIFEST_CHECKSUMS</literal></term>
> +        <listitem>
> +         <para>
> +          Specifies the algorithm that should be used to checksum each file
> +          for purposes of the backup manifest. Currently, the available
> +          algorithms are <literal>NONE</literal>, <literal>CRC32C</literal>,
> +          <literal>SHA224</literal>, <literal>SHA256</literal>,
> +          <literal>SHA384</literal>, and <literal>SHA512</literal>.
> +          The default is <literal>CRC32C</literal>.
> +         </para>
> +        </listitem>
> +       </varlistentry>
>        </variablelist>
>       </para>
>       <para>

While I get the desire to have a default here that includes checksums,
the way the command is structured, it strikes me as odd that the lack of
MANIFEST_CHECKSUMS in the command actually results in checksums being
included.  I would think that we'd either:

- have the lack of MANIFEST_CHECKSUMS mean 'No checksums'

or

- Require MANIFEST_CHECKSUMS to be specified and not have it be optional

We aren't expecting people to actually be typing these commands out and
so I don't think it's a *huge* deal to have it the way you've written
it, but it still strikes me as odd.  I don't think I have a real
preference between the two options that I suggest above, maybe very
slightly in favor of the first.

> diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
> index 90638aad0e..bf6963a595 100644
> --- a/doc/src/sgml/ref/pg_basebackup.sgml
> +++ b/doc/src/sgml/ref/pg_basebackup.sgml
> @@ -561,6 +561,69 @@ PostgreSQL documentation
>         </para>
>        </listitem>
>       </varlistentry>
> +
> +     <varlistentry>
> +      <term><option>--no-manifest</option></term>
> +      <listitem>
> +       <para>
> +        Disables generation of a backup manifest. If this option is not
> +        specified, the server will and send generate a backup manifest
> +        which can be verified using <xref linkend="app-pgvalidatebackup" />.
> +       </para>
> +      </listitem>
> +     </varlistentry>

How about "If this option is not specified, the server will generate and
send a backup manifest which can be verified using ..."

> +     <varlistentry>
> +      <term><option>--manifest-checksums=<replaceable class="parameter">algorithm</replaceable></option></term>
> +      <listitem>
> +       <para>
> +        Specifies the algorithm that should be used to checksum each file
> +        for purposes of the backup manifest. Currently, the available
> +        algorithms are <literal>NONE</literal>, <literal>CRC32C</literal>,
> +        <literal>SHA224</literal>, <literal>SHA256</literal>,
> +        <literal>SHA384</literal>, and <literal>SHA512</literal>.
> +        The default is <literal>CRC32C</literal>.
> +       </para>

As I recall, there was an invitation to argue about the defaults at one
point, and so I'm going to say here that I would advocate for a
different default than 'crc32c'.  Specifically, I would think sha256 or
512 would be better.  I don't recall seeing a debate about this that
conclusively found crc32c to be better, but I'm happy to go back and
reread anything someone wants to point me at.

> +       <para>
> +        If <literal>NONE</literal> is selected, the backup manifest will
> +        not contain any checksums. Otherwise, it will contain a checksum
> +        of each file in the backup using the specified algorithm. In addition,
> +        the manifest itself will always contain a <literal>SHA256</literal>
> +        checksum of its own contents. The <literal>SHA</literal> algorithms
> +        are significantly more CPU-intensive than <literal>CRC32C</literal>,
> +        so selecting one of them may increase the time required to complete
> +        the backup.
> +       </para>

It also seems a bit silly to me that using the defaults means having to
deal with two different algorithms- crc32c and sha256.  Considering how
fast these algorithms are, compared to everything else involved in a
backup (particularly one that's likely going across a network...), I
wonder if we should say "may slightly increase" above.

> +       <para>
> +        On the other hand, <literal>CRC32C</literal> is not a cryptographic
> +        hash function, so it is only suitable for protecting against
> +        inadvertent or random modifications to a backup. An adversary
> +        who can modify the backup could easily do so in such a way that
> +        the CRC does not change, whereas a SHA collision will be hard
> +        to manufacture. (However, note that if the attacker also has access
> +        to modify the backup manifest itself, no checksum algorithm will
> +        provide any protection.) An additional advantage of the
> +        <literal>SHA</literal> family of functions is that they output
> +        a much larger number of bits.
> +       </para>

I'm not really sure that this paragraph is sensible to include..  We
certainly don't talk about adversaries and cryptographic hash functions
when we talk about our page-level checksums, for example.  I'm not
completely against including it, but I don't want to give the impression
that this is something we routinely consider or that lack of discussion
elsewhere implies we have protections against a determined attacker.

> diff --git a/doc/src/sgml/ref/pg_validatebackup.sgml b/doc/src/sgml/ref/pg_validatebackup.sgml
> new file mode 100644
> index 0000000000..1c171f6970
> --- /dev/null
> +++ b/doc/src/sgml/ref/pg_validatebackup.sgml
> @@ -0,0 +1,232 @@
> +<!--
> +doc/src/sgml/ref/pg_validatebackup.sgml
> +PostgreSQL documentation
> +-->
> +
> +<refentry id="app-pgvalidatebackup">
> + <indexterm zone="app-pgvalidatebackup">
> +  <primary>pg_validatebackup</primary>
> + </indexterm>
> +
> + <refmeta>
> +  <refentrytitle>pg_validatebackup</refentrytitle>
> +  <manvolnum>1</manvolnum>
> +  <refmiscinfo>Application</refmiscinfo>
> + </refmeta>
> +
> + <refnamediv>
> +  <refname>pg_validatebackup</refname>
> +  <refpurpose>verify the integrity of a base backup of a
> +  <productname>PostgreSQL</productname> cluster</refpurpose>
> + </refnamediv>

"verify the integrity of a backup taken using pg_basebackup"

> + <refsect1>
> +  <title>
> +   Description
> +  </title>
> +  <para>
> +   <application>pg_validatebackup</application> is used to check the integrity
> +   of a database cluster backup.  The backup being checked should have been
> +   created by <command>pg_basebackup</command> or some other tool that includes
> +   a <literal>backup_manifest</literal> file with the backup. The backup
> +   must be stored in the "plain" format; a "tar" format backup can be checked
> +   after extracting it. Backup manifests are created by the server beginning
> +   with <productname>PostgreSQL</productname> version 13, so older backups
> +   cannot be validated using this tool.
> +  </para>

This seems to invite the idea that pg_validatebackup should be able to
work with external backup solutions- but I'm a bit concerned by that
idea because it seems like it would then mean we'd have to be
particularly careful when changing things in this area, and I'm not
thrilled by that.  I'd like to make sure that new versions of
pg_validatebackup work with older backups, and, ideally, older versions
of pg_validatebackup would work even with newer backups, all of which I
think the json structure of the manifest helps us with, but that's when
we're building the manifest and know what it's going to look like.

Maybe to put it another way- would a patch be accepted to make
pg_validatebackup work with other manifests..?  If not, then I'd keep
this to the more specific "this tool is used to validate backups taken
using pg_basebackup".

> +  <para>
> +   <application>pg_validatebackup</application> reads the manifest file of a
> +   backup, verifies the manifest against its own internal checksum, and then
> +   verifies that the same files are present in the target directory as in the
> +   manifest itself. It then verifies that each file has the expected checksum,
> +   unless the backup was taken the checksum algorithm set to

"was taken with the checksum algorithm"...

> +   <literal>none</literal>, in which case checksum verification is not
> +   performed. The presence or absence of directories is not checked, except
> +   indirectly: if a directory is missing, any files it should have contained
> +   will necessarily also be missing. Certain files and directories are
> +   excluded from verification:
> +  </para>
> +
> +  <itemizedlist>
> +    <listitem>
> +      <para>
> +        <literal>backup_manifest</literal> is ignored because the backup
> +        manifest is logically not part of the backup and does not include
> +        any entry for itself.
> +      </para>
> +    </listitem>

This seems a bit confusing, doesn't it?  The backup_manifest must exist,
and its checksum is internal, and is checked, isn't it?  Why say that
it's excluded..?

> +    <listitem>
> +      <para>
> +        <literal>pg_wal</literal> is ignored because WAL files are sent
> +        separately from the backup, and are therefore not described by the
> +        backup manifest.
> +      </para>
> +    </listitem>

I don't agree with the choice to exclude the WAL files; considering
they're an integral part of a backup, excluding them means that if
they've been corrupted at all then the entire backup is invalid.  You
don't want to be discovering that when you're trying to do a restore of
a backup that you took with pg_basebackup and which pg_validatebackup
says is valid.  After all, the tool being used here, pg_basebackup,
*does* also stream the WAL files- there's no reason why we can't
calculate a checksum on them and store that checksum somewhere and use
it to validate the WAL files.  This, in my opinion, is actually a
show-stopper for this feature.  Claiming it's a valid backup when we
don't check the absolutely necessary-for-restore WAL is making a false
claim, no matter how well it's documented.

I do understand that it's possible to run pg_basebackup without the WAL
files being grabbed as part of that run- in such a case, we should be
able to detect that was the case for the backup and when running
pg_validatebackup we should issue a WARNING that the WAL files weren't
able to be verified (we could have an option to suppress that warning if
people feel that's needed).

> +    <listitem>
> +      <para>
> +        <literal>postgresql.auto.conf</literal>,
> +        <literal>standby.signal</literal>,
> +        and <literal>recovery.signal</literal> are ignored because they may
> +        sometimes be created or modified by the backup client itself.
> +        (For example, <literal>pg_basebackup -R</literal> will modify
> +        <literal>postgresql.auto.conf</literal> and create
> +        <literal>standby.signal</literal>.)
> +      </para>
> +    </listitem>
> +  </itemizedlist>
> + </refsect1>

Not really thrilled with this (pg_basebackup certainly could figure out
the checksum for those files...), but I also don't think it's a huge
issue as they can be recreated by a user (unlike a WAL file..).

I got through most of the pg_basebackup changes, and they looked pretty
good in general.  Will try to review more tomorrow.

Thanks,

Stephen

Attachment

Re: backup manifests

From
Amit Kapila
Date:
On Mon, Mar 23, 2020 at 9:46 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Mar 23, 2020 at 7:04 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > /src/bin/pg_validatebackup/tmp_check/t_002_algorithm_master_data/backup/none
> > --manifest-checksum none --no-sync
> > \tmp_install\bin\pg_basebackup.EXE: illegal option -- manifest-checksum
> >
> > It seems the option to be used should be --manifest-checksums.  The
> > attached patch fixes this problem for me.
>
> OK, incorporated that.
>
> > > t/002_algorithm.pl ..... 4/19 #   Failed test 'validate backup with
> > > algorithm "none"'
> > > #   at t/002_algorithm.pl line 53.
> > >
> >
> > The error message for the above failure is:
> > pg_validatebackup: fatal: could not parse backup manifest: both
> > pathname and encoded pathname
> >
> > I don't know at this stage what could cause this?  Any pointers?
>
> I think I forgot an initializer. Try this version.
>

All others except one are passing now.  See the summary of the failed
test below and attached are failed run logs.

Test Summary Report
-------------------
t/003_corruption.pl  (Wstat: 65280 Tests: 14 Failed: 0)
  Non-zero exit status: 255
  Parse errors: Bad plan.  You planned 44 tests but ran 14.
Files=6, Tests=123, 164 wallclock secs ( 0.06 usr +  0.02 sys =  0.08 CPU)
Result: FAIL

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: backup manifests

From
Robert Haas
Date:
On Mon, Mar 23, 2020 at 11:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> All others except one are passing now.  See the summary of the failed
> test below and attached are failed run logs.
>
> Test Summary Report
> -------------------
> t/003_corruption.pl  (Wstat: 65280 Tests: 14 Failed: 0)
>   Non-zero exit status: 255
>   Parse errors: Bad plan.  You planned 44 tests but ran 14.
> Files=6, Tests=123, 164 wallclock secs ( 0.06 usr +  0.02 sys =  0.08 CPU)
> Result: FAIL

Hmm. It looks like it's trying to remove the symlink that points to
the tablespace directory, and failing with no error message. I could
set that permutation to be skipped on Windows, or maybe there's an
alternate method you can suggest that would work?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Robert Haas
Date:
On Mon, Mar 23, 2020 at 6:42 PM Stephen Frost <sfrost@snowman.net> wrote:
> While I get the desire to have a default here that includes checksums,
> the way the command is structured, it strikes me as odd that the lack of
> MANIFEST_CHECKSUMS in the command actually results in checksums being
> included.

I don't think that's quite accurate, because the default for the
MANIFEST option is 'no': if you say nothing about manifests at all,
you don't get one. However, it is true that if
you ask for a manifest and you don't specify the type of checksums,
you get CRC-32C. We could change it so that if you ask for a manifest
you must also specify the type of checksum, but I don't see any
advantage in that approach. Nothing prevents the client from
specifying the value if it cares, but making the default "I don't
care, you pick" seems pretty sensible. It could be really helpful if,
for example, we decide to remove the initial default in a future
release for some reason. Then the client just keeps working without
needing to change anything, but anyone who explicitly specified the
old default gets an error.

> > +        Disables generation of a backup manifest. If this option is not
> > +        specified, the server will and send generate a backup manifest
> > +        which can be verified using <xref linkend="app-pgvalidatebackup" />.
> > +       </para>
> > +      </listitem>
> > +     </varlistentry>
>
> How about "If this option is not specified, the server will generate and
> send a backup manifest which can be verified using ..."

Good suggestion. :-)

> > +     <varlistentry>
> > +      <term><option>--manifest-checksums=<replaceable class="parameter">algorithm</replaceable></option></term>
> > +      <listitem>
> > +       <para>
> > +        Specifies the algorithm that should be used to checksum each file
> > +        for purposes of the backup manifest. Currently, the available
> > +        algorithms are <literal>NONE</literal>, <literal>CRC32C</literal>,
> > +        <literal>SHA224</literal>, <literal>SHA256</literal>,
> > +        <literal>SHA384</literal>, and <literal>SHA512</literal>.
> > +        The default is <literal>CRC32C</literal>.
> > +       </para>
>
> As I recall, there was an invitation to argue about the defaults at one
> point, and so I'm going to say here that I would advocate for a
> different default than 'crc32c'.  Specifically, I would think sha256 or
> 512 would be better.  I don't recall seeing a debate about this that
> conclusively found crc32c to be better, but I'm happy to go back and
> reread anything someone wants to point me at.

It was discussed upthread. Andrew Dunstan argued that there was no
reason to use a cryptographic checksum here and that we shouldn't do
so gratuitously. Suraj Kharage found that CRC-32C has very little
performance impact but that any of the SHA functions slow down backups
considerably. David Steele pointed out that you'd need a better
checksum if you wanted to use it for purposes such as delta restore,
with which I agree, but that's not the design center for this feature.
I concluded that different people wanted different things, so that we
ought to make this configurable, but that CRC-32C is a good default.
It has approximately a 99.9999999767169% chance of detecting a random
error, which is pretty good, and it doesn't drastically slow down
backups, which is also good.
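(For the record, that figure is just 1 - 2^-32: a random error has about
a 2.33e-10 chance of producing the same 32-bit CRC value, so roughly
99.9999999767% of such errors are caught.)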

> It also seems a bit silly to me that using the defaults means having to
> deal with two different algorithms- crc32c and sha256.  Considering how
> fast these algorithms are, compared to everything else involved in a
> backup (particularly one that's likely going across a network...), I
> wonder if we should say "may slightly increase" above.

Actually, Suraj's results upthread show that it's a pretty big hit.

> > +       <para>
> > +        On the other hand, <literal>CRC32C</literal> is not a cryptographic
> > +        hash function, so it is only suitable for protecting against
> > +        inadvertent or random modifications to a backup. An adversary
> > +        who can modify the backup could easily do so in such a way that
> > +        the CRC does not change, whereas a SHA collision will be hard
> > +        to manufacture. (However, note that if the attacker also has access
> > +        to modify the backup manifest itself, no checksum algorithm will
> > +        provide any protection.) An additional advantage of the
> > +        <literal>SHA</literal> family of functions is that they output
> > +        a much larger number of bits.
> > +       </para>
>
> I'm not really sure that this paragraph is sensible to include..  We
> certainly don't talk about adversaries and cryptographic hash functions
> when we talk about our page-level checksums, for example.  I'm not
> completely against including it, but I don't want to give the impression
> that this is something we routinely consider or that lack of discussion
> elsewhere implies we have protections against a determined attacker.

Given the skepticism from some quarters about CRC-32C on this thread,
I didn't want to oversell it. Also, I do think that these things are
possibly things that we should consider more widely. I agree with
Andrew's complaint that it's far too easy to just throw SHA<lots> at
problems that don't really require it without any actually good
reason. Spelling out our reasons for choosing certain algorithms for
certain purposes seems like a good habit to get into, and if we
haven't done it in other places, maybe we should. On the other hand,
while I'm inclined to keep this paragraph, I won't lose much sleep if
we decide to remove it.

> > + <refnamediv>
> > +  <refname>pg_validatebackup</refname>
> > +  <refpurpose>verify the integrity of a base backup of a
> > +  <productname>PostgreSQL</productname> cluster</refpurpose>
> > + </refnamediv>
>
> "verify the integrity of a backup taken using pg_basebackup"

OK.

> This seems to invite the idea that pg_validatebackup should be able to
> work with external backup solutions- but I'm a bit concerned by that
> idea because it seems like it would then mean we'd have to be
> particularly careful when changing things in this area, and I'm not
> thrilled by that.  I'd like to make sure that new versions of
> pg_validatebackup work with older backups, and, ideally, older versions
> of pg_validatebackup would work even with newer backups, all of which I
> think the json structure of the manifest helps us with, but that's when
> we're building the manifest and know what it's going to look like.

Both you and David made forceful arguments that this needed to be JSON
rather than an ad-hoc text format precisely so that other tools could
parse it more easily, and I just spent *a lot* of time making the JSON
parsing stuff work precisely so that you could have that. This project
would've been done a month ago if not for that. I don't care all that
much whether we remove the mention here, but the idea that using JSON
was so that pg_validatebackup could manage compatibility issues is
just not correct. The version number on line 1 of the file was more
than sufficient for that purpose.

> > +  <para>
> > +   <application>pg_validatebackup</application> reads the manifest file of a
> > +   backup, verifies the manifest against its own internal checksum, and then
> > +   verifies that the same files are present in the target directory as in the
> > +   manifest itself. It then verifies that each file has the expected checksum,
> > +   unless the backup was taken the checksum algorithm set to
>
> "was taken with the checksum algorithm"...

Oops. Will fix.

> > +  <itemizedlist>
> > +    <listitem>
> > +      <para>
> > +        <literal>backup_manifest</literal> is ignored because the backup
> > +        manifest is logically not part of the backup and does not include
> > +        any entry for itself.
> > +      </para>
> > +    </listitem>
>
> This seems a bit confusing, doesn't it?  The backup_manifest must exist,
> and its checksum is internal, and is checked, isn't it?  Why say that
> it's excluded..?

Well, there's no entry in the backup manifest for backup_manifest
itself. Normally, the presence of a file not mentioned in
backup_manifest would cause a complaint about an extra file, but
because backup_manifest is in the ignore list, it doesn't.

> > +    <listitem>
> > +      <para>
> > +        <literal>pg_wal</literal> is ignored because WAL files are sent
> > +        separately from the backup, and are therefore not described by the
> > +        backup manifest.
> > +      </para>
> > +    </listitem>
>
> I don't agree with the choice to exclude the WAL files; considering
> they're an integral part of a backup, excluding them means that if
> they've been corrupted at all then the entire backup is invalid.  You
> don't want to be discovering that when you're trying to do a restore of
> a backup that you took with pg_basebackup and which pg_validatebackup
> says is valid.  After all, the tool being used here, pg_basebackup,
> *does* also stream the WAL files- there's no reason why we can't
> calculate a checksum on them and store that checksum somewhere and use
> it to validate the WAL files.  This, in my opinion, is actually a
> show-stopper for this feature.  Claiming it's a valid backup when we
> don't check the absolutely necessary-for-restore WAL is making a false
> claim, no matter how well it's documented.

The default for pg_basebackup is -Xstream, which means that the WAL
files are being sent over a separate connection with no link back
to the original session. The server, when generating the backup
manifest, has no idea what WAL files are being sent over that separate
connection, and thus cannot include them in the manifest. This problem
could be "solved" by having the client generate the manifest rather
than the server, but I think that cure would be worse than the
disease. As it stands, the manifest provides some protection against
transmission errors, which would be lost with that design. As you
point out, this clearly can't be done with -Xnone. I think it would be
possible to support this with -Xfetch, but we'd have to have the
manifest itself specify whether or not it included files in pg_wal,
which would require complicating the format a bit. I don't think that
makes sense. I assume -Xstream is the most commonly-used mode, because
the default used to be -Xfetch and we changed it, which I think we
would not have done unless people liked -Xstream significantly better.
Adding complexity to cater to a non-default case which I suspect is
not widely used doesn't really make sense to me.

In the future, we might want to consider improvements which could make
validation of pg_wal feasible in common cases. Specifically, suppose
that pg_basebackup could receive the manifest from the server, keep
all the entries for the existing files just as they are, but add
entries for WAL files and anything else it may have added to the
backup, recompute the manifest checksum, and store the resulting
revised manifest with the backup. That, I think, would be fairly cool,
but it's a significant body of additional development work, and this
is already quite a large patch. The patch itself has grown to about
3000 lines, and already has 10 preparatory commits doing another ~1500
lines of refactoring to prepare for it.

> Not really thrilled with this (pg_basebackup certainly could figure out
> the checksum for those files...), but I also don't think it's a huge
> issue as they can be recreated by a user (unlike a WAL file..).

Yeah, same issues, though. Here again, there are several possible
fixes: (1) make the server modify those files rather than letting
pg_basebackup do it; (2) make the client compute the manifest rather
than the server; (3) have the client revise the manifest.  (3) makes
most sense to me, but I think that it would be better to return to
that topic at a later date. This is certainly not a perfect feature as
things stand but I believe it is good enough to provide significant
benefits.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Amit Kapila
Date:
On Tue, Mar 24, 2020 at 10:30 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Mar 23, 2020 at 11:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > All others except one are passing now.  See the summary of the failed
> > test below and attached are failed run logs.
> >
> > Test Summary Report
> > -------------------
> > t/003_corruption.pl  (Wstat: 65280 Tests: 14 Failed: 0)
> >   Non-zero exit status: 255
> >   Parse errors: Bad plan.  You planned 44 tests but ran 14.
> > Files=6, Tests=123, 164 wallclock secs ( 0.06 usr +  0.02 sys =  0.08 CPU)
> > Result: FAIL
>
> Hmm. It looks like it's trying to remove the symlink that points to
> the tablespace directory, and failing with no error message. I could
> set that permutation to be skipped on Windows, or maybe there's an
> alternate method you can suggest that would work?
>

We can use rmdir() for Windows.  The attached patch fixes the failure
for me. I have tried the test on CentOS as well after the fix and it
passes there as well.
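
The shape of that fix is roughly as below (a sketch with illustrative
names, not the attached patch itself); the point is that on Windows the
tablespace "symlink" behaves like a directory junction, so rmdir() works
where unlink() does not:

use strict;
use warnings;

my $windows_os = $^O eq 'MSWin32' || $^O eq 'msys';
my $link_path  = $ARGV[0];    # e.g. <backup>/pg_tblspc/<oid>

if ($windows_os)
{
    # Here the tablespace "symlink" is a junction, which looks like a
    # directory, so rmdir() is the call that removes it.
    rmdir($link_path) or die "could not remove $link_path: $!";
}
else
{
    unlink($link_path) or die "could not remove $link_path: $!";
}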

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: backup manifests

From
Stephen Frost
Date:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Mon, Mar 23, 2020 at 6:42 PM Stephen Frost <sfrost@snowman.net> wrote:
> > While I get the desire to have a default here that includes checksums,
> > the way the command is structured, it strikes me as odd that the lack of
> > MANIFEST_CHECKSUMS in the command actually results in checksums being
> > included.
>
> I don't think that's quite accurate, because the default for the
> MANIFEST option is 'no': if you say nothing about manifests at all,
> you don't get one. However, it is true that if
> you ask for a manifest and you don't specify the type of checksums,
> you get CRC-32C. We could change it so that if you ask for a manifest
> you must also specify the type of checksum, but I don't see any
> advantage in that approach. Nothing prevents the client from
> specifying the value if it cares, but making the default "I don't
> care, you pick" seems pretty sensible. It could be really helpful if,
> for example, we decide to remove the initial default in a future
> release for some reason. Then the client just keeps working without
> needing to change anything, but anyone who explicitly specified the
> old default gets an error.

I get that the default for manifest is 'no', but I don't really see how
that means that the lack of saying anything about checksums should mean
"give me crc32c checksums".  It's really rather common that if we don't
specify something, it means don't do that thing- like an 'ORDER BY'
clause.  We aren't designing SQL here, so I'm not going to get terribly
upset if you push forward with "if you don't want checksums, you have to
explicitly say MANIFEST_CHECKSUMS no", but I don't agree with the
reasoning here.

> > > +     <varlistentry>
> > > +      <term><option>--manifest-checksums=<replaceable class="parameter">algorithm</replaceable></option></term>
> > > +      <listitem>
> > > +       <para>
> > > +        Specifies the algorithm that should be used to checksum each file
> > > +        for purposes of the backup manifest. Currently, the available
> > > +        algorithms are <literal>NONE</literal>, <literal>CRC32C</literal>,
> > > +        <literal>SHA224</literal>, <literal>SHA256</literal>,
> > > +        <literal>SHA384</literal>, and <literal>SHA512</literal>.
> > > +        The default is <literal>CRC32C</literal>.
> > > +       </para>
> >
> > As I recall, there was an invitation to argue about the defaults at one
> > point, and so I'm going to say here that I would advocate for a
> > different default than 'crc32c'.  Specifically, I would think sha256 or
> > 512 would be better.  I don't recall seeing a debate about this that
> > conclusively found crc32c to be better, but I'm happy to go back and
> > reread anything someone wants to point me at.
>
> It was discussed upthread. Andrew Dunstan argued that there was no
> reason to use a cryptographic checksum here and that we shouldn't do
> so gratuitously. Suraj Kharage found that CRC-32C has very little
> performance impact but that any of the SHA functions slow down backups
> considerably. David Steele pointed out that you'd need a better
> checksum if you wanted to use it for purposes such as delta restore,
> with which I agree, but that's not the design center for this feature.
> I concluded that different people wanted different things, so that we
> ought to make this configurable, but that CRC-32C is a good default.
> It has approximately a 99.9999999767169% chance of detecting a random
> error, which is pretty good, and it doesn't drastically slow down
> backups, which is also good.

There were also comments made up-thread about how it might not be great
for larger files (e.g. 1GB files, like we tend to have quite a few of...),
and something about it being a 40-year-old algorithm.  Having re-read some
of the discussion, I'm actually more inclined to say we should be using
sha256 instead of crc32c.

> > It also seems a bit silly to me that using the defaults means having to
> > deal with two different algorithms- crc32c and sha256.  Considering how
> > fast these algorithms are, compared to everything else involved in a
> > backup (particularly one that's likely going across a network...), I
> > wonder if we should say "may slightly increase" above.
>
> Actually, Suraj's results upthread show that it's a pretty big hit.

So, I went back and re-read part of the thread and looked at the
(seemingly, only one..?) post regarding timing and didn't understand
what, exactly, was being timed there, because I didn't see the actual
commands/script/whatever that was used to get those results included.

I'm sure that sha256 takes a lot more time than crc32c, I'm certainly
not trying to dispute that, but what's relevant here is how much it
impacts the time required to run the overall backup (including sync'ing
it to disk, and possibly network transmission time..  if we're just
comparing the time to run it through memory then, sure, the sha256
computation time might end up being quite a bit of the time, but that's
not really that interesting of a test..).

> > > +       <para>
> > > +        On the other hand, <literal>CRC32C</literal> is not a cryptographic
> > > +        hash function, so it is only suitable for protecting against
> > > +        inadvertent or random modifications to a backup. An adversary
> > > +        who can modify the backup could easily do so in such a way that
> > > +        the CRC does not change, whereas a SHA collision will be hard
> > > +        to manufacture. (However, note that if the attacker also has access
> > > +        to modify the backup manifest itself, no checksum algorithm will
> > > +        provide any protection.) An additional advantage of the
> > > +        <literal>SHA</literal> family of functions is that they output
> > > +        a much larger number of bits.
> > > +       </para>
> >
> > I'm not really sure that this paragraph is sensible to include..  We
> > certainly don't talk about adversaries and cryptographic hash functions
> > when we talk about our page-level checksums, for example.  I'm not
> > completely against including it, but I don't want to give the impression
> > that this is something we routinely consider or that lack of discussion
> > elsewhere implies we have protections against a determined attacker.
>
> Given the skepticism from some quarters about CRC-32C on this thread,
> I didn't want to oversell it. Also, I do think that these things are
> possibly things that we should consider more widely. I agree with
> Andrew's complaint that it's far too easy to just throw SHA<lots> at
> problems that don't really require it without any actually good
> reason. Spelling out our reasons for choosing certain algorithms for
> certain purposes seems like a good habit to get into, and if we
> haven't done it in other places, maybe we should. On the other hand,
> while I'm inclined to keep this paragraph, I won't lose much sleep if
> we decide to remove it.

I don't mind spelling out reasoning for certain algorithms over others,
in general, this just seems a bit much.  I'm not sure we need to be
going into what being a cryptographic hash function means every time we
talk about any hash or checksum.  Those who actually care about
cryptographic hash function usage really don't need someone to explain
to them that crc32c isn't cryptographically secure.  The last sentence
also seems kind of odd (why is a much larger number of bits, alone, an
advantage..?).

I tried to figure out a way to rewrite this and I feel like I keep
ending up coming back to something like "CRC32C is a CRC, not a hash"
and that kind of truism just doesn't feel terribly useful to include in
our documentation.

Maybe:

"Using a SHA hash function provides a cryptographically secure digest
of each file for users who wish to verify that the backup has not been
tampered with, while the CRC32C algorithm provides a checksum which is
much faster to calculate and good at catching errors due to accidental
changes but is not resistant to targeted modifications.  Note that, to
be useful against an adversary who has access to the backup, the backup
manifest would need to be stored securely elsewhere or otherwise
verified to have not been modified since the backup was taken."

This at least talks about things in a positive direction (SHA hash
functions do this, CRC32C does that) rather than in a negative tone.

> > This seems to invite the idea that pg_validatebackup should be able to
> > work with external backup solutions- but I'm a bit concerned by that
> > idea because it seems like it would then mean we'd have to be
> > particularly careful when changing things in this area, and I'm not
> > thrilled by that.  I'd like to make sure that new versions of
> > pg_validatebackup work with older backups, and, ideally, older versions
> > of pg_validatebackup would work even with newer backups, all of which I
> > think the json structure of the manifest helps us with, but that's when
> > we're building the manifest and know what it's going to look like.
>
> Both you and David made forceful arguments that this needed to be JSON
> rather than an ad-hoc text format precisely so that other tools could
> parse it more easily, and I just spent *a lot* of time making the JSON
> parsing stuff work precisely so that you could have that. This project
> would've been done a month ago if not for that. I don't care all that
> much whether we remove the mention here, but the idea that using JSON
> was so that pg_validatebackup could manage compatibility issues is
> just not correct. The version number on line 1 of the file was more
> than sufficient for that purpose.

I stand by the decision that the manifest should be in JSON, but that's
what is produced by the backend server as part of a base backup, which
is quite likely going to be used by some external tools, and isn't at
all the same as the external pg_validatebackup command that the
discussion here is about.  I also did make the argument up-thread,
though I'll admit that it seemed to be mostly ignored, but I make it
still, that a simple version number sucks and using JSON does avoid some
of the downsides from it.  Particularly, I'd love to see a v13
pg_validatebackup able to work with a v14 pg_basebackup, even if that
v14 pg_basebackup added some extra stuff to the manifest.  That's
possible to do with a generic structure like JSON and not something that
a simple version number would allow.  Yes, I admit that we might change
the structure or the contents in a way where that wouldn't be possible
and I'm not going to raise a fuss if we do so, but this approach gives
us more options.
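
To make that concrete: suppose a hypothetical v14 manifest grew an
extra top-level field (all of the key names below are illustrative
only, not the patch's actual spellings), something like

    {
      "manifest-version": 1,
      "files": [ ... ],
      "new-in-v14-field": "whatever",
      "manifest-checksum": "..."
    }

A v13 tool that parses the JSON can simply skip the key it doesn't
recognize, while a rigid line-oriented format with a bumped version
number on line 1 gives the older tool no real option but to bail.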

Anyway, my point here was really just that *pg_validatebackup* is about
validating backups taken with pg_basebackup.  While it's possible that
it could be used for backups taken with other tools, I don't think
that's really part of its actual mandate or that we're going to actively
work to add such support in the future.

> > > +  <itemizedlist>
> > > +    <listitem>
> > > +      <para>
> > > +        <literal>backup_manifest</literal> is ignored because the backup
> > > +        manifest is logically not part of the backup and does not include
> > > +        any entry for itself.
> > > +      </para>
> > > +    </listitem>
> >
> > This seems a bit confusing, doesn't it?  The backup_manifest must exist,
> > and its checksum is internal, and is checked, isn't it?  Why say that
> > it's excluded..?
>
> Well, there's no entry in the backup manifest for backup_manifest
> itself. Normally, the presence of a file not mentioned in
> backup_manifest would cause a complaint about an extra file, but
> because backup_manifest is in the ignore list, it doesn't.

Yes, I get why it's excluded from the manifest and why we have code to
avoid complaining about it being an extra file, but this is
documentation and, in this part of the docs, we seem to be saying that
we're not checking/validating the manifest, and that's certainly not
actually true.

In particular, the sentence right above this list is:

"Certain files and directories are excluded from verification:"

but we actually do verify the manifest, that's all I'm saying here.

Maybe rewording that a bit is what would help, say:

"Certain files and directories are not included in the manifest:"

then have the entry for backup_manifest be something like:
"backup_manifest is not included as it is the manifest itself and is not
logically part of the backup; backup_manifest is checked using its own
internal validation digest" or something along those lines.

> > > +    <listitem>
> > > +      <para>
> > > +        <literal>pg_wal</literal> is ignored because WAL files are sent
> > > +        separately from the backup, and are therefore not described by the
> > > +        backup manifest.
> > > +      </para>
> > > +    </listitem>
> >
> > I don't agree with the choice to exclude the WAL files, considering
> > they're an integral part of a backup, to exclude them means that if
> > they've been corrupted at all then the entire backup is invalid.  You
> > don't want to be discovering that when you're trying to do a restore of
> > a backup that you took with pg_basebackup and which pg_validatebackup
> > says is valid.  After all, the tool being used here, pg_basebackup,
> > *does* also stream the WAL files- there's no reason why we can't
> > calculate a checksum on them and store that checksum somewhere and use
> > it to validate the WAL files.  This, in my opinion, is actually a
> > show-stopper for this feature.  Claiming it's a valid backup when we
> > don't check the absolutely necessary-for-restore WAL is making a false
> > claim, no matter how well it's documented.
>
> The default for pg_basebackup is -Xstream, which means that the WAL
> files are being sent over a separate connection that has no connection
> to the original session. The server, when generating the backup
> manifest, has no idea what WAL files are being sent over that separate
> connection, and thus cannot include them in the manifest. This problem
> could be "solved" by having the client generate the manifest rather
> than the server, but I think that cure would be worse than the
> disease. As it stands, the manifest provides some protection against
> transmission errors, which would be lost with that design. As you
> point out, this clearly can't be done with -Xnone. I think it would be
> possible to support this with -Xfetch, but we'd have to have the
> manifest itself specify whether or not it included files in pg_wal,
> which would require complicating the format a bit. I don't think that
> makes sense. I assume -Xstream is the most commonly-used mode, because
> the default used to be -Xfetch and we changed it, which I think we
> would not have done unless people liked -Xstream significantly better.
> Adding complexity to cater to a non-default case which I suspect is
> not widely used doesn't really make sense to me.

Yeah, I get that it's not easy to figure out how to validate the WAL,
but I stand by my opinion that it's simply not acceptable to exclude the
necessary WAL from verification and to claim that a backup is valid
when we haven't checked the WAL.

I agree that -Xfetch isn't commonly used and only supporting validation
of WAL when that's used isn't a good answer.

> In the future, we might want to consider improvements which could make
> validation of pg_wal feasible in common cases. Specifically, suppose
> that pg_basebackup could receive the manifest from the server, keep
> all the entries for the existing files just as they are, but add
> entries for WAL files and anything else it may have added to the
> backup, recompute the manifest checksum, and store the resulting
> revised manifest with the backup. That, I think, would be fairly cool,
> but it's a significant body of additional development work, and this
> is already quite a large patch. The patch itself has grown to about
> 3000 lines, and has already 10 preparatory commits doing another ~1500
> lines of refactoring to prepare for it.

Having the client calculate the checksums for the WAL and add them to
the manifest is one approach and could work, but there are others-

- Have the WAL checksums be calculated during the base backup and kept
  somewhere, and then included in the manifest sent by the server- the
  backup_manifest is the last thing we send anyway, isn't it?  And
  surely at the end of the backup we actually do know all of the WAL
  that's needed for the backup to be valid, because we pass that
  information to pg_basebackup to construct the necessary backup_label
  file.

- Validate the WAL using its own internal checksums instead of having
  the manifest involved at all.  That's not ideal since we wouldn't have
  cryptographically secure digests for the WAL, but at least we will
  have validated it and considerably raised the chances that the backup
  will actually be restorable using PG.

- With the 'checksum none' option, we aren't really validating contents
  of anything, so in that case it'd actually be alright to simply scan
  the WAL and make sure that we've at least got all of the WAL files
  needed to go from the start of the backup to the end (a rough sketch
  of what that scan might look like is below).  I don't think just
  checking that the WAL files exist is a proper solution when it comes
  to a backup where the user has asked for checksums to be included
  though.  I will say that I'm really very surprised that
  pg_validatebackup wasn't already checking that we at least had the WAL
  that is needed, but I don't see any code for that.
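
Just as a rough sketch of that last idea (purely illustrative, not
anything from the patch; it assumes the default 16MB segment size and
the usual timeline/log/segment file naming), the scan could look
something like:

    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>
    #include <unistd.h>

    #define WAL_SEGMENT_SIZE  (16 * 1024 * 1024)    /* assumed default */

    /*
     * Check that every WAL segment file needed to replay from start_lsn
     * to end_lsn is present (and readable) in waldir.
     */
    bool
    wal_segments_present(const char *waldir, unsigned tli,
                         uint64_t start_lsn, uint64_t end_lsn)
    {
        uint64_t    segs_per_id = UINT64_C(0x100000000) / WAL_SEGMENT_SIZE;
        uint64_t    seg;

        for (seg = start_lsn / WAL_SEGMENT_SIZE;
             seg <= (end_lsn - 1) / WAL_SEGMENT_SIZE;
             seg++)
        {
            char        path[1024];

            snprintf(path, sizeof(path), "%s/%08X%08X%08X", waldir, tli,
                     (unsigned) (seg / segs_per_id),
                     (unsigned) (seg % segs_per_id));
            if (access(path, R_OK) != 0)
            {
                fprintf(stderr, "missing WAL segment: %s\n", path);
                return false;
            }
        }
        return true;
    }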

> > Not really thrilled with this (pg_basebackup certainly could figure out
> > the checksum for those files...), but I also don't think it's a huge
> > issue as they can be recreated by a user (unlike a WAL file..).
>
> Yeah, same issues, though. Here again, there are several possible
> fixes: (1) make the server modify those files rather than letting
> pg_basebackup do it; (2) make the client compute the manifest rather
> than the server; (3) have the client revise the manifest.  (3) makes
> most sense to me, but I think that it would be better to return to
> that topic at a later date. This is certainly not a perfect feature as
> things stand but I believe it is good enough to provide significant
> benefits.

As I said, I don't consider these files to be as much of an issue and
therefore excluding them and documenting that we do would be alright.  I
don't feel that's an acceptable option for the WAL though.

Thanks,

Stephen

Attachment

Re: backup manifests

From
Robert Haas
Date:
On Wed, Mar 25, 2020 at 9:31 AM Stephen Frost <sfrost@snowman.net> wrote:
> I get that the default for manifest is 'no', but I don't really see how
> that means that the lack of saying anything about checksums should mean
> "give me crc32c checksums".  It's really rather common that if we don't
> specify something, it means don't do that thing- like an 'ORDER BY'
> clause.

That's a fair argument, but I think the other relevant principle is
that we try to give people useful defaults for things. I think that
checksums are a sufficiently useful thing that having the default be
not to do it doesn't make sense. I had the impression that you and
David were in agreement on that point, actually.

> There were also comments made up-thread about how it might not be great
> for larger (eg: 1GB files, like we tend to have quite a few of...), and
> something about it being a 40 year old algorithm..

Well, the 512MB "limit" for CRC-32C means only that for certain very
specific types of errors, detection is not guaranteed above that file
size. So if you have a single flipped bit, for example, and the file
size is greater than 512MB, then CRC-32C has only a 99.9999999767169%
chance of detecting the error, whereas if the file size is less than
512MB, it is 100% certain, because of the design of the algorithm. But
nine nines is plenty, and neither SHA nor our page-level checksums
provide guaranteed error detection properties anyway.
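
(For what it's worth, that figure is just the probability that a random
error fails to collide on a 32-bit check value:

    1 - 2^-32  =  1 - 1/4294967296  ≈  0.999999999767169

which is where the nine nines come from.)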

I'm not sure why the fact that it's a 40-year-old algorithm is
relevant. There are many 40-year-old algorithms that are very good.
Generally, if we discover that we're using bad 40-year-old algorithms,
like Knuth's tape sorting stuff, we eventually figure out how to
replace them with something else that's better. But there's no reason
to retire an algorithm simply because it's old. I have not heard
anyone say, for example, that we should stop using CRC-32C for XLOG
checksums. We continue to use it for that purpose because it (1) is
highly likely to detect any errors and (2) is very fast. Those are the
same reasons why I think it's a good fit for this case.

My guess is that if this patch is adopted as currently proposed, we
will eventually need to replace the cryptographic hash functions due
to the march of time. As I'm sure you realize, the problem with hash
functions that are designed to foil an adversary is that adversaries
keep getting smarter. So, eventually someone will probably figure out
how to do something nefarious with SHA-512. Some other technique that
nobody's cracked yet will need to be adopted, and then people will
begin trying to crack that, and the whole thing will repeat. But I
suspect that we can keep using the same non-cryptographic hash
function essentially forever. It does not matter that people know how
the algorithm works because it makes no pretensions of trying to foil
an opponent. It is just trying to mix up the bits in such a way that a
change to the file is likely to cause a change in the checksum. The
bit-mixing properties of the algorithm do not degrade with the passage
of time.

> I'm sure that sha256 takes a lot more time than crc32c, I'm certainly
> not trying to dispute that, but what's relevant here is how much it
> impacts the time required to run the overall backup (including sync'ing
> it to disk, and possibly network transmission time..  if we're just
> comparing the time to run it through memory then, sure, the sha256
> computation time might end up being quite a bit of the time, but that's
> not really that interesting of a test..).

I think that http://postgr.es/m/38e29a1c-0d20-fc73-badd-ca05f7f07ffa@pgmasters.net
is one of the more interesting emails on this topic.  My conclusion
from that email, and the ones that led up to it, was that there is a
40-50% overhead from doing a SHA checksum, but in pgbackrest, users
don't see it because backups are compressed. Because the compression
uses so much CPU time, the additional overhead from the SHA checksum
is only a few percent more. But I don't think that it would be smart
to slow down uncompressed backups by 40-50%. That's going to cause a
problem for somebody, almost for sure.

> Maybe:
>
> "Using a SHA hash function provides a cryptographically secure digest
> of each file for users who wish to verify that the backup has not been
> tampered with, while the CRC32C algorithm provides a checksum which is
> much faster to calculate and good at catching errors due to accidental
> changes but is not resistant to targeted modifications.  Note that, to
> be useful against an adversary who has access to the backup, the backup
> manifest would need to be stored securely elsewhere or otherwise
> verified to have not been modified since the backup was taken."
>
> This at least talks about things in a positive direction (SHA hash
> functions do this, CRC32C does that) rather than in a negative tone.

Cool. I like it.

> Anyway, my point here was really just that *pg_validatebackup* is about
> validating backups taken with pg_basebackup.  While it's possible that
> it could be used for backups taken with other tools, I don't think
> that's really part of its actual mandate or that we're going to actively
> work to add such support in the future.

I think you're kind of just nitpicking here, because the statement that
pg_validatebackup can validate not only a backup taken by
pg_basebackup but also a backup taken using some compatible method
is just a tautology. But I'll remove the reference.

> In particular, the sentence right above this list is:
>
> "Certain files and directories are excluded from verification:"
>
> but we actually do verify the manifest, that's all I'm saying here.
>
> Maybe rewording that a bit is what would help, say:
>
> "Certain files and directories are not included in the manifest:"

Well, that'd be wrong, though. It's true that backup_manifest won't
have an entry in the manifest, and neither will WAL files, but
postgresql.auto.conf will. We'll just skip complaining about it if the
checksum doesn't match or whatever. The server generates manifest
entries for everything, and the client decides not to pay attention to
some of them because it knows that pg_basebackup may have made certain
changes that were not known to the server.

> Yeah, I get that it's not easy to figure out how to validate the WAL,
> but I stand by my opinion that it's simply not acceptable to exclude the
> necessary WAL from verification and to claim that a backup is valid
> when we haven't checked the WAL.

I hear that, but I don't agree that having nothing is better than
having this much committed. I would be fine with renaming the tool
(pg_validatebackupmanifest? pg_validatemanifest?), or with updating
the documentation to be more clear about what is and is not checked,
but I'm not going to extend the tool to do totally new things for
which we don't even have an agreed design yet. I believe in trying to
create patches that do one thing and do it well, and this patch does
that. The fact that it doesn't do some other thing that is
conceptually related yet different is a good thing, not a bad one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Stephen Frost
Date:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Wed, Mar 25, 2020 at 9:31 AM Stephen Frost <sfrost@snowman.net> wrote:
> > I get that the default for manifest is 'no', but I don't really see how
> > that means that the lack of saying anything about checksums should mean
> > "give me crc32c checksums".  It's really rather common that if we don't
> > specify something, it means don't do that thing- like an 'ORDER BY'
> > clause.
>
> That's a fair argument, but I think the other relevant principle is
> that we try to give people useful defaults for things. I think that
> checksums are a sufficiently useful thing that having the default be
> not to do it doesn't make sense. I had the impression that you and
> David were in agreement on that point, actually.

I agree with wanting to have useful defaults and that checksums should
be included by default, and I'm alright even with letting people pick
what algorithms they'd like to have too.  The construct here is made odd
because we've got this idea that "no checksum" is an option, which is
actually something that I don't particularly like, but that's what's
making this particular syntax weird.  I don't suppose you'd be open to
the idea of just dropping that though..?  There wouldn't be any issue
with this syntax if we just always had checksums included when a
manifest is requested. :)

Somehow, I don't think I'm going to win that argument.

> > There were also comments made up-thread about how it might not be great
> > for larger (eg: 1GB files, like we tend to have quite a few of...), and
> > something about it being a 40 year old algorithm..
>
> Well, the 512MB "limit" for CRC-32C means only that for certain very
> specific types of errors, detection is not guaranteed above that file
> size. So if you have a single flipped bit, for example, and the file
> size is greater than 512MB, then CRC-32C has only a 99.9999999767169%
> chance of detecting the error, whereas if the file size is less than
> 512MB, it is 100% certain, because of the design of the algorithm. But
> nine nines is plenty, and neither SHA nor our page-level checksums
> provide guaranteed error detection properties anyway.

Right, so we know that CRC-32C has an upper-bound of 512MB to be useful
for exactly what it's designed to be useful for, but we also know that
we're going to have larger files- at least 1GB ones, and quite possibly
larger, so why are we choosing this?

At the least, wouldn't it make sense to consider a larger CRC, one whose
limit is above the size of commonly expected files, if we're going to
use a CRC?

> I'm not sure why the fact that it's a 40-year-old algorithm is
> relevant. There are many 40-year-old algorithms that are very good.

Sure there are, but there probably wasn't a lot of thought about
GB-sized files, and this doesn't really seem to be the direction people
are going in for larger objects.  s3, as an example, uses sha256.
Google, it seems, suggests folks use "HighwayHash" (from their crc32c
github repo- https://github.com/google/crc32c).  Most CRC uses seem to
be for much smaller data sets.

> My guess is that if this patch is adopted as currently proposed, we
> will eventually need to replace the cryptographic hash functions due
> to the march of time. As I'm sure you realize, the problem with hash
> functions that are designed to foil an adversary is that adversaries
> keep getting smarter. So, eventually someone will probably figure out
> how to do something nefarious with SHA-512. Some other technique that
> nobody's cracked yet will need to be adopted, and then people will
> begin trying to crack that, and the whole thing will repeat. But I
> suspect that we can keep using the same non-cryptographic hash
> function essentially forever. It does not matter that people know how
> the algorithm works because it makes no pretensions of trying to foil
> an opponent. It is just trying to mix up the bits in such a way that a
> change to the file is likely to cause a change in the checksum. The
> bit-mixing properties of the algorithm do not degrade with the passage
> of time.

Sure, there's a good chance we'll need newer algorithms in the future, I
don't doubt that.  On the other hand, if crc32c, or CRC whatever, was
the perfect answer and no one will ever need something better, then
what's with folks like Google suggesting something else..?

> > I'm sure that sha256 takes a lot more time than crc32c, I'm certainly
> > not trying to dispute that, but what's relevant here is how much it
> > impacts the time required to run the overall backup (including sync'ing
> > it to disk, and possibly network transmission time..  if we're just
> > comparing the time to run it through memory then, sure, the sha256
> > computation time might end up being quite a bit of the time, but that's
> > not really that interesting of a test..).
>
> I think that http://postgr.es/m/38e29a1c-0d20-fc73-badd-ca05f7f07ffa@pgmasters.net
> is one of the more interesting emails on this topic.  My conclusion
> from that email, and the ones that led up to it, was that there is a
> 40-50% overhead from doing a SHA checksum, but in pgbackrest, users
> don't see it because backups are compressed. Because the compression
> uses so much CPU time, the additional overhead from the SHA checksum
> is only a few percent more. But I don't think that it would be smart
> to slow down uncompressed backups by 40-50%. That's going to cause a
> problem for somebody, almost for sure.

I like that email on the topic also, as it points out again (as I tried
to do earlier also..) that it depends on what we're actually including
in the test- and it seems, again, that those tests didn't consider the
time to actually write the data somewhere, either network or disk.

As for folks who are that close to the edge on their backup timing that
they can't have it slow down- chances are pretty darn good that they're
not far from ending up needing to find a better solution than
pg_basebackup anyway.  Or they don't need to generate a manifest (or, I
suppose, they could have one but not have checksums..).

> > In particular, the sentence right above this list is:
> >
> > "Certain files and directories are excluded from verification:"
> >
> > but we actually do verify the manifest, that's all I'm saying here.
> >
> > Maybe rewording that a bit is what would help, say:
> >
> > "Certain files and directories are not included in the manifest:"
>
> Well, that'd be wrong, though. It's true that backup_manifest won't
> have an entry in the manifest, and neither will WAL files, but
> postgresql.auto.conf will. We'll just skip complaining about it if the
> checksum doesn't match or whatever. The server generates manifest
> entries for everything, and the client decides not to pay attention to
> some of them because it knows that pg_basebackup may have made certain
> changes that were not known to the server.

Ok, but it's also wrong to say that the backup_label is excluded from
verification.

> > Yeah, I get that it's not easy to figure out how to validate the WAL,
> > but I stand by my opinion that it's simply not acceptable to exclude the
> > necessary WAL from verification and to claim that a backup is valid
> > when we haven't checked the WAL.
>
> I hear that, but I don't agree that having nothing is better than
> having this much committed. I would be fine with renaming the tool
> (pg_validatebackupmanifest? pg_validatemanifest?), or with updating
> the documentation to be more clear about what is and is not checked,
> but I'm not going to extend the tool to do totally new things for
> which we don't even have an agreed design yet. I believe in trying to
> create patches that do one thing and do it well, and this patch does
> that. The fact that it doesn't do some other thing that is
> conceptually related yet different is a good thing, not a bad one.

I fail to see the usefulness of a tool that doesn't actually verify that
the backup is able to be restored from.

Even pg_basebackup (in both fetch and stream modes...) checks that we at
least got all the WAL that's needed for the backup from the server
before considering the backup to be valid and telling the user that
there was a successful backup.  With what you're proposing here, we
could have someone do a pg_basebackup, get back an ERROR saying the
backup wasn't valid, and then run pg_validatebackup and be told that the
backup is valid.  I don't get how that's sensible.

Thanks,

Stephen

Attachment

Re: backup manifests

From
Robert Haas
Date:
On Wed, Mar 25, 2020 at 4:54 PM Stephen Frost <sfrost@snowman.net> wrote:
> > That's a fair argument, but I think the other relevant principle is
> > that we try to give people useful defaults for things. I think that
> > checksums are a sufficiently useful thing that having the default be
> > not to do it doesn't make sense. I had the impression that you and
> > David were in agreement on that point, actually.
>
> I agree with wanting to have useful defaults and that checksums should
> be included by default, and I'm alright even with letting people pick
> what algorithms they'd like to have too.  The construct here is made odd
> because we've got this idea that "no checksum" is an option, which is
> actually something that I don't particularly like, but that's what's
> making this particular syntax weird.  I don't suppose you'd be open to
> the idea of just dropping that though..?  There wouldn't be any issue
> with this syntax if we just always had checksums included when a
> manifest is requested. :)
>
> Somehow, I don't think I'm going to win that argument.

Well, it's not a crazy idea. So, at some point, I had the idea that
you were always going to get a manifest, and therefore you at least
ought to have the option of not checksumming to avoid the
overhead. But, as things stand now, you can suppress the manifest
altogether, so that you can still take a backup even if you've got no
disk space to spool the manifest on the master. So, if you really want
no overhead from manifests, just don't have a manifest. And if you are
OK with some overhead, why not at least have a CRC-32C checksum, which
is, after all, pretty cheap?

Now, on the other hand, I don't have any strong evidence that the
manifest-without-checksums mode is useless. You can still use it to
verify that you have the correct files and that those files have the
expected sizes. And, verifying those things is very cheap, because you
only need to stat() each file, not open and read them all. True, you
can do those things by using pg_validatebackup -s. But, you'd still
incur the (admittedly fairly low) overhead of computing checksums that
you don't intend to use.
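
To illustrate just how cheap that is, a size-and-presence check for one
manifest entry is essentially the following (a sketch, not the actual
pg_validatebackup code; expected_size would come from the manifest):

    #include <stdio.h>
    #include <sys/stat.h>

    int
    check_entry(const char *path, long long expected_size)
    {
        struct stat st;

        if (stat(path, &st) != 0)
        {
            fprintf(stderr, "missing file: %s\n", path);
            return -1;
        }
        if ((long long) st.st_size != expected_size)
        {
            fprintf(stderr, "size mismatch for %s: expected %lld, found %lld\n",
                    path, expected_size, (long long) st.st_size);
            return -1;
        }
        return 0;               /* nothing beyond the stat() itself */
    }

No file contents are read at all, which is why the files-and-sizes
check costs next to nothing compared to checksumming.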

This is where I feel like I'm trying to make decisions in a vacuum. If
we had a few more people weighing in on the thread on this point, I'd
be happy to go with whatever the consensus was. If most people think
having both --no-manifest (suppressing the manifest completely) and
--manifest-checksums=none (suppressing only the checksums) is useless
and confusing, then sure, let's rip the latter one out. If most people
like the flexibility, let's keep it: it's already implemented and
tested. But I hate to base the decision on what one or two people
think.
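
For anyone trying to keep the knobs straight, the cases being debated
look roughly like this on the command line (option spellings as in the
current patch):

    pg_basebackup -D backupdir                              # manifest, CRC-32C checksums (proposed default)
    pg_basebackup -D backupdir --manifest-checksums=SHA256  # manifest, SHA-256 checksums
    pg_basebackup -D backupdir --manifest-checksums=none    # manifest, but no per-file checksums
    pg_basebackup -D backupdir --no-manifest                # no manifest at all

The question is whether the third form pulls its weight.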

> > Well, the 512MB "limit" for CRC-32C means only that for certain very
> > specific types of errors, detection is not guaranteed above that file
> > size. So if you have a single flipped bit, for example, and the file
> > size is greater than 512MB, then CRC-32C has only a 99.9999999767169%
> > chance of detecting the error, whereas if the file size is less than
> > 512MB, it is 100% certain, because of the design of the algorithm. But
> > nine nines is plenty, and neither SHA nor our page-level checksums
> > provide guaranteed error detection properties anyway.
>
> Right, so we know that CRC-32C has an upper-bound of 512MB to be useful
> for exactly what it's designed to be useful for, but we also know that
> we're going to have larger files- at least 1GB ones, and quite possibly
> larger, so why are we choosing this?
>
> At the least, wouldn't it make sense to consider a larger CRC, one whose
> limit is above the size of commonly expected files, if we're going to
> use a CRC?

I mean, you're just repeating the same argument here, and it's just
not valid. Regardless of the file size, the chances of a false
checksum match are literally less than one in a billion. There is
every reason to believe that users will be happy with a low-overhead
method that has a 99.9999999+% chance of detecting corrupt files. I do
agree that a 64-bit CRC would probably be not much more expensive and
improve the probability of detecting errors even further, but I wanted
to restrict this patch to using infrastructure we already have. The
choices there are the various SHA functions (so I supported those),
MD5 (which I deliberately omitted, for reasons I hope you'll be the
first to agree with), CRC-32C (which is fast), a couple of other
CRC-32 variants (which I omitted because they seemed redundant and one
of them only ever existed in PostgreSQL because of a coding mistake),
and the hacked-up version of FNV that we use for page-level checksums
(which is only 16 bits and seems to have no advantages for this
purpose).

> > I'm not sure why the fact that it's a 40-year-old algorithm is
> > relevant. There are many 40-year-old algorithms that are very good.
>
> Sure there are, but there probably wasn't a lot of thought about
> GB-sized files, and this doesn't really seem to be the direction people
> are going in for larger objects.  s3, as an example, uses sha256.
> Google, it seems, suggests folks use "HighwayHash" (from their crc32c
> github repo- https://github.com/google/crc32c).  Most CRC uses seem to
> be for much smaller data sets.

Again, I really want to stick with infrastructure we already have.
Trying to find a hash function that will please everybody is a hole
with no bottom, or more to the point, a bikeshed in need of painting.
There are TONS of great hash functions out there on the Internet, and
as previous discussions of pgsql-hackers will attest, as soon as you
go down that road, somebody will say "well, what about xxhash" or
whatever, and then you spend the rest of your life trying to figure
out what hash function we could try to commit that is fast and secure
and doesn't have copyright or patent problems. There have been
multiple efforts to introduce such hash functions in the past, and I
think basically all of those have crashed into a brick wall.

I don't think that's because introducing new hash functions is a bad
idea. I think that there are various reasons why it might be a good
idea. For instance, highwayhash purports to be a cryptographic hash
function that is fast enough to replace non-cryptographic hash
functions. It's easy to see why someone might want that, here. For
example, it would be entirely reasonable to copy the backup manifest
onto a USB key and store it in a vault. Later, if you get the USB key
back out of the vault and validate it against the backup, you pretty
much know that none of the data files have been tampered with,
provided that you used a cryptographic hash. So, SHA is a good option
for people who have a USB key and a vault, and a faster cryptographic
hash might be even better. I don't have any desire to block such proposals,
and I would be thrilled if this work inspires other people to add such
options. However, I also don't want this patch to get blocked by an
interminable argument about which hash functions we ought to use. The
ones we have in core now are good enough for a start, and more can be
added later.

> Sure, there's a good chance we'll need newer algorithms in the future, I
> don't doubt that.  On the other hand, if crc32c, or CRC whatever, was
> the perfect answer and no one will ever need something better, then
> what's with folks like Google suggesting something else..?

I have never said that CRC was the perfect answer, and the reason why
Google is suggesting something different is because they wanted a fast
hash (not SHA) that still has cryptographic properties. What I have
said is that using CRC-32C by default means that there is very little
downside as compared with current releases. Backups will not get
slower, and error detection will get better. If you pick any other
default from the menu of options currently available, then either
backups get noticeably slower, or we get less error detection
capability than that option gives us.

> As for folks who are that close to the edge on their backup timing that
> they can't have it slow down- chances are pretty darn good that they're
> not far from ending up needing to find a better solution than
> pg_basebackup anyway.  Or they don't need to generate a manifest (or, I
> suppose, they could have one but not have checksums..).

40-50% is a lot more than "if you were on the edge."

> > Well, that'd be wrong, though. It's true that backup_manifest won't
> > have an entry in the manifest, and neither will WAL files, but
> > postgresql.auto.conf will. We'll just skip complaining about it if the
> > checksum doesn't match or whatever. The server generates manifest
> > entries for everything, and the client decides not to pay attention to
> > some of them because it knows that pg_basebackup may have made certain
> > changes that were not known to the server.
>
> Ok, but it's also wrong to say that the backup_label is excluded from
> verification.

The docs don't say that backup_label is excluded from verification.
They do say that backup_manifest is excluded from verification
*against the manifest*, because it is. I'm not sure if you're honestly
confused here or if we're just devolving into arguing for the sake of
argument, but right now the code looks like this:

    simple_string_list_append(&context.ignore_list, "backup_manifest");
    simple_string_list_append(&context.ignore_list, "pg_wal");
    simple_string_list_append(&context.ignore_list, "postgresql.auto.conf");
    simple_string_list_append(&context.ignore_list, "recovery.signal");
    simple_string_list_append(&context.ignore_list, "standby.signal");

Notice that this is the same list of files mentioned in the
documentation. Now let's suppose we remove the first of those lines of
code, so that backup_manifest is not in the exclude list by default.
Now let's try to validate a backup:

[rhaas pgsql]$ src/bin/pg_validatebackup/pg_validatebackup ~/pgslave
pg_validatebackup: error: "backup_manifest" is present on disk but not
in the manifest

Oops. If you read that error carefully, you can see that the complaint
is 100% valid. backup_manifest is indeed present on disk, but not in
the manifest. However, because this situation is expected and known
not to be a problem, the right thing to do is suppress the error. That
is why it is in the ignore_list by default. The documentation is
attempting to explain this. If it's unclear, we should try to make it
better, but it is absolutely NOT saying that there is no internal
validation of the backup_manifest. In fact, the previous paragraph
tries to explain that:

+   <application>pg_validatebackup</application> reads the manifest file of a
+   backup, verifies the manifest against its own internal checksum, and then

It is, however, saying, and *entirely correctly*, that
pg_validatebackup will not check the backup_manifest file against the
backup_manifest. If it did, it would find that it's not there. It
would then emit an error message like the one above even though
there's no problem with the backup.

> I fail to see the usefulness of a tool that doesn't actually verify that
> the backup is able to be restored from.
>
> Even pg_basebackup (in both fetch and stream modes...) checks that we at
> least got all the WAL that's needed for the backup from the server
> before considering the backup to be valid and telling the user that
> there was a successful backup.  With what you're proposing here, we
> could have someone do a pg_basebackup, get back an ERROR saying the
> backup wasn't valid, and then run pg_validatebackup and be told that the
> backup is valid.  I don't get how that's sensible.

I'm sorry that you can't see how that's sensible, but it doesn't mean
that it isn't sensible. It is totally unrealistic to expect that any
backup verification tool can verify that you won't get an error when
trying to use the backup. That would require that the validation tool
try to do everything that PostgreSQL will try to do
when the backup is used, including running recovery and updating the
data files. Anything less than that creates a real possibility that
the backup will verify good but fail when used. This tool has a much
narrower purpose, which is to try to verify that we (still) have the
files the server sent as part of the backup and that, to the best of
our ability to detect such things, they have not been modified. As you
know, or should know, the WAL files are not sent as part of the
backup, and so are not verified. Other things that would also be
useful to check are also not verified. It would be fantastic to have
more verification tools in the future, but it is difficult to see why
anyone would bother trying if an attempt to get the first one
committed gets blocked because it does not yet do everything. Very few
patches try to do everything, and those that do usually get blocked
because, by trying to do too much, they get some of it badly wrong.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Stephen Frost
Date:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Wed, Mar 25, 2020 at 4:54 PM Stephen Frost <sfrost@snowman.net> wrote:
> > > That's a fair argument, but I think the other relevant principle is
> > > that we try to give people useful defaults for things. I think that
> > > checksums are a sufficiently useful thing that having the default be
> > > not to do it doesn't make sense. I had the impression that you and
> > > David were in agreement on that point, actually.
> >
> > I agree with wanting to have useful defaults and that checksums should
> > be included by default, and I'm alright even with letting people pick
> > what algorithms they'd like to have too.  The construct here is made odd
> > because we've got this idea that "no checksum" is an option, which is
> > actually something that I don't particularly like, but that's what's
> > making this particular syntax weird.  I don't suppose you'd be open to
> > the idea of just dropping that though..?  There wouldn't be any issue
> > with this syntax if we just always had checksums included when a
> > manifest is requested. :)
> >
> > Somehow, I don't think I'm going to win that argument.
>
> Well, it's not a crazy idea. So, at some point, I had the idea that
> you were always going to get a manifest, and therefore you should at
> least ought to have the option of not checksumming to avoid the
> overhead. But, as things stand now, you can suppress the manifest
> altogether, so that you can still take a backup even if you've got no
> disk space to spool the manifest on the master. So, if you really want
> no overhead from manifests, just don't have a manifest. And if you are
> OK with some overhead, why not at least have a CRC-32C checksum, which
> is, after all, pretty cheap?
>
> Now, on the other hand, I don't have any strong evidence that the
> manifest-without-checksums mode is useless. You can still use it to
> verify that you have the correct files and that those files have the
> expected sizes. And, verifying those things is very cheap, because you
> only need to stat() each file, not open and read them all. True, you
> can do those things by using pg_validatebackup -s. But, you'd still
> incur the (admittedly fairly low) overhead of computing checksums that
> you don't intend to use.
>
> This is where I feel like I'm trying to make decisions in a vacuum. If
> we had a few more people weighing in on the thread on this point, I'd
> be happy to go with whatever the consensus was. If most people think
> having both --no-manifest (suppressing the manifest completely) and
> --manifest-checksums=none (suppressing only the checksums) is useless
> and confusing, then sure, let's rip the latter one out. If most people
> like the flexibility, let's keep it: it's already implemented and
> tested. But I hate to base the decision on what one or two people
> think.

I'm frustrated at the lack of involvement from others also.

Just to be clear- I'm not completely against having a 'manifest but no
checksum' option, but if that's what we're going to have then it seems
like the syntax should be such that if you don't specify checksums then
you don't get checksums and "MANIFEST_CHECKSUM none" shouldn't be a
thing.

All that said, as I said up-thread, I appreciate that we aren't
designing SQL here and that this is pretty special syntax to begin with,
so if you ended up committing it the way you have it now, so be it, I
wouldn't be asking for it to be reverted over this.  It's a bit awkward
and kind of a thorn, but it's not entirely unreasonable, and we'd
probably end up there anyway if we started out without a 'none' option
and someone did come up with a good argument and a patch to add such an
option in the future.

> > > Well, the 512MB "limit" for CRC-32C means only that for certain very
> > > specific types of errors, detection is not guaranteed above that file
> > > size. So if you have a single flipped bit, for example, and the file
> > > size is greater than 512MB, then CRC-32C has only a 99.9999999767169%
> > > chance of detecting the error, whereas if the file size is less than
> > > 512MB, it is 100% certain, because of the design of the algorithm. But
> > > nine nines is plenty, and neither SHA nor our page-level checksums
> > > provide guaranteed error detection properties anyway.
> >
> > Right, so we know that CRC-32C has an upper-bound of 512MB to be useful
> > for exactly what it's designed to be useful for, but we also know that
> > we're going to have larger files- at least 1GB ones, and quite possibly
> > larger, so why are we choosing this?
> >
> > At the least, wouldn't it make sense to consider a larger CRC, one whose
> > limit is above the size of commonly expected files, if we're going to
> > use a CRC?
>
> I mean, you're just repeating the same argument here, and it's just
> not valid. Regardless of the file size, the chances of a false
> checksum match are literally less than one in a billion. There is
> every reason to believe that users will be happy with a low-overhead
> method that has a 99.9999999+% chance of detecting corrupt files. I do
> agree that a 64-bit CRC would probably be not much more expensive and
> improve the probability of detecting errors even further, but I wanted
> to restrict this patch to using infrastructure we already have. The
> choices there are the various SHA functions (so I supported those),
> MD5 (which I deliberately omitted, for reasons I hope you'll be the
> first to agree with), CRC-32C (which is fast), a couple of other
> CRC-32 variants (which I omitted because they seemed redundant and one
> of them only ever existed in PostgreSQL because of a coding mistake),
> and the hacked-up version of FNV that we use for page-level checksums
> (which is only 16 bits and seems to have no advantages for this
> purpose).

The argument that "well, we happened to already have it, even though we
used it for much smaller data sets, which are well within the
100%-single-bit-error detection limit" certainly doesn't make me any
more supportive of this.  Choosing the right algorithm to use maybe
shouldn't be based on the age of that algorithm, but it also certainly
shouldn't be "just because we already have it" when we're using it for a
very different use-case.

I'm guessing folks have already seen it, but I thought this was an
interesting run-down of actual collisions based on various checksum
lengths using one data set (though it's not clear exactly how big it is,
from what I can see)-

http://www.backplane.com/matt/crc64.html

I do agree with excluding things like md5 and others that aren't good
options.  I wasn't saying we should necessarily exclude crc32c either..
but rather saying that it shouldn't be the default.

Here's another way to look at it- where do we use crc32c today, and how
much data might we possibly be covering with that crc?  Why was crc32c
picked for that purpose?  If the individual who decided to pick crc32c
for that case was contemplating a checksum for up-to-1GB files, would
they have picked crc32c?  Seems unlikely to me.

> > > I'm not sure why the fact that it's a 40-year-old algorithm is
> > > relevant. There are many 40-year-old algorithms that are very good.
> >
> > Sure there are, but there probably wasn't a lot of thought about
> > GB-sized files, and this doesn't really seem to be the direction people
> > are going in for larger objects.  s3, as an example, uses sha256.
> > Google, it seems, suggests folks use "HighwayHash" (from their crc32c
> > github repo- https://github.com/google/crc32c).  Most CRC uses seem to
> > be for much smaller data sets.
>
> Again, I really want to stick with infrastructure we already have.

I don't agree with that as a sensible justification for picking it for
this case, because it's clearly not the same use-case.

> Trying to find a hash function that will please everybody is a hole
> with no bottom, or more to the point, a bikeshed in need of painting.
> There are TONS of great hash functions out there on the Internet, and
> as previous discussions of pgsql-hackers will attest, as soon as you
> go down that road, somebody will say "well, what about xxhash" or
> whatever, and then you spend the rest of your life trying to figure
> out what hash function we could try to commit that is fast and secure
> and doesn't have copyright or patent problems. There have been
> multiple efforts to introduce such hash functions in the past, and I
> think basically all of those have crashed into a brick wall.
>
> I don't think that's because introducing new hash functions is a bad
> idea. I think that there are various reasons why it might be a good
> idea. For instance, highwayhash purports to be a cryptographic hash
> function that is fast enough to replace non-cryptographic hash
> functions. It's easy to see why someone might want that, here. For
> example, it would be entirely reasonable to copy the backup manifest
> onto a USB key and store it in a vault. Later, if you get the USB key
> back out of the vault and validate it against the backup, you pretty
> much know that none of the data files have been tampered with,
> provided that you used a cryptographic hash. So, SHA is a good option
> for people who have a USB key and a vault, and a faster cryptographic
> hash might be even better. I don't have any desire to block such proposals,
> and I would be thrilled if this work inspires other people to add such
> options. However, I also don't want this patch to get blocked by an
> interminable argument about which hash functions we ought to use. The
> ones we have in core now are good enough for a start, and more can be
> added later.

I'm not actually arguing about which hash functions we should support,
but rather what the default is and if crc32c, specifically, is actually
a reasonable choice.  Just because it's fast and we already had an
implementation of it doesn't justify its use as the default.  The fact that
it doesn't actually provide the check that is generally expected of
CRC checksums (100% detection of single-bit errors) when the file size
gets over 512MB makes me wonder if we should have it at all, yes, but it
definitely makes me think it shouldn't be our default.

Folks look to PG as being pretty good at figuring things out and doing
the thing that makes sense to minimize risk of data loss or corruption.
I can understand and agree with the desire to have a faster alternative
to sha256 for those who don't need a cryptographically safe hash, but if
we're going to provide that option, it should be the right answer and
it's pretty clear, at least to me, that crc32c isn't a good choice for
gigabyte-size files.

> > Sure, there's a good chance we'll need newer algorithms in the future, I
> > don't doubt that.  On the other hand, if crc32c, or CRC whatever, was
> > the perfect answer and no one will ever need something better, then
> > what's with folks like Google suggesting something else..?
>
> I have never said that CRC was the perfect answer, and the reason why
> Google is suggesting something different is because they wanted a fast
> hash (not SHA) that still has cryptographic properties. What I have
> said is that using CRC-32C by default means that there is very little
> downside as compared with current releases. Backups will not get
> slower, and error detection will get better. If you pick any other
> default from the menu of options currently available, then either
> backups get noticeably slower, or we get less error detection
> capability than that option gives us.

I don't agree with limiting our view to only those algorithms that we've
already got implemented in PG.

> > As for folks who are that close to the edge on their backup timing that
> > they can't have it slow down- chances are pretty darn good that they're
> > not far from ending up needing to find a better solution than
> > pg_basebackup anyway.  Or they don't need to generate a manifest (or, I
> > suppose, they could have one but not have checksums..).
>
> 40-50% is a lot more than "if you were on the edge."

We can agree to disagree on this, it's not particularly relevant in the
end.

> > > Well, that'd be wrong, though. It's true that backup_manifest won't
> > > have an entry in the manifest, and neither will WAL files, but
> > > postgresql.auto.conf will. We'll just skip complaining about it if the
> > > checksum doesn't match or whatever. The server generates manifest
> > > entries for everything, and the client decides not to pay attention to
> > > some of them because it knows that pg_basebackup may have made certain
> > > changes that were not known to the server.
> >
> > Ok, but it's also wrong to say that the backup_label is excluded from
> > verification.
>
> The docs don't say that backup_label is excluded from verification.
> They do say that backup_manifest is excluded from verification
> *against the manifest*, because it is. I'm not sure if you're honestly
> confused here or if we're just devolving into arguing for the sake of
> argument, but right now the code looks like this:

That you're bringing up code here is really just not sensible- we're
talking about the documentation, not about the code here.  I do
understand what the code is doing and I don't have any complaint about
the code.

> Oops. If you read that error carefully, you can see that the complaint
> is 100% valid. backup_manifest is indeed present on disk, but not in
> the manifest. However, because this situation is expected and known
> not to be a problem, the right thing to do is suppress the error. That
> is why it is in the ignore_list by default. The documentation is
> attempting to explain this. If it's unclear, we should try to make it
> better, but it is absolutely NOT saying that there is no internal
> validation of the backup_manifest. In fact, the previous paragraph
> tries to explain that:

Yes, I think the documentation is unclear, as I said before, because it
purports to list things that aren't being validated and then includes
backup_manifest in that list, which doesn't make sense.  The sentence in
question does *not* say "Certain files and directories are excluded from
the manifest" (which is wording that I actually proposed up-thread, to
try to address this...), it says, from the patch:

"Certain files and directories are excluded from verification:"

Excluded from verification.  Then lists backup_manifest.  Even though,
earlier in that same paragraph it says that the manifest is verified
against its own checksum.

> +   <application>pg_validatebackup</application> reads the manifest file of a
> +   backup, verifies the manifest against its own internal checksum, and then
>
> It is, however, saying, and *entirely correctly*, that
> pg_validatebackup will not check the backup_manifest file against the
> backup_manifest. If it did, it would find that it's not there. It
> would then emit an error message like the one above even though
> there's no problem with the backup.

It's saying, removing the listing aspect, exactly that "backup_manifest
is excluded from verification".  That's what I am taking issue with.  I've
made multiple attempts to suggest other language to avoid saying that
because it's clearly wrong- the manifest is verified.

> > I fail to see the usefulness of a tool that doesn't actually verify that
> > the backup is able to be restored from.
> >
> > Even pg_basebackup (in both fetch and stream modes...) checks that we at
> > least got all the WAL that's needed for the backup from the server
> > before considering the backup to be valid and telling the user that
> > there was a successful backup.  With what you're proposing here, we
> > could have someone do a pg_basebackup, get back an ERROR saying the
> > backup wasn't valid, and then run pg_validatebackup and be told that the
> > backup is valid.  I don't get how that's sensible.
>
> I'm sorry that you can't see how that's sensible, but it doesn't mean
> that it isn't sensible. It is totally unrealistic to expect that any
> backup verification tool can verify that you won't get an error when
> trying to use the backup. That would require that the validation tool
> try to do everything that PostgreSQL will try to do
> when the backup is used, including running recovery and updating the
> data files. Anything less than that creates a real possibility that
> the backup will verify good but fail when used. This tool has a much
> narrower purpose, which is to try to verify that we (still) have the
> files the server sent as part of the backup and that, to the best of
> our ability to detect such things, they have not been modified. As you
> know, or should know, the WAL files are not sent as part of the
> backup, and so are not verified. Other things that would also be
> useful to check are also not verified. It would be fantastic to have
> more verification tools in the future, but it is difficult to see why
> anyone would bother trying if an attempt to get the first one
> committed gets blocked because it does not yet do everything. Very few
> patches try to do everything, and those that do usually get blocked
> because, by trying to do too much, they get some of it badly wrong.

I'm not talking about making sure that no error ever happens when doing
a restore of a particular backup.  You're arguing against something that
I have not advocated for and which I don't advocate for.

I'm saying that the existing tool that takes the backup has a *really*
*important* verification check that this proposed "validate backup" tool
doesn't have, and that isn't sensible.  It leads to situations where the
backup tool itself, pg_basebackup, can fail or be killed before it's
actually completed, and the "validate backup" tool would say that the
backup is perfectly fine.  That is not sensible.

That there might be other reasons why a backup can't be restored isn't
relevant and I'm not asking for a tool that is perfect and does some
kind of proof that the backup is able to be restored.

Thanks,

Stephen


Re: backup manifests

From
Mark Dilger
Date:

> On Mar 26, 2020, at 9:34 AM, Stephen Frost <sfrost@snowman.net> wrote:
>
> I'm not actually arguing about which hash functions we should support,
> but rather what the default is and if crc32c, specifically, is actually
> a reasonable choice.  Just because it's fast and we already had an
> implementation of it doesn't justify its use as the default.  Given that
> it doesn't actually provide the check that is generally expected of
> CRC checksums (100% detection of single-bit errors) when the file size
> gets over 512MB makes me wonder if we should have it at all, yes, but it
> definitely makes me think it shouldn't be our default.

I don't understand your focus on the single-bit error issue.  If you are
sending your backup across the wire, single bit errors during transmission
should already be detected as part of the networking protocol.  The real
issue has to be detection of the kinds of errors or modifications that are
most likely to happen in practice.  Which are those?  People manually
mucking with the files?  Bugs in backup scripts?  Corruption on the storage
device?  Truncated files?  The more bits in the checksum (assuming a well
designed checksum algorithm), the more likely we are to detect accidental
modification, so it is no surprise if a 64-bit crc does better than 32-bit
crc.  But that logic can be taken arbitrarily far.  I don't see the
connection between, on the one hand, an analysis of single-bit error
detection against file size, and on the other hand, the verification of
backups.

From a support perspective, I think the much more important issue is making
certain that checksums are turned on.  A one in a billion chance of missing
an error seems pretty acceptable compared to the, let's say, one in two
chance that your customer didn't use checksums.  Why are we even allowing
this to be turned off?  Is there a usage case compelling that option?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: backup manifests

From
Robert Haas
Date:
On Thu, Mar 26, 2020 at 12:34 PM Stephen Frost <sfrost@snowman.net> wrote:
> I do agree with excluding things like md5 and others that aren't good
> options.  I wasn't saying we should necessarily exclude crc32c either..
> but rather saying that it shouldn't be the default.
>
> Here's another way to look at it- where do we use crc32c today, and how
> much data might we possibly be covering with that crc?

WAL record size is a 32-bit unsigned integer, so in theory, up to 4GB
minus 1 byte. In practice, most of them are not more than a few
hundred bytes, but the amount we might possibly be covering is a lot more.

> Why was crc32c
> picked for that purpose?

Because it was discovered that 64-bit CRC was too slow, per commit
21fda22ec46deb7734f793ef4d7fa6c226b4c78e.

> If the individual who decided to pick crc32c
> for that case was contemplating a checksum for up-to-1GB files, would
> they have picked crc32c?  Seems unlikely to me.

It's hard to be sure what someone who isn't us would have done in some
situation that they didn't face, but we do have the discussion thread:

https://www.postgresql.org/message-id/flat/9291.1117593389%40sss.pgh.pa.us#c4e413bbf3d7fbeced7786da1c3aca9c

The question of how much data is protected by the CRC was discussed,
mostly in the first few messages, in general terms, but it doesn't
seem to have covered the question very thoroughly. I'm sure we could
each draw things from that discussion that support our view of the
situation, but I'm not sure it would be very productive.

What confuses me is that you seem to have a view of the upsides and
downsides of these various algorithms that seems to me to be highly
skewed. Like, suppose we change the default from CRC-32C to
SHA-something. On the upside, the error detection rate will increase
from 99.9999999+% to something much closer to 100%. On the downside,
backups will get as much as 40-50% slower for some users. I hope we
can agree that both detecting errors and taking backups quickly are
important. However, it is hard for me to imagine that the typical user
would want to pay even a 5-10% performance penalty when taking a
backup in order to improve an error detection feature which they may
not even use and which already has less than a one-in-a-billion chance
of going wrong. We routinely reject features for causing, say, a 2%
regression on general workloads. Base backup speed is probably less
important than how many SELECT or INSERT queries you can pump through
the system in a second, but it's still a pain point for lots of
people. I think if you said to some users "hey, would you like to have
error detection for your backups? it'll cost 10%" many people would
say "yes, please." But I think if you went to the same users and said
"hey, would you like to make the error detection for your backups
better? it currently has a less than 1-in-a-billion chance of failing
to detect random corruption, and you can reduce that by many orders of
magnitude for an extra 10% on your backup time," I think the results
would be much more mixed. Some people would like it, but certainly
not everybody.

> I'm not actually arguing about which hash functions we should support,
> but rather what the default is and if crc32c, specifically, is actually
> a reasonable choice.  Just because it's fast and we already had an
> implementation of it doesn't justify its use as the default.  Given that
> it doesn't actually provide the check that is generally expected of
> CRC checksums (100% detection of single-bit errors) when the file size
> gets over 512MB makes me wonder if we should have it at all, yes, but it
> definitely makes me think it shouldn't be our default.

I mean, the property that I care about is the one where it detects
better than 999,999,999 errors out of every 1,000,000,000, regardless
of input length.

> I don't agree with limiting our view to only those algorithms that we've
> already got implemented in PG.

I mean, opening that giant can of worms ~2 weeks before feature freeze
is not very nice. This patch has been around for months, and the
algorithms were openly discussed a long time ago. I checked and found
out that the CRC-64 code was nuked in commit
404bc51cde9dce1c674abe4695635612f08fe27e, so in theory we could revert
that, but how much confidence do we have that the code in question
actually did the right thing, or that it's actually fast? An awful lot
of work has been done on the CRC-32C code over the years, including
several rounds of speeding it up
(f044d71e331d77a0039cec0a11859b5a3c72bc95,
3dc2d62d0486325bf263655c2d9a96aee0b02abe) and one round of fixing it
because it was producing completely wrong answers
(5028f22f6eb0579890689655285a4778b4ffc460), so I don't have a lot of
confidence about that CRC-64 code being totally without problems.

The commit message for that last commit,
5028f22f6eb0579890689655285a4778b4ffc460, seems pretty relevant in
this context, too. It observes that, because it "does not correspond
to any bit-wise CRC calculation" it is "difficult to reason about its
properties." In other words, the algorithm that we used for WAL
records for many years likely did not have the guaranteed
error-detection properties with which you are so concerned (nor do
most hash functions we might choose; CRC-64 is probably the only
choice that would). Despite that, the commit message also observed
that "it has worked well in practice." I realize I'm not convincing
you of anything here, but the guaranteed error-detection properties of
CRC are almost totally uninteresting in this context. I'm not
concerned that CRC-32C doesn't have those properties. I'm not
concerned that SHA-n wouldn't have those properties. I'm not concerned
that xxhash or HighwayHash don't have that property either. I doubt
the fact that CRC-64 would have that property would give us much
benefit. I think the only things that matter here are (1) how many
bits you get (more bits = better chance of finding errors, but even
*sixteen* bits would give you a pretty fair chance of noticing if
things are broken) and (2) whether you want a cryptographic hash
function so that you can keep the backup manifest in a vault.
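
To put rough numbers on point (1): as a back-of-the-envelope sketch, if we
assume corruption effectively randomizes an n-bit checksum (roughly true for
random corruption, not a guarantee of any particular algorithm), the chance
that a corrupted file still matches its recorded checksum is about 2^-n:

\[
2^{-16} \approx 1.5\times10^{-5}, \qquad
2^{-32} \approx 2.3\times10^{-10}, \qquad
2^{-64} \approx 5.4\times10^{-20}
\]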

> It's saying, removing the listing aspect, exactly that "backup_label is
> excluded from verification".  That's what I am taking issue with.  I've
> made multiple attempts to suggest other language to avoid saying that
> because it's clearly wrong- the manifest is verified.

Well, it's talking about the particular kind of verification that has
just been discussed, not any form of verification. As one idea,
perhaps instead of:

+ Certain files and directories are
+   excluded from verification:

...I could maybe insert a paragraph break there and then continue with
something like this:

When pg_basebackup compares the files and directories in the manifest
to those which are present on disk, it will ignore the presence of, or
changes to, certain files:

backup_manifest will not be present in the manifest itself, and is
therefore ignored. Note that the manifest is still verified
internally, as described above, but no error will be issued about the
presence of a backup_manifest file in the backup directory even though
it is not listed in the manifest.

Would that be more clear? Do you want to suggest something else?

> I'm not talking about making sure that no error ever happens when doing
> a restore of a particular backup.
>
> I'm saying that the existing tool that takes the backup has a *really*
> *important* verification check that this proposed "validate backup" tool
> doesn't have, and that isn't sensible.  It leads to situations where the
> backup tool itself, pg_basebackup, can fail or be killed before it's
> actually completed, and the "validate backup" tool would say that the
> backup is perfectly fine.  That is not sensible.

If someone's procedure for taking and restoring backups involves not
knowing whether or not pg_basebackup completed without error and then
trying to use the backup anyway, they are doing something which is
very foolish, and it's questionable whether any technological solution
has much hope of getting them out of trouble. But on the plus side,
this patch would have a good chance of detecting the problem, which is
a noticeable improvement over what we have now, which has no chance of
detecting the problem, because we have nothing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Stephen Frost
Date:
Greetings,

* Mark Dilger (mark.dilger@enterprisedb.com) wrote:
> > On Mar 26, 2020, at 9:34 AM, Stephen Frost <sfrost@snowman.net> wrote:
> > I'm not actually arguing about which hash functions we should support,
> > but rather what the default is and if crc32c, specifically, is actually
> > a reasonable choice.  Just because it's fast and we already had an
> > implementation of it doesn't justify its use as the default.  Given that
> > it doesn't actually provide the check that is generally expected of
> > CRC checksums (100% detection of single-bit errors) when the file size
> > gets over 512MB makes me wonder if we should have it at all, yes, but it
> > definitely makes me think it shouldn't be our default.
>
> I don't understand your focus on the single-bit error issue.

Maybe I'm wrong, but my understanding was that detecting single-bit
errors was one of the primary design goals of CRC and why people talk
about CRCs of certain sizes having 'limits'- that's the size at which
single-bit errors will no longer, necessarily, be picked up and
therefore that's where the CRC of that size starts falling down on that
goal.

> If you are sending your backup across the wire, single bit errors during
> transmission should already be detected as part of the networking
> protocol.  The real issue has to be detection of the kinds of errors or
> modifications that are most likely to happen in practice.  Which are
> those?  People manually mucking with the files?  Bugs in backup scripts?
> Corruption on the storage device?  Truncated files?  The more bits in the
> checksum (assuming a well designed checksum algorithm), the more likely
> we are to detect accidental modification, so it is no surprise if a
> 64-bit crc does better than 32-bit crc.  But that logic can be taken
> arbitrarily far.  I don't see the connection between, on the one hand, an
> analysis of single-bit error detection against file size, and on the
> other hand, the verification of backups.

We'd like something that does a good job at detecting any differences
between when the file was copied off of the server and when the command
is run- potentially weeks or months later.  I would expect most issues
to end up being storage-level corruption over time where the backup is
stored, which could be single bit flips or whole pages getting zeroed or
various other things.  Files changing size probably is one of the less
common things, but, sure, that too.

That we could take this "arbitrarily far" is actually entirely fine-
that's a good reason to have alternatives, which this patch does have,
but that doesn't mean we should have a default that's not suitable for
the files that we know we're going to be storing.

Consider that we could have used a 16-bit CRC instead, but does that
actually make sense?  Ok, sure, maybe someone really wants something
super fast- but should that be our default?  If not, then what criteria
should we use for the default?

> From a support perspective, I think the much more important issue is
> making certain that checksums are turned on.  A one in a billion chance
> of missing an error seems pretty acceptable compared to the, let's say,
> one in two chance that your customer didn't use checksums.  Why are we
> even allowing this to be turned off?  Is there a usage case compelling
> that option?

The argument is that adding checksums takes more time.  I can understand
that argument, though I don't really agree with it.  Certainly a few
percent really shouldn't be that big of an issue, and in many cases even
a sha256 hash isn't going to have that dramatic of an impact on the
actual overall time.

Thanks,

Stephen


Re: backup manifests

From
David Steele
Date:
On 3/26/20 11:37 AM, Robert Haas wrote:
> On Wed, Mar 25, 2020 at 4:54 PM Stephen Frost <sfrost@snowman.net> wrote:
> This is where I feel like I'm trying to make decisions in a vacuum. If
> we had a few more people weighing in on the thread on this point, I'd
> be happy to go with whatever the consensus was. If most people think
> having both --no-manifest (suppressing the manifest completely) and
> --manifest-checksums=none (suppressing only the checksums) is useless
> and confusing, then sure, let's rip the latter one out. If most people
> like the flexibility, let's keep it: it's already implemented and
> tested. But I hate to base the decision on what one or two people
> think.

I'm not sure I see a lot of value in being able to build a manifest with 
no checksums, especially if the overhead for the default checksum algorithm 
is negligible.

However, I'd still prefer that the default be something more robust and 
allow users to tune it down rather than the other way around.  But I've 
made that pretty clear up-thread and I consider that argument lost at 
this point.

>> As for folks who are that close to the edge on their backup timing that
>> they can't have it slow down- chances are pretty darn good that they're
>> not far from ending up needing to find a better solution than
>> pg_basebackup anyway.  Or they don't need to generate a manifest (or, I
>> suppose, they could have one but not have checksums..).
> 
> 40-50% is a lot more than "if you were on the edge."

For the record I think this is a very misleading number.  Sure, if you 
are doing your backup to a local SSD on a powerful development laptop it 
makes sense.

But backups are generally placed on slower storage, remotely, with 
compression.  Even without compression the first two are going to bring 
this percentage down by a lot.

When you get to page-level incremental backups, which is where this all 
started, I'd still recommend using a stronger checksum algorithm to 
verify that the file was reconstructed correctly on restore.  That much 
I believe we have agreed on.

>> Even pg_basebackup (in both fetch and stream modes...) checks that we at
>> least got all the WAL that's needed for the backup from the server
>> before considering the backup to be valid and telling the user that
>> there was a successful backup.  With what you're proposing here, we
>> could have someone do a pg_basebackup, get back an ERROR saying the
>> backup wasn't valid, and then run pg_validatebackup and be told that the
>> backup is valid.  I don't get how that's sensible.
> 
> I'm sorry that you can't see how that's sensible, but it doesn't mean
> that it isn't sensible. It is totally unrealistic to expect that any
> backup verification tool can verify that you won't get an error when
> trying to use the backup. That would require that the validation tool
> try to do everything that PostgreSQL will try to do
> when the backup is used, including running recovery and updating the
> data files. Anything less than that creates a real possibility that
> the backup will verify good but fail when used. This tool has a much
> narrower purpose, which is to try to verify that we (still) have the
> files the server sent as part of the backup and that, to the best of
> our ability to detect such things, they have not been modified. As you
> know, or should know, the WAL files are not sent as part of the
> backup, and so are not verified. Other things that would also be
> useful to check are also not verified. It would be fantastic to have
> more verification tools in the future, but it is difficult to see why
> anyone would bother trying if an attempt to get the first one
> committed gets blocked because it does not yet do everything. Very few
> patches try to do everything, and those that do usually get blocked
> because, by trying to do too much, they get some of it badly wrong.

I agree with Stephen that this should be done, but I agree with you that 
it can wait for a future commit. However, I do think:

1) It should be called out rather plainly in the documentation.
2) If there are files in pg_wal then pg_validatebackup should inform the 
user that those files have not been validated.

I know you and Stephen have agreed on a number of doc changes, would it 
be possible to get a new patch with those included? I finally have time 
to do a review of this tomorrow.  I saw some mistakes in the docs in the 
current patch but I know those patches are not current.

Regards,
-- 
-David
david@pgmasters.net



Re: backup manifests

From
Mark Dilger
Date:

> On Mar 26, 2020, at 12:37 PM, Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Mark Dilger (mark.dilger@enterprisedb.com) wrote:
>>> On Mar 26, 2020, at 9:34 AM, Stephen Frost <sfrost@snowman.net> wrote:
>>> I'm not actually arguing about which hash functions we should support,
>>> but rather what the default is and if crc32c, specifically, is actually
>>> a reasonable choice.  Just because it's fast and we already had an
>>> implementation of it doesn't justify its use as the default.  Given that
>>> it doesn't actually provide the check that is generally expected of
>>> CRC checksums (100% detection of single-bit errors) when the file size
>>> gets over 512MB makes me wonder if we should have it at all, yes, but it
>>> definitely makes me think it shouldn't be our default.
>>
>> I don't understand your focus on the single-bit error issue.
>
> Maybe I'm wrong, but my understanding was that detecting single-bit
> errors was one of the primary design goals of CRC and why people talk
> about CRCs of certain sizes having 'limits'- that's the size at which
> single-bit errors will no longer, necessarily, be picked up and
> therefore that's where the CRC of that size starts falling down on that
> goal.

I think I agree with all that.  I'm not sure it is relevant.  When people
use CRCs to detect things *other than* transmission errors, they are in
some sense using a hammer to drive a screw.  At that point, the analysis of
how good the hammer is, and how big a nail it can drive, is no longer
relevant.  The relevant discussion here is how appropriate a CRC is for our
purpose.  I don't know the answer to that, but it doesn't seem the
single-bit error analysis is the right analysis.

>> If you are sending your backup across the wire, single bit errors
>> during transmission should already be detected as part of the networking
>> protocol.  The real issue has to be detection of the kinds of errors or
>> modifications that are most likely to happen in practice.  Which are
>> those?  People manually mucking with the files?  Bugs in backup scripts?
>> Corruption on the storage device?  Truncated files?  The more bits in
>> the checksum (assuming a well designed checksum algorithm), the more
>> likely we are to detect accidental modification, so it is no surprise if
>> a 64-bit crc does better than 32-bit crc.  But that logic can be taken
>> arbitrarily far.  I don't see the connection between, on the one hand,
>> an analysis of single-bit error detection against file size, and on the
>> other hand, the verification of backups.
>
> We'd like something that does a good job at detecting any differences
> between when the file was copied off of the server and when the command
> is run- potentially weeks or months later.  I would expect most issues
> to end up being storage-level corruption over time where the backup is
> stored, which could be single bit flips or whole pages getting zeroed or
> various other things.  Files changing size probably is one of the less
> common things, but, sure, that too.
>
> That we could take this "arbitrarily far" is actually entirely fine-
> that's a good reason to have alternatives, which this patch does have,
> but that doesn't mean we should have a default that's not suitable for
> the files that we know we're going to be storing.
>
> Consider that we could have used a 16-bit CRC instead, but does that
> actually make sense?  Ok, sure, maybe someone really wants something
> super fast- but should that be our default?  If not, then what criteria
> should we use for the default?

I'll answer this below....

>> From a support perspective, I think the much more important issue is
>> making certain that checksums are turned on.  A one in a billion chance
>> of missing an error seems pretty acceptable compared to the, let's say,
>> one in two chance that your customer didn't use checksums.  Why are we
>> even allowing this to be turned off?  Is there a usage case compelling
>> that option?
>
> The argument is that adding checksums takes more time.  I can understand
> that argument, though I don't really agree with it.  Certainly a few
> percent really shouldn't be that big of an issue, and in many cases even
> a sha256 hash isn't going to have that dramatic of an impact on the
> actual overall time.

I see two dangers here:

(1) The user enables checksums of some type, and due to checksums not being
perfect, corruption happens but goes undetected, leaving her in a bad
place.

(2) The user makes no checksum selection at all, gets checksums of the
*default* type, determines it is too slow for her purposes, and instead of
adjusting the checksum algorithm to something faster, simply turns
checksums off; corruption happens and of course is undetected, leaving her
in a bad place.

I think the risk of (2) is far worse, which makes me tend towards a default
that is fast enough not to encourage anybody to disable checksums
altogether.  I have no opinion about which algorithm is best suited to that
purpose, because I haven't benchmarked any.  I'm pretty much going off what
Robert said, in terms of how big an impact using a heavier algorithm would
be.  Perhaps you'd like to run benchmarks and make a concrete proposal for
another algorithm, with numbers showing the runtime changes?  You mentioned
up-thread that prior timings which showed a 40-50% slowdown were not
including all the relevant stuff, so perhaps you could fix that in your
benchmark and let us know what is included in the timings?

I don't think we should be contemplating for v13 any checksum algorithms
for the default except the ones already in the options list.  Doing that
just derails the patch.  If you want highwayhash or similar to be the
default, can't we hold off until v14 and think about changing the default?
Maybe I'm missing something, but I don't see any reason why it would be
hard to change this after the first version has already been released.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: backup manifests

From
Stephen Frost
Date:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Thu, Mar 26, 2020 at 12:34 PM Stephen Frost <sfrost@snowman.net> wrote:
> > I do agree with excluding things like md5 and others that aren't good
> > options.  I wasn't saying we should necessarily exclude crc32c either..
> > but rather saying that it shouldn't be the default.
> >
> > Here's another way to look at it- where do we use crc32c today, and how
> > much data might we possibly be covering with that crc?
>
> WAL record size is a 32-bit unsigned integer, so in theory, up to 4GB
> minus 1 byte. In practice, most of them are not more than a few
> hundred bytes, but the amount we might possibly be covering is a lot more.

Is it actually possible, today, in PG, to have a 4GB WAL record?
Judging this based on the WAL record size doesn't seem quite right.

> > Why was crc32c
> > picked for that purpose?
>
> Because it was discovered that 64-bit CRC was too slow, per commit
> 21fda22ec46deb7734f793ef4d7fa6c226b4c78e.

... 15 years ago.  I actually find it pretty interesting that we started
out with a 64bit CRC there, I didn't know that was the case.  Also
interesting is that we had 64bit CRC code already.

> > If the individual who decided to pick crc32c
> > for that case was contemplating a checksum for up-to-1GB files, would
> > they have picked crc32c?  Seems unlikely to me.
>
> It's hard to be sure what someone who isn't us would have done in some
> situation that they didn't face, but we do have the discussion thread:
>
> https://www.postgresql.org/message-id/flat/9291.1117593389%40sss.pgh.pa.us#c4e413bbf3d7fbeced7786da1c3aca9c
>
> The question of how much data is protected by the CRC was discussed,
> mostly in the first few messages, in general terms, but it doesn't
> seem to have covered the question very thoroughly. I'm sure we could
> each draw things from that discussion that support our view of the
> situation, but I'm not sure it would be very productive.

Interesting.

> What confuses me is that you seem to have a view of the upsides and
> downsides of these various algorithms that seems to me to be highly
> skewed. Like, suppose we change the default from CRC-32C to
> SHA-something. On the upside, the error detection rate will increase
> from 99.9999999+% to something much closer to 100%. On the downside,
> backups will get as much as 40-50% slower for some users. I hope we
> can agree that both detecting errors and taking backups quickly are
> important. However, it is hard for me to imagine that the typical user
> would want to pay even a 5-10% performance penalty when taking a
> backup in order to improve an error detection feature which they may
> not even use and which already has less than a one-in-a-billion chance
> of going wrong. We routinely reject features for causing, say, a 2%
> regression on general workloads. Base backup speed is probably less
> important than how many SELECT or INSERT queries you can pump through
> the system in a second, but it's still a pain point for lots of
> people. I think if you said to some users "hey, would you like to have
> error detection for your backups? it'll cost 10%" many people would
> say "yes, please." But I think if you went to the same users and said
> "hey, would you like to make the error detection for your backups
> better? it currently has a less than 1-in-a-billion chance of failing
> to detect random corruption, and you can reduce that by many orders of
> magnitude for an extra 10% on your backup time," I think the results
> would be much more mixed. Some people would like it, but certainly
> not everybody.

I think you're right that base backup speed is much less of an issue to
slow down than SELECT or INSERT workloads, but I do also understand
that it isn't completely unimportant, which is why having options isn't
a bad idea here.  That said, the options presented for users should all
be reasonable options, and for the default we should pick something
sensible, erring on the "be safer" side, if anything.

There's lots of options for speeding up base backups, with this patch,
even if the default is to have a manifest with sha256 hashes- it could
be changed to some form of CRC, or changed to not have checksums, or
changed to not have a manifest.  Users will have options.

Again, I'm not against having a checksum algorithm as an option.  I'm not
saying that it must be SHA512 as the default.

> > I'm not actually arguing about which hash functions we should support,
> > but rather what the default is and if crc32c, specifically, is actually
> > a reasonable choice.  Just because it's fast and we already had an
> > implementation of it doesn't justify its use as the default.  Given that
> > it doesn't actually provide the check that is generally expected of
> > CRC checksums (100% detection of single-bit errors) when the file size
> > gets over 512MB makes me wonder if we should have it at all, yes, but it
> > definitely makes me think it shouldn't be our default.
>
> I mean, the property that I care about is the one where it detects
> better than 999,999,999 errors out of every 1,000,000,000, regardless
> of input length.

Throwing these kinds of things around I really don't think is useful.

> > I don't agree with limiting our view to only those algorithms that we've
> > already got implemented in PG.
>
> I mean, opening that giant can of worms ~2 weeks before feature freeze
> is not very nice. This patch has been around for months, and the
> algorithms were openly discussed a long time ago.

Yes, they were discussed before, and these issues were brought up before
and there was specifically concern brought up about exactly the same
issues that I'm repeating here.  Those concerns seem to have been
largely ignored, apparently because "we don't have that in PG today" as
at least one of the considerations- even though we used to.  I don't
think that was the right response and, yeah, I saw that you were
planning to commit and that prompted me to look into it right now.  I
don't think that's entirely uncommon around here.  I also had hoped that
David's concerns that were raised before had been heeded, as I knew he
was involved in the discussion previously, but that turns out to not
have been the case.

> > It's saying, removing the listing aspect, exactly that "backup_label is
> > excluded from verification".  That's what I am taking issue with.  I've
> > made multiple attempts to suggest other language to avoid saying that
> > because it's clearly wrong- the manifest is verified.
>
> Well, it's talking about the particular kind of verification that has
> just been discussed, not any form of verification. As one idea,
> perhaps instead of:
>
> + Certain files and directories are
> +   excluded from verification:
>
> ...I could maybe insert a paragraph break there and then continue with
> something like this:
>
> When pg_basebackup compares the files and directories in the manifest
> to those which are present on disk, it will ignore the presence of, or
> changes to, certain files:
>
> backup_manifest will not be present in the manifest itself, and is
> therefore ignored. Note that the manifest is still verified
> internally, as described above, but no error will be issued about the
> presence of a backup_manifest file in the backup directory even though
> it is not listed in the manifest.
>
> Would that be more clear? Do you want to suggest something else?

Yes, that looks fine.  Feels slightly redundant to include the "as
described above ..." bit, and I think that could be dropped, but up to
you.

> > I'm not talking about making sure that no error ever happens when doing
> > a restore of a particular backup.
> >
> > I'm saying that the existing tool that takes the backup has a *really*
> > *important* verification check that this proposed "validate backup" tool
> > doesn't have, and that isn't sensible.  It leads to situations where the
> > backup tool itself, pg_basebackup, can fail or be killed before it's
> > actually completed, and the "validate backup" tool would say that the
> > backup is perfectly fine.  That is not sensible.
>
> If someone's procedure for taking and restoring backups involves not
> knowing whether or not pg_basebackup completed without error and then
> trying to use the backup anyway, they are doing something which is
> very foolish, and it's questionable whether any technological solution
> has much hope of getting them out of trouble. But on the plus side,
> this patch would have a good chance of detecting the problem, which is
> a noticeable improvement over what we have now, which has no chance of
> detecting the problem, because we have nothing.

This doesn't address my concern at all.  Even if it seems ridiculous and
foolish to think that a backup was successful when the system was
rebooted and pg_basebackup was killed before all of the WAL had made it
into pg_wal, there is absolutely zero doubt in my mind that it's going
to happen and users are going to, entirely reasonably, think that
pg_validatebackup at least includes all the checks that pg_basebackup
does about making sure that the backup is valid.

I really don't understand how we can have a backup validation tool that
doesn't do the absolute basics, like making sure that we have all of the
WAL for the backup.  I've routinely, almost jokingly, said to folks that
any backup tool that doesn't check that isn't really a backup tool, and
I was glad that pg_basebackup had that check, so, yeah, I'm going to
continue to object to committing a backup validation tool that doesn't
have that absolutely basic and necessary check.

Thanks,

Stephen


Re: backup manifests

From
Stephen Frost
Date:
Greetings,

* Mark Dilger (mark.dilger@enterprisedb.com) wrote:
> > On Mar 26, 2020, at 12:37 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > * Mark Dilger (mark.dilger@enterprisedb.com) wrote:
> >>> On Mar 26, 2020, at 9:34 AM, Stephen Frost <sfrost@snowman.net> wrote:
> >>> I'm not actually arguing about which hash functions we should support,
> >>> but rather what the default is and if crc32c, specifically, is actually
> >>> a reasonable choice.  Just because it's fast and we already had an
> >>> implementation of it doesn't justify its use as the default.  Given that
> >>> it doesn't actually provide the check that is generally expected of
> >>> CRC checksums (100% detection of single-bit errors) when the file size
> >>> gets over 512MB makes me wonder if we should have it at all, yes, but it
> >>> definitely makes me think it shouldn't be our default.
> >>
> >> I don't understand your focus on the single-bit error issue.
> >
> > Maybe I'm wrong, but my understanding was that detecting single-bit
> > errors was one of the primary design goals of CRC and why people talk
> > about CRCs of certain sizes having 'limits'- that's the size at which
> > single-bit errors will no longer, necessarily, be picked up and
> > therefore that's where the CRC of that size starts falling down on that
> > goal.
>
> I think I agree with all that.  I'm not sure it is relevant.  When people use CRCs to detect things *other than*
transmissionerrors, they are in some sense using a hammer to drive a screw.  At that point, the analysis of how good
thehammer is, and how big a nail it can drive, is no longer relevant.  The relevant discussion here is how appropriate
aCRC is for our purpose.  I don't know the answer to that, but it doesn't seem the single-bit error analysis is the
rightanalysis. 

I disagree that it's not relevant- it's, in fact, the one really clear
thing we can get a pretty straight-forward answer on, and that seems
really useful to me.

> >> If you are sending your backup across the wire, single bit errors during transmission should already be detected
aspart of the networking protocol.  The real issue has to be detection of the kinds of errors or modifications that are
mostlikely to happen in practice.  Which are those?  People manually mucking with the files?  Bugs in backup scripts?
Corruptionon the storage device?  Truncated files?  The more bits in the checksum (assuming a well designed checksum
algorithm),the more likely we are to detect accidental modification, so it is no surprise if a 64-bit crc does better
than32-bit crc.  But that logic can be taken arbitrarily far.  I don't see the connection between, on the one hand, an
analysisof single-bit error detection against file size, and on the other hand, the verification of backups. 
> >
> > We'd like something that does a good job at detecting any differences
> > between when the file was copied off of the server and when the command
> > is run- potentially weeks or months later.  I would expect most issues
> > to end up being storage-level corruption over time where the backup is
> > stored, which could be single bit flips or whole pages getting zeroed or
> > various other things.  Files changing size probably is one of the less
> > common things, but, sure, that too.
> >
> > That we could take this "arbitrarily far" is actually entirely fine-
> > that's a good reason to have alternatives, which this patch does have,
> > but that doesn't mean we should have a default that's not suitable for
> > the files that we know we're going to be storing.
> >
> > Consider that we could have used a 16-bit CRC instead, but does that
> > actually make sense?  Ok, sure, maybe someone really wants something
> > super fast- but should that be our default?  If not, then what criteria
> > should we use for the default?
>
> I'll answer this below....
>
> >> From a support perspective, I think the much more important issue is making certain that checksums are turned on.
Aone in a billion chance of missing an error seems pretty acceptable compared to the, let's say, one in two chance that
yourcustomer didn't use checksums.  Why are we even allowing this to be turned off?  Is there a usage case compelling
thatoption? 
> >
> > The argument is that adding checksums takes more time.  I can understand
> > that argument, though I don't really agree with it.  Certainly a few
> > percent really shouldn't be that big of an issue, and in many cases even
> > a sha256 hash isn't going to have that dramatic of an impact on the
> > actual overall time.
>
> I see two dangers here:
>
> (1) The user enables checksums of some type, and due to checksums not being perfect, corruption happens but goes
undetected,leaving her in a bad place. 
>
> (2) The user makes no checksum selection at all, gets checksums of the *default* type, determines it is too slow for
herpurposes, and instead of adjusting the checksum algorithm to something faster, simply turns checksums off;
corruptionhappens and of course is undetected, leaving her in a bad place. 

Alright, I have tried to avoid referring back to pgbackrest, but I can't
help it here.

We have never, ever, had a user come to us and complain that pgbackrest
is too slow because we're using a SHA hash.  We have also had them by
default since absolutely day number one, and we even removed the option
to disable them in 1.0.  We've never even been asked if we should
implement some other hash or checksum which is faster.

> I think the risk of (2) is far worse, which makes me tend towards a default that is fast enough not to encourage
anybodyto disable checksums altogether.  I have no opinion about which algorithm is best suited to that purpose,
becauseI haven't benchmarked any.  I'm pretty much going off what Robert said, in terms of how big an impact using a
heavieralgorithm would be.  Perhaps you'd like to run benchmarks and make a concrete proposal for another algorithm,
withnumbers showing the runtime changes?  You mentioned up-thread that prior timings which showed a 40-50% slowdown
werenot including all the relevant stuff, so perhaps you could fix that in your benchmark and let us know what is
includedin the timings? 

I don't even know what the 40-50% slowdown numbers included.  Also, the
general expectation in this community is that whoever is pushing a
given patch forward should be providing the benchmarks to justify their
choice.

> I don't think we should be contemplating for v13 any checksum algorithms for the default except the ones already in
theoptions list.  Doing that just derails the patch.  If you want highwayhash or similar to be the default, can't we
holdoff until v14 and think about changing the default?  Maybe I'm missing something, but I don't see any reason why it
wouldbe hard to change this after the first version has already been released. 

I'd rather we default to something that we are all confident and happy
with, erring on the side of it being overkill rather than something
that we know isn't really appropriate for the data volume.

Thanks,

Stephen


Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-03-26 11:37:48 -0400, Robert Haas wrote:
> I mean, you're just repeating the same argument here, and it's just
> not valid. Regardless of the file size, the chances of a false
> checksum match are literally less than one in a billion. There is
> every reason to believe that users will be happy with a low-overhead
> method that has a 99.9999999+% chance of detecting corrupt files. I do
> agree that a 64-bit CRC would probably be not much more expensive and
> improve the probability of detecting errors even further

I *seriously* doubt that it's true that 64bit CRCs wouldn't be
slower. The only reason CRC32C is semi-fast is that we're accelerating
it using hardware instructions (on x86-64 and ARM at least). Before that
it was very regularly the bottleneck for processing WAL - and it still
sometimes is. Most CRCs aren't actually very fast to compute, because
they don't lend themselves to benefit from ILP or SIMD.  We spent a fair
bit of time optimizing our crc implementation before the hardware
support was widespread.
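
To illustrate what that hardware path boils down to, here's a minimal
sketch of a CRC-32C loop built on the SSE4.2 intrinsics.  This is not the
actual PostgreSQL implementation (which also does runtime dispatch, handles
other platforms, etc.); it's just the shape of the hot loop, and needs
-msse4.2 to build:

#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <nmmintrin.h>

static uint32_t
crc32c_sse42(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    crc ^= 0xFFFFFFFF;
    while (len >= 8)
    {
        uint64_t    chunk;

        memcpy(&chunk, p, 8);       /* one CRC32 instruction per 8 bytes */
        crc = (uint32_t) _mm_crc32_u64(crc, chunk);
        p += 8;
        len -= 8;
    }
    while (len-- > 0)               /* byte-at-a-time tail */
        crc = _mm_crc32_u8(crc, *p++);
    return crc ^ 0xFFFFFFFF;
}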


> but I wanted to restrict this patch to using infrastructure we already
> have. The choices there are the various SHA functions (so I supported
> those), MD5 (which I deliberately omitted, for reasons I hope you'll
> be the first to agree with), CRC-32C (which is fast), a couple of
> other CRC-32 variants (which I omitted because they seemed redundant
> and one of them only ever existed in PostgreSQL because of a coding
> mistake), and the hacked-up version of FNV that we use for page-level
> checksums (which is only 16 bits and seems to have no advantages for
> this purpose).

FWIW, FNV is only 16bit because we reduce its size to 16 bit. See the
tail of pg_checksum_page.


I'm not sure the error detection guarantees of various CRC algorithms
are that relevant here, btw. IMO, for something like checksums in a
backup, just having a single one-bit error isn't as common as having
larger errors (e.g. entire blocks being zeroed). And to detect that,
32bit checksums aren't that good.


> > As for folks who are that close to the edge on their backup timing that
> > they can't have it slow down- chances are pretty darn good that they're
> > not far from ending up needing to find a better solution than
> > pg_basebackup anyway.  Or they don't need to generate a manifest (or, I
> > suppose, they could have one but not have checksums..).
> 
> 40-50% is a lot more than "if you were on the edge."

sha256 does approx 400MB/s per core on modern intel CPUs. That's
way below commonly accessible storage / network capabilities (and even
if you're only doing 200MB/s, you're still going to spend roughly half
of the CPU time just doing hashing).  It's unlikely that you're going to
see much of a speedup for sha256 just by upgrading a CPU. While there are
hardware instructions available, they don't result in all that large
improvements. Of course, we could also start using the GPU (err, really
no).

Defaulting to that makes very little sense to me. You're not just going
to spend that time while backing up, but also when validating backups
(i.e. network limits suddenly aren't a relevant bottleneck anymore).


> > I fail to see the usefulness of a tool that doesn't actually verify that
> > the backup is able to be restored from.
> >
> > Even pg_basebackup (in both fetch and stream modes...) checks that we at
> > least got all the WAL that's needed for the backup from the server
> > before considering the backup to be valid and telling the user that
> > there was a successful backup.  With what you're proposing here, we
> > could have someone do a pg_basebackup, get back an ERROR saying the
> > backup wasn't valid, and then run pg_validatebackup and be told that the
> > backup is valid.  I don't get how that's sensible.
> 
> I'm sorry that you can't see how that's sensible, but it doesn't mean
> that it isn't sensible. It is totally unrealistic to expect that any
> backup verification tool can verify that you won't get an error when
> trying to use the backup. That would require that the validation tool
> try to do everything that PostgreSQL will try to do
> when the backup is used, including running recovery and updating the
> data files. Anything less than that creates a real possibility that
> the backup will verify good but fail when used. This tool has a much
> narrower purpose, which is to try to verify that we (still) have the
> files the server sent as part of the backup and that, to the best of
> our ability to detect such things, they have not been modified. As you
> know, or should know, the WAL files are not sent as part of the
> backup, and so are not verified. Other things that would also be
> useful to check are also not verified. It would be fantastic to have
> more verification tools in the future, but it is difficult to see why
> anyone would bother trying if an attempt to get the first one
> committed gets blocked because it does not yet do everything. Very few
> patches try to do everything, and those that do usually get blocked
> because, by trying to do too much, they get some of it badly wrong.

It seems to me that if there are to be manifests for the WAL, it should
be a separate manifest (or set of manifests). Trying to somehow tie together the
manifest for the base backup, and the one for the WAL, makes little
sense to me. They're commonly not computed in one place, often not even
stored in the same place. For PITR relevant WAL doesn't even exist yet
at the time the manifest is created (and thus obviously cannot be
included in the base backup manifest). And fairly obviously one would
want to be able to verify the correctness of WAL between two
basebackups.

I don't see much point in complicating the design to somehow capture WAL
in the manifest, when it's only going to solve a small set of cases.

Seems better to (later?) add support for generating manifests for WAL
files, and then have a tool that can verify all the manifests required
to restore a base backup.

Greetings,

Andres Freund



Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-03-26 14:02:29 -0400, Robert Haas wrote:
> On Thu, Mar 26, 2020 at 12:34 PM Stephen Frost <sfrost@snowman.net> wrote:
> > Why was crc32c
> > picked for that purpose?
> 
> Because it was discovered that 64-bit CRC was too slow, per commit
> 21fda22ec46deb7734f793ef4d7fa6c226b4c78e.

Well, a 32bit crc, not crc32c. IIRC it was the ethernet polynomial (+
bug). We switched to crc32c at some point because there are hardware
implementations:

commit 5028f22f6eb0579890689655285a4778b4ffc460
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date:   2014-11-04 11:35:15 +0200

    Switch to CRC-32C in WAL and other places.


> Like, suppose we change the default from CRC-32C to SHA-something. On
> the upside, the error detection rate will increase from 99.9999999+%
> to something much closer to 100%.

FWIW, I don't buy the relevancy of 99.9999999+% at all. That's assuming
a single bit error (at relevant lengths, before that it's single burst
errors of a greater length), which isn't that relevant for our purposes.

That's not to say that I don't think a CRC check can provide value. It
does provide a high likelihood of detecting enough errors, including
coding errors in how data is restored (not unimportant), that you're
otherwise not likely to find out about soon.


> On the downside,
> backups will get as much as 40-50% slower for some users. I hope we
> can agree that both detecting errors and taking backups quickly are
> important. However, it is hard for me to imagine that the typical user
> would want to pay even a 5-10% performance penalty when taking a
> backup in order to improve an error detection feature which they may
> not even use and which already has less than a one-in-a-billion chance
> of going wrong.

FWIW, that seems far too large a slowdown to default to for me. Most
people aren't going to be able to figure out that it's the checksum
parameter that causes this slowdown, they're just going to feel the pain
of the backup being much slower than their hardware.

A few hundred megabytes per second of streaming reads/writes really doesn't take a
beefy server these days. Medium sized VMs + a bit larger network block
devices at all the common cloud providers have considerably higher
bandwidth. Even a raid5x of 4 spinning disks can deliver > 500MB/s.

And plenty of even the smaller instances at many providers have >
5gbit/s network. At the upper end it's way more than that.

Greetings,

Andres Freund



Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-03-26 15:37:11 -0400, Stephen Frost wrote:
> The argument is that adding checksums takes more time.  I can understand
> that argument, though I don't really agree with it.  Certainly a few
> percent really shouldn't be that big of an issue, and in many cases even
> a sha256 hash isn't going to have that dramatic of an impact on the
> actual overall time.

I don't understand how you can come to that conclusion?  It doesn't take
very long to measure openssl's sha256 performance (which is pretty well
optimized). Note that we do use openssl's sha256, when compiled with
openssl support.

On my workstation, with a pretty new (but not fastest single core perf
model) intel Xeon Gold 5215, I get:

$ openssl speed sha256
...
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256           76711.75k   172036.78k   321566.89k   399008.09k   431423.49k   433689.94k

IOW, ~430MB/s.


On my laptop, with pretty fast cores:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256           97054.91k   217188.63k   394864.13k   493441.02k   532100.44k   533441.19k

IOW, 530MB/s


530 MB/s is well within the realm of medium sized VMs.
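
If you want to reproduce that measurement without the openssl CLI, here's a
minimal standalone sketch using OpenSSL's EVP API (the same library we use
when built with openssl).  This is purely an illustrative benchmark, not
code from the patch; build with something like "cc sha256_bench.c -lcrypto":

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <openssl/evp.h>

int
main(void)
{
    const size_t chunk = 8192;                  /* 8kB blocks, as in "openssl speed" */
    const size_t total = 1024UL * 1024 * 1024;  /* hash 1GB in total */
    unsigned char *buf = malloc(chunk);
    unsigned char digest[EVP_MAX_MD_SIZE];
    unsigned int digestlen;
    EVP_MD_CTX *ctx = EVP_MD_CTX_new();
    struct timespec start, end;
    double      secs;

    memset(buf, 'x', chunk);
    clock_gettime(CLOCK_MONOTONIC, &start);
    EVP_DigestInit_ex(ctx, EVP_sha256(), NULL);
    for (size_t done = 0; done < total; done += chunk)
        EVP_DigestUpdate(ctx, buf, chunk);
    EVP_DigestFinal_ex(ctx, digest, &digestlen);
    clock_gettime(CLOCK_MONOTONIC, &end);

    secs = (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("sha256: %.0f MB/s\n", (total / 1e6) / secs);

    EVP_MD_CTX_free(ctx);
    free(buf);
    return 0;
}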

And, as mentioned before, even if you do only half of that, you're still
going to be spending roughly half of the CPU time of sending a base
backup.

What makes you think that a few hundred MB/s is out of reach for a large
fraction of PG installations that actually keep backups?

Greetings,

Andres Freund



Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-03-23 12:15:54 -0400, Robert Haas wrote:
> +       <varlistentry>
> +        <term><literal>MANIFEST</literal></term>
> +        <listitem>
> +         <para>
> +          When this option is specified with a value of <literal>ye'</literal>

s/ye'/yes/

> +          or <literal>force-escape</literal>, a backup manifest is created
> +          and sent along with the backup. The latter value forces all filenames
> +          to be hex-encoded; otherwise, this type of encoding is performed only
> +          for files whose names are non-UTF8 octet sequences.
> +          <literal>force-escape</literal> is intended primarily for testing
> +          purposes, to be sure that clients which read the backup manifest
> +          can handle this case. For compatibility with previous releases,
> +          the default is <literal>MANIFEST 'no'</literal>.
> +         </para>
> +        </listitem>
> +       </varlistentry>

Are you planning to include a specification of the manifest file format
anywhere? I looked through the patches and didn't find anything.

I think it'd also be good to include more information about what the
point of manifest files actually is.


> +  <para>
> +   <application>pg_validatebackup</application> reads the manifest file of a
> +   backup, verifies the manifest against its own internal checksum, and then
> +   verifies that the same files are present in the target directory as in the
> +   manifest itself. It then verifies that each file has the expected checksum,
> +   unless the backup was taken with the checksum algorithm set to
> +   <literal>none</literal>, in which case checksum verification is not
> +   performed. The presence or absence of directories is not checked, except
> +   indirectly: if a directory is missing, any files it should have contained
> +   will necessarily also be missing. Certain files and directories are
> +   excluded from verification:
> +  </para>

Depending on what you want to use the manifest for, we'd also need to
check that there are no additional files. That seems to actually be
implemented, which imo should be mentioned here.




> +/*
> + * Finalize the backup manifest, and send it to the client.
> + */
> +static void
> +SendBackupManifest(manifest_info *manifest)
> +{
> +    StringInfoData protobuf;
> +    uint8        checksumbuf[PG_SHA256_DIGEST_LENGTH];
> +    char        checksumstringbuf[PG_SHA256_DIGEST_STRING_LENGTH];
> +    size_t        manifest_bytes_done = 0;
> +
> +    /*
> +     * If there is no buffile, then the user doesn't want a manifest, so
> +     * don't waste any time generating one.
> +     */
> +    if (manifest->buffile == NULL)
> +        return;
> +
> +    /* Terminate the list of files. */
> +    AppendStringToManifest(manifest, "],\n");
> +
> +    /*
> +     * Append manifest checksum, so that the problems with the manifest itself
> +     * can be detected.
> +     *
> +     * We always use SHA-256 for this, regardless of what algorithm is chosen
> +     * for checksumming the files.  If we ever want to make the checksum
> +     * algorithm used for the manifest file variable, the client will need a
> +     * way to figure out which algorithm to use as close to the beginning of
> +     * the manifest file as possible, to avoid having to read the whole thing
> +     * twice.
> +     */
> +    manifest->still_checksumming = false;
> +    pg_sha256_final(&manifest->manifest_ctx, checksumbuf);
> +    AppendStringToManifest(manifest, "\"Manifest-Checksum\": \"");
> +    hex_encode((char *) checksumbuf, sizeof checksumbuf, checksumstringbuf);
> +    checksumstringbuf[PG_SHA256_DIGEST_STRING_LENGTH - 1] = '\0';
> +    AppendStringToManifest(manifest, checksumstringbuf);
> +    AppendStringToManifest(manifest, "\"}\n");

Hm. Is it a great choice to include the checksum for the manifest inside
the manifest itself? With a cryptographic checksum it seems like it
could make a ton of sense to store the checksum somewhere "safe", but
keep the manifest itself alongside the base backup itself. While not
huge, they won't be tiny either.



> diff --git a/src/bin/pg_validatebackup/parse_manifest.c b/src/bin/pg_validatebackup/parse_manifest.c
> new file mode 100644
> index 0000000000..e6b42adfda
> --- /dev/null
> +++ b/src/bin/pg_validatebackup/parse_manifest.c
> @@ -0,0 +1,576 @@
> +/*-------------------------------------------------------------------------
> + *
> + * parse_manifest.c
> + *      Parse a backup manifest in JSON format.
> + *
> + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
> + * Portions Copyright (c) 1994, Regents of the University of California
> + *
> + * src/bin/pg_validatebackup/parse_manifest.c
> + *
> + *-------------------------------------------------------------------------
> + */

Doesn't have to be in the first version, but could it be useful to move
this to common/ or such?



> +/*
> + * Validate one directory.
> + *
> + * 'relpath' is NULL if we are to validate the top-level backup directory,
> + * and otherwise the relative path to the directory that is to be validated.
> + *
> + * 'fullpath' is the backup directory with 'relpath' appended; i.e. the actual
> + * filesystem path at which it can be found.
> + */
> +static void
> +validate_backup_directory(validator_context *context, char *relpath,
> +                          char *fullpath)
> +{

Hm. Should this warn if the directory's permissions are set too openly
(world writable?)?


> +/*
> + * Validate the checksum of a single file.
> + */
> +static void
> +validate_file_checksum(validator_context *context, manifestfile *tabent,
> +                       char *fullpath)
> +{
> +    pg_checksum_context checksum_ctx;
> +    char       *relpath = tabent->pathname;
> +    int            fd;
> +    int            rc;
> +    uint8        buffer[READ_CHUNK_SIZE];
> +    uint8        checksumbuf[PG_CHECKSUM_MAX_LENGTH];
> +    int            checksumlen;
> +
> +    /* Open the target file. */
> +    if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
> +    {
> +        report_backup_error(context, "could not open file \"%s\": %m",
> +                           relpath);
> +        return;
> +    }
> +
> +    /* Initialize checksum context. */
> +    pg_checksum_init(&checksum_ctx, tabent->checksum_type);
> +
> +    /* Read the file chunk by chunk, updating the checksum as we go. */
> +    while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
> +        pg_checksum_update(&checksum_ctx, buffer, rc);
> +    if (rc < 0)
> +        report_backup_error(context, "could not read file \"%s\": %m",
> +                           relpath);
> +

Hm. I think it'd be good to verify that the checksummed size is the same
as the size of the file in the manifest.
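
Something along these lines would do it (a sketch against the loop quoted
above; the exact name of the manifest's size field, tabent->size here, is
an assumption on my part):

    /* Count the bytes actually checksummed and compare with the manifest. */
    uint64      bytes_read = 0;

    while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
    {
        bytes_read += rc;
        pg_checksum_update(&checksum_ctx, buffer, rc);
    }
    if (rc < 0)
        report_backup_error(context, "could not read file \"%s\": %m",
                            relpath);
    else if (bytes_read != tabent->size)
        report_backup_error(context,
                            "file \"%s\": read %llu bytes, but manifest says %llu",
                            relpath, (unsigned long long) bytes_read,
                            (unsigned long long) tabent->size);

That would also catch a file whose end-of-file turns out to be somewhere
other than where stat() reported it.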



Greetings,

Andres Freund



Re: backup manifests

From
Stephen Frost
Date:
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2020-03-26 11:37:48 -0400, Robert Haas wrote:
> > I'm sorry that you can't see how that's sensible, but it doesn't mean
> > that it isn't sensible. It is totally unrealistic to expect that any
> > backup verification tool can verify that you won't get an error when
> > trying to use the backup. That would require that the validation tool
> > try to do everything that PostgreSQL will try to do
> > when the backup is used, including running recovery and updating the
> > data files. Anything less than that creates a real possibility that
> > the backup will verify good but fail when used. This tool has a much
> > narrower purpose, which is to try to verify that we (still) have the
> > files the server sent as part of the backup and that, to the best of
> > our ability to detect such things, they have not been modified. As you
> > know, or should know, the WAL files are not sent as part of the
> > backup, and so are not verified. Other things that would also be
> > useful to check are also not verified. It would be fantastic to have
> > more verification tools in the future, but it is difficult to see why
> > anyone would bother trying if an attempt to get the first one
> > committed gets blocked because it does not yet do everything. Very few
> > patches try to do everything, and those that do usually get blocked
> > because, by trying to do too much, they get some of it badly wrong.
>
> It sounds to me that if there are to be manifests for the WAL, it should
> be a separate (set of) manifests. Trying to somehow tie together the
> manifest for the base backup, and the one for the WAL, makes little
> sense to me. They're commonly not computed in one place, often not even
> stored in the same place. For PITR relevant WAL doesn't even exist yet
> at the time the manifest is created (and thus obviously cannot be
> included in the base backup manifest). And fairly obviously one would
> want to be able to verify the correctness of WAL between two
> basebackups.

We aren't talking about generic PITR or about tools other than
pg_basebackup, which has specific options for grabbing the WAL, and
making sure that it is all there for the backup that was taken.

> I don't see much point in complicating the design to somehow capture WAL
> in the manifest, when it's only going to solve a small set of cases.

As it relates to this, I tend to think that it solves the exact case
that pg_basebackup is built for and used for.  I said up-thread that if
someone does decide to use -X none then we could just throw a warning
(and perhaps have a way to override that if there's desire for it).

> Seems better to (later?) add support for generating manifests for WAL
> files, and then have a tool that can verify all the manifests required
> to restore a base backup.

I'm not trying to expand on the feature set here or move the goalposts
way down the road, which is what seems to be what's being suggested
here.  To be clear, I don't have any objection to adding a generic tool
for validating WAL as you're talking about here, but I also don't think
that's required for pg_validatebackup.  What I do think we need is a
check of the WAL that's fetched when people use pg_basebackup -Xstream
or -Xfetch.  pg_basebackup itself has that check because it's critical
to the backup being successful and valid.  Not having that basic
validation of a backup really just isn't ok- there's a reason
pg_basebackup has that check.

Thanks,

Stephen

Attachment

Re: backup manifests

From
Robert Haas
Date:
On Thu, Mar 26, 2020 at 4:37 PM David Steele <david@pgmasters.net> wrote:
> I know you and Stephen have agreed on a number of doc changes, would it
> be possible to get a new patch with those included? I finally have time
> to do a review of this tomorrow.  I saw some mistakes in the docs in the
> current patch but I know those patches are not current.

Hi David,

Here's a new version with some fixes:

- Fixes for doc typos noted by Stephen Frost and Andres Freund.
- Replace a doc paragraph about the advantages and disadvantages of
CRC-32C with one by Stephen Frost, with a slight change by me that I
thought made it sound more grammatical.
- Change the pg_validatebackup documentation so that it makes no
mention of compatible tools, per Stephen.
- Reword the discussion of the exclude list in the pg_validatebackup
documentation, per discussion between Stephen and myself.
- Try to make the documentation more clear about the fact that we
check for both extra and missing files.
- Incorporate a fix from Amit Kapila to make 003_corruption.pl pass on Windows.

HTH,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: backup manifests

From
Robert Haas
Date:
On Fri, Mar 27, 2020 at 1:06 AM Andres Freund <andres@anarazel.de> wrote:
> > Like, suppose we change the default from CRC-32C to SHA-something. On
> > the upside, the error detection rate will increase from 99.9999999+%
> > to something much closer to 100%.
>
> FWIW, I don't buy the relevancy of 99.9999999+% at all. That's assuming
> a single bit error (at relevant lengths, before that it's single burst
> errors of a greater length), which isn't that relevant for our purposes.
>
> That's not to say that I don't think a CRC check can provide value. It
> does provide a high likelihood of detecting enough errors, including
> coding errors in how data is restored (not unimportant), that you're
> otherwise not likely to find out about soon.

So, I'm glad that you think a CRC check gives a sufficiently good
chance of detecting errors, but I don't understand your objection
to the percentage.  Stephen just objected to it again, too:

On Thu, Mar 26, 2020 at 4:44 PM Stephen Frost <sfrost@snowman.net> wrote:
> > I mean, the property that I care about is the one where it detects
> > better than 999,999,999 errors out of every 1,000,000,000, regardless
> > of input length.
>
> Throwing these kinds of things around I really don't think is useful.

...but I don't understand his reasoning, or yours.

My reasoning for thinking that the number is accurate is that a 32-bit
checksum has 2^32 possible results. If all of those results are
equally probable, then the probability that two files with unequal
contents produce the same result is 2^-32. This does assume that the
hash function is perfect, which no hash function is, so the actual
probability of a collision is likely higher. But if the hash function
is pretty good, it shouldn't be all that much higher. Note that I am
making no assumptions here about how many bits are different, nor am I
making any assumption about the length of a file. I am simply saying
that an n-bit checksum should detect a difference between two files
with a probability of roughly 1-2^{-n}, modulo the imperfections of
the hash function. I thought that this was a well-accepted fact that
would produce little argument from anybody, and I'm confused that
people seem to feel otherwise.

One explanation that would make sense to me is if somebody said, well,
the nature of this particular algorithm means that, although values
are uniformly distributed in general, the kinds of errors that are
likely to occur in practice are likely to cancel out. For instance, if
you imagine trivial algorithms such as adding or xor-ing all the
bytes, adding zero bytes doesn't change the answer, and neither do
transpositions. However, CRC is, AIUI, designed to be resistant to
such problems. Your remark about large blocks of zero bytes is
interesting to me in this context, but in a quick search I couldn't
find anything stating that CRC was weak for such use cases.
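
To make the difference concrete, here is a small standalone illustration
(nothing to do with the patch): an order-insensitive toy checksum like
XOR-of-all-bytes cannot see a transposition at all, while even a naive
bitwise CRC-32 does, because the remainder depends on where the bits are:

#include <stdint.h>
#include <stdio.h>

/* Order-insensitive toy checksum: XOR of all bytes. */
static uint8_t
xor_sum(const uint8_t *p, size_t n)
{
    uint8_t     s = 0;

    while (n-- > 0)
        s ^= *p++;
    return s;
}

/* Naive bitwise CRC-32 (reflected, polynomial 0xEDB88320), no lookup table. */
static uint32_t
crc32_simple(const uint8_t *p, size_t n)
{
    uint32_t    crc = 0xFFFFFFFF;

    while (n-- > 0)
    {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320 : 0);
    }
    return crc ^ 0xFFFFFFFF;
}

int
main(void)
{
    const uint8_t a[] = "hello world";
    const uint8_t b[] = "hello wrold";  /* two bytes transposed */

    printf("xor: %02x vs %02x\n", xor_sum(a, 11), xor_sum(b, 11));  /* identical */
    printf("crc: %08x vs %08x\n", crc32_simple(a, 11), crc32_simple(b, 11));  /* differ */
    return 0;
}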

The old thread about switching from 64-bit CRC to 32-bit CRC had a
link to a page which has subsequently been moved to here:

https://www.ece.unb.ca/tervo/ee4253/crc.shtml

Down towards the bottom, it says:

"In general, bit errors and bursts up to N-bits long will be detected
for a P(x) of degree N. For arbitrary bit errors longer than N-bits,
the odds are one in 2^{N} than a totally false bit pattern will
nonetheless lead to a zero remainder."

Which I think is the same thing I'm saying: the chances of failing to
detect an error with a decent n-bit checksum ought to be about
2^{-N}. If that's not right, I'd really like to understand why.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Robert Haas
Date:
On Thu, Mar 26, 2020 at 4:37 PM David Steele <david@pgmasters.net> wrote:
> I agree with Stephen that this should be done, but I agree with you that
> it can wait for a future commit. However, I do think:
>
> 1) It should be called out rather plainly in the documentation.
> 2) If there are files in pg_wal then pg_validatebackup should inform the
> user that those files have not been validated.

I agree with you about #1, and I suspect that there's a way to improve
what I've got here now, but I think I might be too close to this to
figure out what the best way would be, so suggestions welcome.

I think #2 is an interesting idea and could possibly reduce the danger
of user confusion on this point considerably - because, let's face it,
not everyone is going to read the documentation. However, I'm having a
hard time figuring out exactly what we'd print. Right now on success,
unless you specify -q, you get:

[rhaas ~]$ pg_validatebackup  ~/pgslave
backup successfully verified

But it feels strange and possibly confusing to me to print something like:

[rhaas ~]$ pg_validatebackup  ~/pgslave
backup successfully verified (except for pg_wal)

...because there are a few other exceptions too, and also because it
might make the user think that we normally check that but for some
reason decided to skip it in this case. Maybe something more verbose
like:

[rhaas ~]$ pg_validatebackup  ~/pgslave
backup files successfully verified
your backup contains a pg_wal directory, but this tool can't validate
that, so do it yourself

...but that seems a little obnoxious and a little silly to print out every time.

Ideas?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Robert Haas
Date:
On Thu, Mar 26, 2020 at 4:44 PM Stephen Frost <sfrost@snowman.net> wrote:
> Is it actually possible, today, in PG, to have a 4GB WAL record?
> Judging this based on the WAL record size doesn't seem quite right.

I'm not sure. I mean, most records are quite small, but I think if you
set REPLICA IDENTITY FULL on a table with a bunch of very wide columns
(and also wal_level=logical) it can get really big. I haven't tested
to figure out just how big it can get. (If I have a table with lots of
almost-1GB-blobs in it, does it work without logical replication and
fail with logical replication? I don't know, but I doubt a WAL record
>4GB is possible, because it seems unlikely that the code has a way to
cope with that struct field overflowing.)

> Again, I'm not against having a checksum algorithm as a option.  I'm not
> saying that it must be SHA512 as the default.

I think that what we have seen so far is that all of the SHA-n
algorithms that PostgreSQL supports are about equally slow, so it
doesn't really matter which one you pick there from a performance
point of view. If you're not saying it has to be SHA-512 but you do
want it to be SHA-256, I don't think that really fixes anything. Using
CRC-32C does fix the performance issue, but I don't think you like
that, either. We could default to having no checksums at all, or even
no manifest at all, but I didn't get the impression that David, at
least, wanted to go that way, and I don't like it either. It's not the
world's best feature, but I think it's good enough to justify enabling
it by default. So I'm not sure we have any options here that will
satisfy you.

> > > I don't agree with limiting our view to only those algorithms that we've
> > > already got implemented in PG.
> >
> > I mean, opening that giant can of worms ~2 weeks before feature freeze
> > is not very nice. This patch has been around for months, and the
> > algorithms were openly discussed a long time ago.
>
> Yes, they were discussed before, and these issues were brought up before
> and there was specifically concern brought up about exactly the same
> issues that I'm repeating here. Those concerns seem to have been
> largely ignored, apparently because "we don't have that in PG today" as
> at least one of the considerations- even though we used to.

I might have missed something, but I don't remember any suggestion of
CRC-64 or other algorithms for which PG does not currently have
support prior to this week. The only thing I remember having been
suggested previously was SHA, and I responded to that by adding
support for SHA, not by ignoring the suggestion. If there was another
suggestion made earlier, I must have missed it.

> I also had hoped that
> David's concerns that were raised before had been heeded, as I knew he
> was involved in the discussion previously, but that turns out to not
> have been the case.

Well, I mean, I am trying pretty hard here, but I realize that I'm not
succeeding. I don't know which specific suggestion you're talking
about here. I understand that there is a concern about a 32-bit CRC
somehow not being valid for more than 512MB, but based on my research,
I believe that to be incorrect. I've explained the reasons why I
believe it to be incorrect several times now, but I feel like we're
just going around in circles. If my explanation of why it's incorrect
is itself incorrect, tell me why, but let's not just keep saying the
things we've both already said.

> Yes, that looks fine.  Feels slightly redundant to include the "as
> described above ..." bit, and I think that could be dropped, but up to
> you.

Done in the version I posted a bit ago.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Robert Haas
Date:
On Fri, Mar 27, 2020 at 2:29 AM Andres Freund <andres@anarazel.de> wrote:
> s/ye'/yes/

Ugh, sorry. Fixed in the version posted earlier.

> Are you planning to include a specification of the manifest file format
> anywhere? I looked through the patches and didn't find anything.

I thought about that. I think it would be good to have. I was sort of
hoping to leave it for a follow-on patch, but maybe that's cheating
too much.

> I think it'd also be good to include more information about what the
> point of manifest files actually is.

What kind of information do you want to see included there? Basically,
the way the documentation is written right now, it essentially says,
well, we have this manifest thing so that you can later run
pg_validatebackup, and pg_validatebackup says that it's there to check
the integrity of backups using the manifest. This is all a bit
circular, though, and maybe needs elaboration.

What I've experienced is that:

- Sometimes people take a backup and then wonder later whether the
disk has flipped some bits.
- Sometimes people restore a backup and forget some of the parts, like
the user-defined tablespaces.
- Sometimes anti-virus software, or a cron job run amok, wanders around
inflicting unpredictable damage.

It would be nice to have a system that would notice these kinds of
things on a running system, but here I've got the more modest goal of
checking for them in the context of a backup. If the data gets corrupted in
transit, or if the disk mutilates it, or if the user mutilates it, you
need something to check the backup against to find out that bad things
have happened; the manifest is that thing. But I don't know exactly how
much of all that should go in the docs, or in what way.

> > +  <para>
> > +   <application>pg_validatebackup</application> reads the manifest file of a
> > +   backup, verifies the manifest against its own internal checksum, and then
> > +   verifies that the same files are present in the target directory as in the
> > +   manifest itself. It then verifies that each file has the expected checksum,
>
> Depending on what you want to use the manifest for, we'd also need to
> check that there are no additional files. That seems to actually be
> implemented, which imo should be mentioned here.

I intended the text to say that, because it says that it checks that
the two things are "the same," which is symmetric.  In the new version
I posted a bit ago, I tried to make it more explicit, because
apparently it was not sufficiently clear.

> Hm. Is it a great choice to include the checksum for the manifest inside
> the manifest itself? With a cryptographic checksum it seems like it
> could make a ton of sense to store the checksum somewhere "safe", but
> keep the manifest itself alongside the base backup itself. While not
> huge, they won't be tiny either.

Seems like the user could just copy the manifest checksum and store it
somewhere, if they wish. Then they can check it against the manifest
itself later, if they wish. Or they can take a SHA-512 of the whole
file and store that securely. The problem is that we have no idea how
to write that checksum to more secure storage. We could write
backup_manifest and backup_manifest.checksum into separate files, but
that seems like it's adding complexity without any real benefit.
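
For what it's worth, pulling the value out so it can be stashed elsewhere
is only a few lines of work. Here is a rough sketch (not part of the patch;
it simply assumes the "Manifest-Checksum" key appears literally in the
file, as the quoted server code writes it):

#include <stdio.h>
#include <string.h>

int
main(int argc, char **argv)
{
    char        line[8192];
    FILE       *f = fopen(argc > 1 ? argv[1] : "backup_manifest", "r");

    if (f == NULL)
        return 1;
    while (fgets(line, sizeof(line), f) != NULL)
    {
        char       *p = strstr(line, "\"Manifest-Checksum\"");

        if (p != NULL)
            fputs(p, stdout);   /* copy this line somewhere safe */
    }
    fclose(f);
    return 0;
}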

To me, the security-related uses of this patch seem to be fairly
niche. I think it's nice that they exist, but I don't think that's the
main selling point. For me, the main selling point is that you can
check that your disk didn't eat your data and that nobody nuked any
files that were supposed to be there.

> Doesn't have to be in the first version, but could it be useful to move
> this to common/ or such?

Yeah. At one point, this code was written in a way that was totally
specific to pg_validatebackup, but I then realized that it would be
better to make it more general, so I refactored it into the form
you see now, where pg_validatebackup.c depends on parse_manifest.c but
not the reverse. I suspect that if someone wants to use this for
something else they might need to change a few more things - not sure
exactly what - but I don't think it would be too hard. I thought it
would be best to leave that task until someone has a concrete use case
in mind, but I did want it to be relatively easy to do that down
the road, and I hope that the way I've organized the code achieves
that.

> > +static void
> > +validate_backup_directory(validator_context *context, char *relpath,
> > +                                               char *fullpath)
> > +{
>
> Hm. Should this warn if the directory's permissions are set too openly
> (world writable?)?

I don't think so, but it's pretty clear that different people have
different ideas about what the scope of this tool ought to be, even in
this first version.

> Hm. I think it'd be good to verify that the checksummed size is the same
> as the size of the file in the manifest.

That's checked in an earlier phase. Are you worried about the file
being modified after the first pass checks the size and before we come
through to do the checksumming?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Robert Haas
Date:
On Fri, Mar 27, 2020 at 11:26 AM Stephen Frost <sfrost@snowman.net> wrote:
> > Seems better to (later?) add support for generating manifests for WAL
> > files, and then have a tool that can verify all the manifests required
> > to restore a base backup.
>
> I'm not trying to expand on the feature set here or move the goalposts
> way down the road, which is what seems to be what's being suggested
> here.  To be clear, I don't have any objection to adding a generic tool
> for validating WAL as you're talking about here, but I also don't think
> that's required for pg_validatebackup.  What I do think we need is a
> check of the WAL that's fetched when people use pg_basebackup -Xstream
> or -Xfetch.  pg_basebackup itself has that check because it's critical
> to the backup being successful and valid.  Not having that basic
> validation of a backup really just isn't ok- there's a reason
> pg_basebackup has that check.

I don't understand how this could be done without significantly
complicating the architecture. As I said before, -Xstream sends WAL
over a separate connection that is unrelated to the one running
BASE_BACKUP, so the base-backup connection doesn't know what to
include in the manifest. Now you could do something like: once all of
the WAL files have been fetched, the client checksums all of those and
sends their names and checksums to the server, which turns around and
puts them into the manifest, which it then sends back to the client.
But that is actually quite a bit of additional complexity, and it's
pretty strange, too, because now you have the client checksumming some
files and the server checksumming others. I know you mentioned a few
different ideas before, but I think they all kinda have some problem
along these lines.

I also kinda disagree with the idea that the WAL should be considered
an integral part of the backup. I don't know how pgbackrest does
things, but BART stores each backup in a separate directory without any
associated WAL, and then keeps all the WAL together in a different
directory. I imagine that people who are using continuous archiving
also tend to use -Xnone, or if they do backups by copying the files
rather than using pg_basebackup, they exclude pg_wal. In fact, for
people with big, important databases, I'd assume that would be the
normal pattern. You presumably wouldn't want to keep one copy of the
WAL files taken during the backup with the backup itself, and a
separate copy in the archive.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Stephen Frost
Date:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Thu, Mar 26, 2020 at 4:37 PM David Steele <david@pgmasters.net> wrote:
> > I agree with Stephen that this should be done, but I agree with you that
> > it can wait for a future commit. However, I do think:
> >
> > 1) It should be called out rather plainly in the documentation.
> > 2) If there are files in pg_wal then pg_validatebackup should inform the
> > user that those files have not been validated.
>
> I agree with you about #1, and I suspect that there's a way to improve
> what I've got here now, but I think I might be too close to this to
> figure out what the best way would be, so suggestions welcome.
>
> I think #2 is an interesting idea and could possibly reduce the danger
> of user confusion on this point considerably - because, let's face it,
> not everyone is going to read the documentation. However, I'm having a
> hard time figuring out exactly what we'd print. Right now on success,
> unless you specify -q, you get:
>
> [rhaas ~]$ pg_validatebackup  ~/pgslave
> backup successfully verified
>
> But it feels strange and possibly confusing to me to print something like:
>
> [rhaas ~]$ pg_validatebackup  ~/pgslave
> backup successfully verified (except for pg_wal)
>
> ...because there are a few other exceptions too, and also because it

The exceptions you're referring to here are things like the various
signal files, which the user can recreate pretty easily?  I don't
think those really rise to the level of pg_wal.

What I would hope to see (... well, we know what I *really* would hope
to see, but if we really go this route) is something like:

WARNING: pg_wal not empty, WAL files are not validated by this tool
data files successfully verified

and a non-zero exit code.

Basically, if you're doing WAL yourself, then you'd use pg_receivewal
and maybe your own manifest-building code for WAL or something and then
use -X none with pg_basebackup.

Then again, I'd have -X none throw a warning too.  I'd be alright with
all of these having override switches to say "ok, I get it, don't
complain about it".

I disagree with the idea of writing "backup successfully verified" when
we aren't doing any checking of the WAL that's essential for the backup
(unlike various signal files and whatnot, which aren't...).

Thanks,

Stephen

Attachment

Re: backup manifests

From
David Steele
Date:
On 3/27/20 1:53 PM, Robert Haas wrote:
> On Thu, Mar 26, 2020 at 4:37 PM David Steele <david@pgmasters.net> wrote:
>> I know you and Stephen have agreed on a number of doc changes, would it
>> be possible to get a new patch with those included? I finally have time
>> to do a review of this tomorrow.  I saw some mistakes in the docs in the
>> current patch but I know those patches are not current.
> 
> Hi David,
> 
> Here's a new version with some fixes:
> 
> - Fixes for doc typos noted by Stephen Frost and Andres Freund.
> - Replace a doc paragraph about the advantages and disadvantages of
> CRC-32C with one by Stephen Frost, with a slight change by me that I
> thought made it sound more grammatical.
> - Change the pg_validatebackup documentation so that it makes no
> mention of compatible tools, per Stephen.
> - Reword the discussion of the exclude list in the pg_validatebackup
> documentation, per discussion between Stephen and myself.
> - Try to make the documentation more clear about the fact that we
> check for both extra and missing files.
> - Incorporate a fix from Amit Kapila to make 003_corruption.pl pass on Windows.

Thanks!

There appear to be conflicts with 67e0adfb3f98:

$ git apply -3 
../download/v14-0002-Generate-backup-manifests-for-base-backups-and-v.patch
../download/v14-0002-Generate-backup-manifests-for-base-backups-and-v.patch:3396: 
trailing whitespace.
sub cleanup_search_directory_fails
error: patch failed: src/backend/replication/basebackup.c:258
Falling back to three-way merge...
Applied patch to 'src/backend/replication/basebackup.c' with conflicts.
U src/backend/replication/basebackup.c
warning: 1 line adds whitespace errors.

 > +          Specifies the algorithm that should be used to checksum each file
 > +          for purposes of the backup manifest. Currently, the available

perhaps "for inclusion in the backup manifest"?  Anyway, I think this 
sentence is awkward.

 > +        Specifies the algorithm that should be used to checksum each file
 > +        for purposes of the backup manifest. Currently, the available

And again.

 > +        because the files themselves do not need to read.

should be "need to be read".

 > +        the manifest itself will always contain a <literal>SHA256</literal>

I think just "the manifest will always contain" is fine.

 > +        manifeste itself, and is therefore ignored. Note that the manifest

typo "manifeste", perhaps remove itself.

 > { "Path": "backup_label", "Size": 224, "Last-Modified": "2020-03-27 
18:33:18 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "b914bec9" },

Storing the checksum type with each file seems pretty redundant. 
Perhaps that could go in the header?  You could always override if a 
specific file had a different checksum type, though that seems unlikely.

In general it might be good to go with shorter keys: "mod", "chk", etc. 
Manifests can get pretty big and that's a lot of extra bytes.

I'm also partial to using epoch time in the manifest because it is 
generally easier for programs to work with.  But, human-readable doesn't 
suck, either.

 >      if (maxrate > 0)
 >         maxrate_clause = psprintf("MAX_RATE %u", maxrate);
 > +    if (manifest)

A linefeed here would be nice.

 > +    manifestfile *tabent;

This is an odd name.  A holdover from the tab-delimited version?

 > +    printf(_("Usage:\n  %s [OPTION]... BACKUPDIR\n\n"), progname);

When I ran pg_validatebackup I expected to use -D to specify the backup 
dir since pg_basebackup does.  On the other hand -D is weird because I 
*really* expect that to be the pg data dir.

But, do we want this to be different from pg_basebackup?

 > +        checksum_length = checksum_string_length / 2;

This check is defeated if a single character is added to the checksum.

Not too big a deal since you still get an error, but still.
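
A sketch of the kind of tightening that would close that hole, reusing the
variable names from the quoted line (the error-reporting call here is an
assumption; whatever reporting path that code normally uses would do):

    /* Reject odd-length checksum strings instead of silently truncating. */
    if (checksum_string_length % 2 != 0)
        json_manifest_parse_failure(parse->context,
                                    "checksum string has odd length");
    checksum_length = checksum_string_length / 2;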

 > + * Verify that the manifest checksum is correct.

This is not working the way I would expect -- I could freely modify the 
manifest without getting a checksum error on the manifest.  For example:

$ /home/vagrant/test/pg/bin/pg_validatebackup test/backup3
pg_validatebackup: fatal: invalid checksum for file "backup_label": 
"408901e0814f40f8ceb7796309a59c7248458325a21941e7c55568e381f53831?"

So, if I deleted the entry above, I got a manifest checksum error.  But 
if I just modified the checksum I get a file checksum error with no 
manifest checksum error.

I would prefer a manifest checksum error in all cases where it is wrong, 
unless --exit-on-error is specified.

-- 
-David
david@pgmasters.net



Re: backup manifests

From
Stephen Frost
Date:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Thu, Mar 26, 2020 at 4:44 PM Stephen Frost <sfrost@snowman.net> wrote:
> > Is it actually possible, today, in PG, to have a 4GB WAL record?
> > Judging this based on the WAL record size doesn't seem quite right.
>
> I'm not sure. I mean, most records are quite small, but I think if you
> set REPLICA IDENTITY FULL on a table with a bunch of very wide columns
> (and also wal_level=logical) it can get really big. I haven't tested
> to figure out just how big it can get. (If I have a table with lots of
> almost-1GB-blobs in it, does it work without logical replication and
> fail with logical replication? I don't know, but I doubt a WAL record
> >4GB is possible, because it seems unlikely that the code has a way to
> cope with that struct field overflowing.)

Interesting...  Well, topic for another thread, but I'd say if we believe
that's possible then we might want to consider whether crc32c is still a
good choice to use there.

> > Again, I'm not against having a checksum algorithm as a option.  I'm not
> > saying that it must be SHA512 as the default.
>
> I think that what we have seen so far is that all of the SHA-n
> algorithms that PostgreSQL supports are about equally slow, so it
> doesn't really matter which one you pick there from a performance
> point of view. If you're not saying it has to be SHA-512 but you do
> want it to be SHA-256, I don't think that really fixes anything. Using
> CRC-32C does fix the performance issue, but I don't think you like
> that, either. We could default to having no checksums at all, or even
> no manifest at all, but I didn't get the impression that David, at
> least, wanted to go that way, and I don't like it either. It's not the
> world's best feature, but I think it's good enough to justify enabling
> it by default. So I'm not sure we have any options here that will
> satisfy you.

I do like having a manifest by default.  At this point it's pretty clear
that we've just got a fundamental disagreement that more words aren't
going to fix.  I'd rather we play it safe and use a sha256 hash and
accept that it's going to be slower by default, and then give users an
option to make it go faster if they want (though I'd much rather that
alternative be a 64bit CRC than a 32bit one).

Andres seems to agree with you.  I'm not sure where David sits on this
specific question.

Thanks,

Stephen

Attachment

Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-03-27 14:13:17 -0400, Robert Haas wrote:
> On Thu, Mar 26, 2020 at 4:44 PM Stephen Frost <sfrost@snowman.net> wrote:
> > > I mean, the property that I care about is the one where it detects
> > > better than 999,999,999 errors out of every 1,000,000,000, regardless
> > > of input length.
> >
> > Throwing these kinds of things around I really don't think is useful.
> 
> ...but I don't understand his reasoning, or yours.
> 
> My reasoning for thinking that the number is accurate is that a 32-bit
> checksum has 2^32 possible results. If all of those results are
> equally probable, then the probability that two files with unequal
> contents produce the same result is 2^-32. This does assume that the
> hash function is perfect, which no hash function is, so the actual
> probability of a collision is likely higher. But if the hash function
> is pretty good, it shouldn't be all that much higher. Note that I am
> making no assumptions here about how many bits are different, nor am I
> making any assumption about the length of a file. I am simply saying
> that an n-bit checksum should detect a difference between two files
> with a probability of roughly 1-2^{-n}, modulo the imperfections of
> the hash function. I thought that this was a well-accepted fact that
> would produce little argument from anybody, and I'm confused that
> people seem to feel otherwise.

Well: crc32 is a terrible hash, if you're looking for even distribution
of hashed values. That's not too surprising - its design goals included
guaranteed error detection for certain lengths, and error correction of
single bit errors.  My understanding of the underlying math is spotty at
best, but from what I understand that does pretty directly imply less
independence between source data -> hash value than what we'd want from
a good hash function.

Here are smhasher result pages for crc32 (at least the latter is crc32
afaict):
https://notabug.org/vaeringjar/smhasher/src/master/doc/crc32
https://notabug.org/vaeringjar/smhasher/src/master/doc/crc32_hw

and then compare that with something like xxhash, or even lookup3 (which
I think is what our hash is a variant of):
https://notabug.org/vaeringjar/smhasher/src/master/doc/xxHash32
https://notabug.org/vaeringjar/smhasher/src/master/doc/lookup3

The birthday paradox doesn't apply (otherwise 32bit would never be
enough, at a 50% chance of conflict at around 80k hashes), but still I
do wonder if it matters that we're trying to detect errors in not one,
but commonly tens of thousands to millions of files. But since we just
need to detect one error to call the whole backup corrupt...
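
For the record, the 80k figure is just the usual birthday-bound
approximation k ~ sqrt(2 N ln 2) with N = 2^32; a couple of lines of C
confirm it comes out to roughly 77,000 hashes:

#include <math.h>
#include <stdio.h>

int
main(void)
{
    double      n = 4294967296.0;   /* 2^32 possible 32-bit checksum values */
    double      k = sqrt(2.0 * log(2.0) * n);   /* ~50% collision odds */

    printf("~%.0f hashes\n", k);    /* prints roughly 77163 */
    return 0;
}

As said, though, that bound is about collisions among a large set of
hashes, which is not the comparison being made here.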


> One explanation that would make sense to me is if somebody said, well,
> the nature of this particular algorithm means that, although values
> are uniformly distributed in general, the kinds of errors that are
> likely to occur in practice are likely to cancel out. For instance, if
> you imagine trivial algorithms such as adding or xor-ing all the
> bytes, adding zero bytes doesn't change the answer, and neither do
> transpositions. However, CRC is, AIUI, designed to be resistant to
> such problems. Your remark about large blocks of zero bytes is
> interesting to me in this context, but in a quick search I couldn't
> find anything stating that CRC was weak for such use cases.

My main point was that CRC's error detection guarantees are pretty much
irrelevant for us. I.e. while the right CRC will guarantee that all
single 2 bit errors will be detected, that's not a helpful property for
us. There rarely are single bit errors, and the bursts are too long to
benefit from any >2 bit guarantees. Nor are multiple failures rare
once you hit a problem.


> The old thread about switching from 64-bit CRC to 32-bit CRC had a
> link to a page which has subsequently been moved to here:
> 
> https://www.ece.unb.ca/tervo/ee4253/crc.shtml
> 
> Down towards the bottom, it says:
> 
> "In general, bit errors and bursts up to N-bits long will be detected
> for a P(x) of degree N. For arbitrary bit errors longer than N-bits,
> the odds are one in 2^{N} than a totally false bit pattern will
> nonetheless lead to a zero remainder."

That's still about a single sequence of bit errors though, as far as I
can tell. I.e. it doesn't hold for CRCs if you have two errors at
different places.

Greetings,

Andres Freund



Re: backup manifests

From
David Steele
Date:
On 3/27/20 3:20 PM, Robert Haas wrote:
> On Fri, Mar 27, 2020 at 2:29 AM Andres Freund <andres@anarazel.de> wrote:
> 
>> Hm. Is it a great choice to include the checksum for the manifest inside
>> the manifest itself? With a cryptographic checksum it seems like it
>> could make a ton of sense to store the checksum somewhere "safe", but
>> keep the manifest itself alongside the base backup itself. While not
>> huge, they won't be tiny either.
> 
> Seems like the user could just copy the manifest checksum and store it
> somewhere, if they wish. Then they can check it against the manifest
> itself later, if they wish. Or they can take a SHA-512 of the whole
> file and store that securely. The problem is that we have no idea how
> to write that checksum to more secure storage. We could write
> backup_manifest and backup_manifest.checksum into separate files, but
> that seems like it's adding complexity without any real benefit.

I agree that this seems like a separate problem. What Robert has done 
here is detect random mutilation of the manifest.

To prevent malicious modifications you either need to store the checksum 
in another place, or digitally sign the file and store that alongside it 
(or inside it even). Either way seems pretty far out of scope to me.

>> Hm. I think it'd be good to verify that the checksummed size is the same
>> as the size of the file in the manifest.
> 
> That's checked in an earlier phase. Are you worried about the file
> being modified after the first pass checks the size and before we come
> through to do the checksumming?

I prefer to validate the size and checksum in the same pass, but I'm not 
sure it's that big a deal.  If the backup is being corrupted while the
validation process runs, that would also apply to files that had already been
validated.

Regards,
-- 
-David
david@pgmasters.net



Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-03-27 15:29:02 -0400, Robert Haas wrote:
> On Fri, Mar 27, 2020 at 11:26 AM Stephen Frost <sfrost@snowman.net> wrote:
> > > Seems better to (later?) add support for generating manifests for WAL
> > > files, and then have a tool that can verify all the manifests required
> > > to restore a base backup.
> >
> > I'm not trying to expand on the feature set here or move the goalposts
> > way down the road, which is what seems to be what's being suggested
> > here.  To be clear, I don't have any objection to adding a generic tool
> > for validating WAL as you're talking about here, but I also don't think
> > that's required for pg_validatebackup.  What I do think we need is a
> > check of the WAL that's fetched when people use pg_basebackup -Xstream
> > or -Xfetch.  pg_basebackup itself has that check because it's critical
> > to the backup being successful and valid.  Not having that basic
> > validation of a backup really just isn't ok- there's a reason
> > pg_basebackup has that check.
> 
> I don't understand how this could be done without significantly
> complicating the architecture. As I said before, -Xstream sends WAL
> over a separate connection that is unrelated to the one running
> BASE_BACKUP, so the base-backup connection doesn't know what to
> include in the manifest. Now you could do something like: once all of
> the WAL files have been fetched, the client checksums all of those and
> sends their names and checksums to the server, which turns around and
> puts them into the manifest, which it then sends back to the client.
> But that is actually quite a bit of additional complexity, and it's
> pretty strange, too, because now you have the client checksumming some
> files and the server checksumming others. I know you mentioned a few
> different ideas before, but I think they all kinda have some problem
> along these lines.

How about having separate manifests for segments? And have them stay
separate? And then have an option to verify the manifests for all the
WAL files that are required for a specific restore? The easiest way
would be to just add a separate manifest file for each segment, and name
them accordingly. But inventing a naming pattern that specifies both
start-end segments wouldn't be hard either, and result in fewer
manifests.

Base backups (in the backup sense, not for bringing up replicas etc)
without the ability to apply newer WAL are fairly pointless imo. And if
newer WAL is applied, there's not much point in just verifying the WAL
that's necessary to restore the base backup. Instead you'd want to be
able to verify all the WAL since the base backup to the "current" point
(or the next base backup).

For me having something inside pg_basebackup (or the server, for
-Xfetch) that somehow includes the WAL files in the manifest doesn't
really gain us much - it's obviously not something that'll help us to
verify all the WAL that needs to be applied (to either get the base
backup into a consistent state, or to roll forward to the desired
point).



> I also kinda disagree with the idea that the WAL should be considered
> an integral part of the backup. I don't know how pgbackrest does
> things, but BART stores each backup in a separate directory without any
> associated WAL, and then keeps all the WAL together in a different
> directory. I imagine that people who are using continuous archiving
> also tend to use -Xnone, or if they do backups by copying the files
> rather than using pg_basebackup, they exclude pg_wal. In fact, for
> people with big, important databases, I'd assume that would be the
> normal pattern. You presumably wouldn't want to keep one copy of the
> WAL files taken during the backup with the backup itself, and a
> separate copy in the archive.

+1

I also don't see them as being as important, due to the already existing
checksums (which are of a much much much higher quality than what we
have for database pages, both by being wider, and by being much more
frequent in most cases). There's obviously a need to validate the WAL in
a nicer way than scripting pg_waldump - but that seems separate anyway.

Greetings,

Andres Freund



Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-03-27 14:34:19 -0400, Robert Haas wrote:
> I think #2 is an interesting idea and could possibly reduce the danger
> of user confusion on this point considerably - because, let's face it,
> not everyone is going to read the documentation. However, I'm having a
> hard time figuring out exactly what we'd print. Right now on success,
> unless you specify -q, you get:
> 
> [rhaas ~]$ pg_validatebackup  ~/pgslave
> backup successfully verified
> 
> But it feels strange and possibly confusing to me to print something like:
> 
> [rhaas ~]$ pg_validatebackup  ~/pgslave
> backup successfully verified (except for pg_wal)

You could print something like:
WAL necessary to restore this base backup can be validated with:

pg_waldump -p ~/pgslave -t tl -s backup_start_location -e backup_end_loc > /dev/null && echo true

Obviously that specific invocation sucks, but it'd not be hard to add an
option to waldump to not output anything.

Greetings,

Andres Freund



Re: backup manifests

From
David Steele
Date:
On 3/27/20 3:29 PM, Robert Haas wrote:
> On Fri, Mar 27, 2020 at 11:26 AM Stephen Frost <sfrost@snowman.net> wrote:
>>> Seems better to (later?) add support for generating manifests for WAL
>>> files, and then have a tool that can verify all the manifests required
>>> to restore a base backup.
>>
>> I'm not trying to expand on the feature set here or move the goalposts
>> way down the road, which is what seems to be what's being suggested
>> here.  To be clear, I don't have any objection to adding a generic tool
>> for validating WAL as you're talking about here, but I also don't think
>> that's required for pg_validatebackup.  What I do think we need is a
>> check of the WAL that's fetched when people use pg_basebackup -Xstream
>> or -Xfetch.  pg_basebackup itself has that check because it's critical
>> to the backup being successful and valid.  Not having that basic
>> validation of a backup really just isn't ok- there's a reason
>> pg_basebackup has that check.
> 
> I don't understand how this could be done without significantly
> complicating the architecture. As I said before, -Xstream sends WAL
> over a separate connection that is unrelated to the one running
> BASE_BACKUP, so the base-backup connection doesn't know what to
> include in the manifest. Now you could do something like: once all of
> the WAL files have been fetched, the client checksums all of those and
> sends their names and checksums to the server, which turns around and
> puts them into the manifest, which it then sends back to the client.
> But that is actually quite a bit of additional complexity, and it's
> pretty strange, too, because now you have the client checksumming some
> files and the server checksumming others. I know you mentioned a few
> different ideas before, but I think they all kinda have some problem
> along these lines.
> 
> I also kinda disagree with the idea that the WAL should be considered
> an integral part of the backup. I don't know how pgbackrest does
> things, 

We checksum each WAL file while it is read and transmitted to the repo 
by the archive_command.  Then at the end of the backup we ensure that 
all the WAL required to make the backup consistent has made it to the repo.

> but BART stores each backup in a separate directory without any
> associated WAL, and then keeps all the WAL together in a different
> directory. I imagine that people who are using continuous archiving
> also tend to use -Xnone, or if they do backups by copying the files
> rather than using pg_basebackup, they exclude pg_wal. In fact, for
> people with big, important databases, I'd assume that would be the
> normal pattern. You presumably wouldn't want to keep one copy of the
> WAL files taken during the backup with the backup itself, and a
> separate copy in the archive.

pgBackRest does provide the option to copy WAL into the backup directory 
for the super-paranoid, though it is not the default. It is pretty handy 
for moving individual backups to some other medium like tape, though.

If -Xnone is specified then it seems like pg_validatebackup is 
completely off the hook.  But in the case of -Xstream or -Xfetch 
couldn't we at least verify that the expected WAL segments are present 
and the correct size?

Storing the start/stop lsn in the manifest would be a nice thing to have 
anyway and that would make this feature pretty trivial. Yeah, that's in 
the backup_label file as well but the manifest is so much easier to read.
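
As a sketch of how cheap a presence/size check could be on the validating
side: given the timeline and the start/stop LSNs, the expected WAL segment
file names can simply be enumerated. The snippet below hard-codes the
default 16MB segment size and does the name arithmetic by hand; real code
would presumably use wal_segment_size and the existing XLogFileName()
machinery instead:

#include <stdint.h>
#include <stdio.h>

/* Print the WAL segment file names spanning [start_lsn, end_lsn] on the
 * given timeline, assuming the default 16MB segment size. */
static void
print_expected_segments(uint32_t tli, uint64_t start_lsn, uint64_t end_lsn)
{
    const uint64_t seg_size = 16 * 1024 * 1024;
    const uint64_t segs_per_xlogid = UINT64_C(0x100000000) / seg_size; /* 256 */

    for (uint64_t segno = start_lsn / seg_size; segno <= end_lsn / seg_size; segno++)
        printf("%08X%08X%08X\n",
               (unsigned) tli,
               (unsigned) (segno / segs_per_xlogid),
               (unsigned) (segno % segs_per_xlogid));
}

int
main(void)
{
    /* Example: timeline 1, backup spanning 0/4000028 .. 0/7000100. */
    print_expected_segments(1, UINT64_C(0x04000028), UINT64_C(0x07000100));
    return 0;
}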

Regards,
-- 
-David
david@pgmasters.net



Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-03-27 15:20:27 -0400, Robert Haas wrote:
> On Fri, Mar 27, 2020 at 2:29 AM Andres Freund <andres@anarazel.de> wrote:
> > Are you planning to include a specification of the manifest file format
> > anywhere? I looked through the patches and didn't find anything.
> 
> I thought about that. I think it would be good to have. I was sort of
> hoping to leave it for a follow-on patch, but maybe that's cheating
> too much.

I don't like having a file format that's intended to be used by external
tools too, but is undocumented except for the code that assembles it in a
piecemeal fashion.  Do you mean in a follow-on patch this release, or
later? I don't have a problem with the former.


> > I think it'd also be good to include more information about what the
> > point of manifest files actually is.
> 
> What kind of information do you want to see included there? Basically,
> the way the documentation is written right now, it essentially says,
> well, we have this manifest thing so that you can later run
> pg_validatebackup, and pg_validatebackup says that it's there to check
> the integrity of backups using the manifest. This is all a bit
> circular, though, and maybe needs elaboration.

I do find it to be circular. I think we mostly need a paragraph or two
somewhere that explains on a higher level what the point of verifying
base backups is and what is verified.


> > Hm. Is it a great choice to include the checksum for the manifest inside
> > the manifest itself? With a cryptographic checksum it seems like it
> > could make a ton of sense to store the checksum somewhere "safe", but
> > keep the manifest itself alongside the base backup itself. While not
> > huge, they won't be tiny either.
> 
> Seems like the user could just copy the manifest checksum and store it
> somewhere, if they wish. Then they can check it against the manifest
> itself later, if they wish. Or they can take a SHA-512 of the whole
> file and store that securely. The problem is that we have no idea how
> to write that checksum to more secure storage. We could write
> backup_manifest and backup_manifest.checksum into separate files, but
> that seems like it's adding complexity without any real benefit.
> 
> To me, the security-related uses of this patch seem to be fairly
> niche. I think it's nice that they exist, but I don't think that's the
> main selling point. For me, the main selling point is that you can
> check that your disk didn't eat your data and that nobody nuked any
> files that were supposed to be there.

Oh, I agree. I wasn't really mentioning the crypto checksum because of
it being "security" stuff, but because of the quality of the guarantee
it gives. I don't know how large the manifest file will be for a setup
with a lot of partitioned tables, but I'd expect it to not be
tiny. So not having to store it in the 'archiving system' is nice.

FWIW, I was thinking of backup_manifest.checksum potentially being
desirable for another reason: The need to embed the checksum inside the
document imo adds a fair bit of rigidity to the file format. See

> +static void
> +verify_manifest_checksum(JsonManifestParseState *parse, char *buffer,
> +                         size_t size)
> +{
...
> +
> +    /* Find the last two newlines in the file. */
> +    for (i = 0; i < size; ++i)
> +    {
> +        if (buffer[i] == '\n')
> +        {
> +            ++number_of_newlines;
> +            penultimate_newline = ultimate_newline;
> +            ultimate_newline = i;
> +        }
> +    }
> +
> +    /*
> +     * Make sure that the last newline is right at the end, and that there are
> +     * at least two lines total. We need this to be true in order for the
> +     * following code, which computes the manifest checksum, to work properly.
> +     */
> +    if (number_of_newlines < 2)
> +        json_manifest_parse_failure(parse->context,
> +                                    "expected at least 2 lines");
> +    if (ultimate_newline != size - 1)
> +        json_manifest_parse_failure(parse->context,
> +                                    "last line not newline-terminated");
> +
> +    /* Checksum the rest. */
> +    pg_sha256_init(&manifest_ctx);
> +    pg_sha256_update(&manifest_ctx, (uint8 *) buffer, penultimate_newline + 1);
> +    pg_sha256_final(&manifest_ctx, manifest_checksum_actual);

which certainly isn't "free form json".


> > Doesn't have to be in the first version, but could it be useful to move
> > this to common/ or such?
> 
> Yeah. At one point, this code was written in a way that was totally
> specific to pg_validatebackup, but I then realized that it would be
> better to make it more general, so I refactored it into the form
> you see now, where pg_validatebackup.c depends on parse_manifest.c but
> not the reverse. I suspect that if someone wants to use this for
> something else they might need to change a few more things - not sure
> exactly what - but I don't think it would be too hard. I thought it
> would be best to leave that task until someone has a concrete use case
> in mind, but I did want it to be relatively easy to do that down
> the road, and I hope that the way I've organized the code achieves
> that.

Cool.


> > > +static void
> > > +validate_backup_directory(validator_context *context, char *relpath,
> > > +                                               char *fullpath)
> > > +{
> >
> > Hm. Should this warn if the directory's permissions are set too openly
> > (world writable?)?
> 
> I don't think so, but it's pretty clear that different people have
> different ideas about what the scope of this tool ought to be, even in
> this first version.

Yea. I don't have a strong opinion on this specific issue. I was mostly
wondering because I've repeatedly seen people restore backups with
world-writable permissions, and with that it's obviously possible for somebody
else to change the contents after the checksum was computed.


> > Hm. I think it'd be good to verify that the checksummed size is the same
> > as the size of the file in the manifest.
> 
> That's checked in an earlier phase. Are you worried about the file
> being modified after the first pass checks the size and before we come
> through to do the checksumming?

Not really, I wondered about it for a bit, and then decided that it's
too remote an issue.

What I've seen a couple of times is that actually reading a file can
result in the end of the file being reported at a different position
than what stat() said. So by crosschecking the size observed while
reading against the one from stat() (which was compared with the source
system's), we'd make the errors much better. It's certainly easier to
know where to start looking when validate says "error: read %llu bytes
from file, expected %llu" or something along those lines, than when it
just reports a checksum error.

There's also some crypto hash algorithm weaknesses that are easier to
exploit when it's possible to append data to a known prefix, but that
doesn't seem an obvious threat here.


Greetings,

Andres Freund



Re: backup manifests

From
David Steele
Date:
On 3/27/20 3:55 PM, Stephen Frost wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> I think that what we have seen so far is that all of the SHA-n
>> algorithms that PostgreSQL supports are about equally slow, so it
>> doesn't really matter which one you pick there from a performance
>> point of view. If you're not saying it has to be SHA-512 but you do
>> want it to be SHA-256, I don't think that really fixes anything. Using
>> CRC-32C does fix the performance issue, but I don't think you like
>> that, either. We could default to having no checksums at all, or even
>> no manifest at all, but I didn't get the impression that David, at
>> least, wanted to go that way, and I don't like it either. It's not the
>> world's best feature, but I think it's good enough to justify enabling
>> it by default. So I'm not sure we have any options here that will
>> satisfy you.
> 
> I do like having a manifest by default.  At this point it's pretty clear
> that we've just got a fundamental disagreement that more words aren't
> going to fix.  I'd rather we play it safe and use a sha256 hash and
> accept that it's going to be slower by default, and then give users an
> option to make it go faster if they want (though I'd much rather that
> alternative be a 64bit CRC than a 32bit one).
> 
> Andres seems to agree with you.  I'm not sure where David sits on this
> specific question.

I would prefer a stronger checksum as the default but I would be fine 
with SHA1, which is a bit faster.

I believe the overhead of checksums is being overblown. In my experience 
the vast majority of users are using compression and running the backup 
over a network.  Once you have done those two things the cost of SHA1 is 
pretty negligible.  As I posted way up-thread we found that just gzip -6 
pushed the cost of SHA1 below 3% and that did not include network transfer.

Regards,
-- 
-David
david@pgmasters.net



Re: backup manifests

From
Stephen Frost
Date:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Fri, Mar 27, 2020 at 11:26 AM Stephen Frost <sfrost@snowman.net> wrote:
> > > Seems better to (later?) add support for generating manifests for WAL
> > > files, and then have a tool that can verify all the manifests required
> > > to restore a base backup.
> >
> > I'm not trying to expand on the feature set here or move the goalposts
> > way down the road, which is what seems to be what's being suggested
> > here.  To be clear, I don't have any objection to adding a generic tool
> > for validating WAL as you're talking about here, but I also don't think
> > that's required for pg_validatebackup.  What I do think we need is a
> > check of the WAL that's fetched when people use pg_basebackup -Xstream
> > or -Xfetch.  pg_basebackup itself has that check because it's critical
> > to the backup being successful and valid.  Not having that basic
> > validation of a backup really just isn't ok- there's a reason
> > pg_basebackup has that check.
>
> I don't understand how this could be done without significantly
> complicating the architecture. As I said before, -Xstream sends WAL
> over a separate connection that is unrelated to the one running
> BASE_BACKUP, so the base-backup connection doesn't know what to
> include in the manifest. Now you could do something like: once all of
> the WAL files have been fetched, the client checksums all of those and
> sends their names and checksums to the server, which turns around and
> puts them into the manifest, which it then sends back to the client.
> But that is actually quite a bit of additional complexity, and it's
> pretty strange, too, because now you have the client checksumming some
> files and the server checksumming others. I know you mentioned a few
> different ideas before, but I think they all kinda have some problem
> along these lines.

I've made some suggestions before, also chatted about an idea with David
that I'll outline here.

First off- I'm a bit mystified why you are saying that the base backup
connection doesn't know what to include in the manifest regarding WAL.
The base-backup process determines the starting position (and then even
puts it into the backup_label that's sent to the client), and then it
directly returns the ending position at the end of the BASE_BACKUP
command.  Given that we do know that information, then we just need to
get the checksums/hashes for each of the WAL files, if it's been asked
for.  How do we know checksums or hashes have been asked for in the
WAL streaming connection?  We can have the pg_basebackup process ask for
that when it connects to stream the WAL that's needed.

Now the only part that's a little grotty is dealing with passing the
checksums/hashes that the WAL stream connection calculates over to the
base backup connection to include in the manifest.  Offhand though, it
seems like we could drop a file in archive_status for that, perhaps
"wal_checksums.PID" or such (the PID would be that of the PG backend
that's doing the base backup, which we'd pass to START_REPLICATION).  Of
course, the backup process would have to check and make sure that it got
all the needed WAL file checksums, but since it knows the end, that
shouldn't be too bad.
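
To make that a bit more concrete (every name here is made up, this is
just a sketch of the idea): the WAL streaming connection would write one
"<walfile> <checksum>" line per segment into that file, and at the end
of BASE_BACKUP the backend doing the base backup would fold those
entries into the manifest with something along these lines:

static void
add_streamed_wal_checksums_to_manifest(manifest_info *manifest)
{
    char        path[MAXPGPATH];
    char        line[512];
    FILE       *fp;

    /* file dropped by the WAL-streaming backend, named with our PID */
    snprintf(path, sizeof(path),
             XLOGDIR "/archive_status/wal_checksums.%d", MyProcPid);

    fp = AllocateFile(path, "r");
    if (fp == NULL)
        elog(ERROR, "could not open \"%s\": %m", path);

    while (fgets(line, sizeof(line), fp) != NULL)
    {
        char        walfname[64];
        char        checksum[256];

        if (sscanf(line, "%63s %255s", walfname, checksum) == 2)
            manifest_add_wal_file(manifest, walfname, checksum); /* made up */
    }

    FreeFile(fp);
    unlink(path);
}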

> I also kinda disagree with the idea that the WAL should be considered
> an integral part of the backup. I don't know how pgbackrest does
> things, but BART stores each backup in a separate directory without any
> associated WAL, and then keeps all the WAL together in a different
> directory. I imagine that people who are using continuous archiving
> also tend to use -Xnone, or if they do backups by copying the files
> rather than using pg_backrest, they exclude pg_wal. In fact, for
> people with big, important databases, I'd assume that would be the
> normal pattern. You presumably wouldn't want to keep one copy of the
> WAL files taken during the backup with the backup itself, and a
> separate copy in the archive.

I really don't know what to say to this.  WAL is absolutely critical to
a backup being valid.  pgBackRest doesn't have a way to *just* validate
a backup today, unfortunately, but we're planning to support it in the
future and we will absolutely include in that validation checking all of
the WAL that's part of the backup.

I'm fine with forgoing all of this in the -X none case, as I've said
elsewhere.  I think it'd be great for pg_receivewal to have a way to
validate WAL and such, but that's a clearly new feature and it's
independent from validating a backup.

As it relates to how pgBackRest stores WAL, we actually do support both
of the options you mention, because people with big important databases
like to be extra paranoid.  WAL can either be stored in just the
archive, or it can be stored in both the archive and in the backup (with
'--archive-copy').  Note that this isn't done by just grabbing whatever
is in pg_wal at the time of the backup, as that wouldn't actually work,
but rather by copying the necessary WAL from the archive at the end of
the backup.

We do also check all WAL that's pulled from the archive by the restore
command, though exactly what WAL is needed isn't something we know ahead
of time (yet, anyway..  we are working on WAL parsing code that'll
change that by actually scanning the WAL and storing all restore points,
starting/ending times and transaction IDs, and anything else that can be
used as a restore target, so we can figure out exactly all WAL that's
needed to get to a particular restore target).

We actually have someone who implemented an independent tool called
check_pgbackrest which specifically has an "archives" check, for checking
that the WAL is in the archive.  We plan to also provide a way to ask
pgbackrest to confirm that there's no missing WAL, and that all of the
WAL is valid.

WAL is critical to a backup that's been taken in an online manner, no
matter where it's stored.  A backup isn't valid without the WAL that's
needed to reach consistency.

Thanks,

Stephen

Attachment

Re: backup manifests

From
Stephen Frost
Date:
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2020-03-27 14:34:19 -0400, Robert Haas wrote:
> > I think #2 is an interesting idea and could possibly reduce the danger
> > of user confusion on this point considerably - because, let's face it,
> > not everyone is going to read the documentation. However, I'm having a
> > hard time figuring out exactly what we'd print. Right now on success,
> > unless you specify -q, you get:
> >
> > [rhaas ~]$ pg_validatebackup  ~/pgslave
> > backup successfully verified
> >
> > But it feels strange and possibly confusing to me to print something like:
> >
> > [rhaas ~]$ pg_validatebackup  ~/pgslave
> > backup successfully verified (except for pg_wal)
>
> You could print something like:
> WAL necessary to restore this base backup can be validated with:
>
> pg_waldump -p ~/pgslave -t tl -s backup_start_location -e backup_end_loc > /dev/null && echo true
>
> Obviously that specific invocation sucks, but it'd not be hard to add an
> option to waldump to not output anything.

Interesting idea to use pg_waldump.

I had suggested up-thread, and I'm still fine with, having
pg_validatebackup scan the WAL and check the internal checksums.  I'd
prefer an option that uses hashes to check when the user has asked for
hashes with SHA256 or something, but at least scanning the WAL and
making sure it validates its internal checksum (and is actually all
there, which is pretty darn critical) would be enough to say that we're
pretty sure the backup is valid.

Thanks,

Stephen

Attachment

Re: backup manifests

From
Stephen Frost
Date:
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2020-03-27 15:20:27 -0400, Robert Haas wrote:
> > On Fri, Mar 27, 2020 at 2:29 AM Andres Freund <andres@anarazel.de> wrote:
> > > Hm. Should this warn if the directory's permissions are set too openly
> > > (world writable?)?
> >
> > I don't think so, but it's pretty clear that different people have
> > different ideas about what the scope of this tool ought to be, even in
> > this first version.
>
> Yea. I don't have a strong opinion on this specific issue. I was mostly
> wondering because I've repeatedly seen people restore backups with world
> readable properties, and with that it's obviously possible for somebody
> else to change the contents after the checksum was computed.

For my 2c, at least, I don't think we need to check the directory
permissions, but I wouldn't object to including a warning if they're set
such that PG won't start.  I suppose +0 for "warn if they are such that
PG won't start".

Thanks,

Stephen

Attachment

Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-03-27 17:44:07 -0400, Stephen Frost wrote:
> * Andres Freund (andres@anarazel.de) wrote:
> > On 2020-03-27 15:20:27 -0400, Robert Haas wrote:
> > > On Fri, Mar 27, 2020 at 2:29 AM Andres Freund <andres@anarazel.de> wrote:
> > > > Hm. Should this warn if the directory's permissions are set too openly
> > > > (world writable?)?
> > > 
> > > I don't think so, but it's pretty clear that different people have
> > > different ideas about what the scope of this tool ought to be, even in
> > > this first version.
> > 
> > Yea. I don't have a strong opinion on this specific issue. I was mostly
> > wondering because I've repeatedly seen people restore backups with world
> > readable properties, and with that it's obviously possible for somebody
> > else to change the contents after the checksum was computed.
> 
> For my 2c, at least, I don't think we need to check the directory
> permissions, but I wouldn't object to including a warning if they're set
> such that PG won't start.  I suppose +0 for "warn if they are such that
> PG won't start".

I was thinking of that check not being just at the top-level, but in
subdirectories too. It's easy to screw up the top and subdirectory
permissions in different ways, e.g. when manually creating the database
dir and then restoring a data directory directly into that.  IIRC
postmaster doesn't check that at start.


Greetings,

Andres Freund



Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-03-27 17:07:42 -0400, Stephen Frost wrote:
> I had suggested up-thread, and I'm still fine with, having
> pg_validatebackup scan the WAL and check the internal checksums.  I'd
> prefer an option that uses hashes to check when the user has asked for
> hashes with SHA256 or something, but at least scanning the WAL and
> making sure it validates its internal checksum (and is actually all
> there, which is pretty darn critical) would be enough to say that we're
> pretty sure the backup is valid.

I'd say that actually parsing the WAL will give you a lot higher
confidence than verifying a sha256 for each file. There's plenty of ways
to screw up the pg_wal on the source server (I've seen several
restore_commands doing so, particularly when eagerly fetching). Sure,
it'll not help against an attacker, but I'm not sure I see the threat
model.

There's imo a cost argument against doing WAL verification by reading
it, but that'd mostly be a factor when comparing against a faster
whole-file checksum.

Greetings,

Andres Freund



Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-03-27 16:57:46 -0400, Stephen Frost wrote:
> I really don't know what to say to this.  WAL is absolutely critical to
> a backup being valid.  pgBackRest doesn't have a way to *just* validate
> a backup today, unfortunately, but we're planning to support it in the
> future and we will absolutely include in that validation checking all of
> the WAL that's part of the backup.

Could you please address the fact that just about everybody uses base
backups + later WAL to have a short data loss window? Integrating the
WAL files necessary to make the base backup consistent doesn't achieve
much if we can't verify the WAL files afterwards. And fairly obviously
pg_basebackup can't do much about WAL created after its invocation.

Given that we need something separate to address that "verification
hole", I don't see why it's useful to have a special case solution (or
rather multiple ones, for stream and fetch) inside pg_basebackup.

Greetings,

Andres Freund



Re: backup manifests

From
Stephen Frost
Date:
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2020-03-27 17:44:07 -0400, Stephen Frost wrote:
> > * Andres Freund (andres@anarazel.de) wrote:
> > > On 2020-03-27 15:20:27 -0400, Robert Haas wrote:
> > > > On Fri, Mar 27, 2020 at 2:29 AM Andres Freund <andres@anarazel.de> wrote:
> > > > > Hm. Should this warn if the directory's permissions are set too openly
> > > > > (world writable?)?
> > > >
> > > > I don't think so, but it's pretty clear that different people have
> > > > different ideas about what the scope of this tool ought to be, even in
> > > > this first version.
> > >
> > > Yea. I don't have a strong opinion on this specific issue. I was mostly
> > > wondering because I've repeatedly seen people restore backups with world
> > > readable properties, and with that it's obviously possible for somebody
> > > else to change the contents after the checksum was computed.
> >
> > For my 2c, at least, I don't think we need to check the directory
> > permissions, but I wouldn't object to including a warning if they're set
> > such that PG won't start.  I suppose +0 for "warn if they are such that
> > PG won't start".
>
> I was thinking of that check not being just at the top-level, but in
> subdirectories too. It's easy to screw up the top and subdirectory
> permissions in different ways, e.g. when manually creating the database
> dir and then restoring a data directory directly into that.  IIRC
> postmaster doesn't check that at start.

Yeah, I'm pretty sure we don't check that at postmaster start..  which
also means that we'll start up just fine even if the perms on
subdirectories are odd or wrong, unless maybe we end up in a really odd
state where a directory is 000'd or something.

Of course..  this is all a mess when it comes to pg_basebackup, really,
as previously discussed elsewhere, because what permissions and such you
end up with actually depends on what *format* you use with
pg_basebackup- it's different between 'tar' format and 'plain' format.
That is, if you use 'tar' format, and then actually use 'tar' to
extract, you get one set of privs, but if you use 'plain', you get
something different.

I mean..  pgBackRest sets all perms to whatever is in the manifest on
restore (or delta), but this patch doesn't include the permissions on
files, or ownership (something pgBackRest also tries to set, if
possible, on restore), does it...?  Doesn't look like it on a quick
look.  So if we want to compare to pgBackRest then, yes, we should
include the permissions in the manifest and we should check that
everything in the manifest matches what's on the filesystem.

I don't think we should just compare all permissions or ownership with
some arbitrary idea of what we think they should be, even though if you
use pg_basebackup in 'plain' format, you actually end up with
differences, today, from what the source system has.  In my view, that
should actually be fixed, to the extent possible.

Thanks,

Stephen

Attachment

Re: backup manifests

From
Alvaro Herrera
Date:
On 2020-Mar-27, Stephen Frost wrote:

> I don't think we should just compare all permissions or ownership with
> some arbitrary idea of what we think they should be, even though if you
> use pg_basebackup in 'plain' format, you actually end up with
> differences, today, from what the source system has.  In my view, that
> should actually be fixed, to the extent possible.

I posted some thoughts about this at
https://www.postgresql.org/message-id/20190904201117.GA12986%40alvherre.pgsql
I didn't get time to work on that myself.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: backup manifests

From
Stephen Frost
Date:
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2020-03-27 16:57:46 -0400, Stephen Frost wrote:
> > I really don't know what to say to this.  WAL is absolutely critical to
> > a backup being valid.  pgBackRest doesn't have a way to *just* validate
> > a backup today, unfortunately, but we're planning to support it in the
> > future and we will absolutely include in that validation checking all of
> > the WAL that's part of the backup.
>
> Could you please address the fact that just about everybody uses base
> backups + later WAL to have a short data loss window? Integrating the
> WAL files necessary to make the base backup consistent doesn't achieve
> much if we can't verify the WAL files afterwards. And fairly obviously
> pg_basebackup can't do much about WAL created after its invocation.

I feel like we have very different ideas about what "just about
everybody" does here.  In my view, folks use pg_basebackup because it's
easy and they can create self-contained backups that include all the WAL
needed to get the backup up and running again and they don't typically
care about PITR all that much.  Folks who care about PITR use something
that manages WAL for them, which pg_basebackup and pg_receivewal really
don't do and it's not easy to add scripting around them to figure out
what WAL is needed for what backup, etc.

If we didn't think that the ability to create a self-contained backup
was useful, it sure seems odd that we've done a lot to make that work
(having both fetch and stream modes for it) and that it's the default.

> Given that we need something separate to address that "verification
> hole", I don't see why it's useful to have a special case solution (or
> rather multiple ones, for stream and fetch) inside pg_basebackup.

Well, the proposal up-thread would end up with almost zero changes to
pg_basebackup itself, but, yes, there'd be changes to BASE_BACKUP and
different ones for STREAMING_REPLICATION to support getting the WAL
checksums into the manifest.

Thanks,

Stephen

Attachment

Re: backup manifests

From
David Steele
Date:
On 3/27/20 6:07 PM, Andres Freund wrote:
> Hi,
> 
> On 2020-03-27 16:57:46 -0400, Stephen Frost wrote:
>> I really don't know what to say to this.  WAL is absolutely critical to
>> a backup being valid.  pgBackRest doesn't have a way to *just* validate
>> a backup today, unfortunately, but we're planning to support it in the
>> future and we will absolutely include in that validation checking all of
>> the WAL that's part of the backup.
> 
> Could you please address the fact that just about everybody uses base
> backups + later WAL to have a short data loss window? Integrating the
> WAL files necessary to make the base backup consistent doesn't achieve
> much if we can't verify the WAL files afterwards. And fairly obviously
> pg_basebackup can't do much about WAL created after its invocation.
> 
> Given that we need something separate to address that "verification
> hole", I don't see why it's useful to have a special case solution (or
> rather multiple ones, for stream and fetch) inside pg_basebackup.

There's a pretty big difference between not being able to play forward 
to the end of WAL and not being able to get the backup to restore to 
consistency at all.

The WAL that is generated during the backup has special 
importance. Without it you have no backup at all.  It's the difference 
between *some* data loss and *total* data loss.

Regards,
-- 
-David
david@pgmasters.net



Re: backup manifests

From
Bruce Momjian
Date:
On Thu, Mar 26, 2020 at 12:34:52PM -0400, Stephen Frost wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
> > This is where I feel like I'm trying to make decisions in a vacuum. If
> > we had a few more people weighing in on the thread on this point, I'd
> > be happy to go with whatever the consensus was. If most people think
> > having both --no-manifest (suppressing the manifest completely) and
> > --manifest-checksums=none (suppressing only the checksums) is useless
> > and confusing, then sure, let's rip the latter one out. If most people
> > like the flexibility, let's keep it: it's already implemented and
> > tested. But I hate to base the decision on what one or two people
> > think.
> 
> I'm frustrated at the lack of involvement from others also.

Well, the topic of backup manifests feels like it has generated a lot of
bickering emails, and people don't want to spend their time dealing with
that.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: backup manifests

From
Stephen Frost
Date:
Greetings,

On Fri, Mar 27, 2020 at 18:36 Bruce Momjian <bruce@momjian.us> wrote:
On Thu, Mar 26, 2020 at 12:34:52PM -0400, Stephen Frost wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
> > This is where I feel like I'm trying to make decisions in a vacuum. If
> > we had a few more people weighing in on the thread on this point, I'd
> > be happy to go with whatever the consensus was. If most people think
> > having both --no-manifest (suppressing the manifest completely) and
> > --manifest-checksums=none (suppressing only the checksums) is useless
> > and confusing, then sure, let's rip the latter one out. If most people
> > like the flexibility, let's keep it: it's already implemented and
> > tested. But I hate to base the decision on what one or two people
> > think.
>
> I'm frustrated at the lack of involvement from others also.

Well, the topic of backup manifests feels like it has generated a lot of
bickering emails, and people don't want to spend their time dealing with
that.

I’d like to not also.  I suppose it’s just an area that I’m particularly concerned with that allows me to overcome that. Backups are important to me.

Thanks,

Stephen

Re: backup manifests

From
Bruce Momjian
Date:
On Fri, Mar 27, 2020 at 06:38:33PM -0400, Stephen Frost wrote:
> Greetings,
> 
> On Fri, Mar 27, 2020 at 18:36 Bruce Momjian <bruce@momjian.us> wrote:
> 
>     On Thu, Mar 26, 2020 at 12:34:52PM -0400, Stephen Frost wrote:
>     > * Robert Haas (robertmhaas@gmail.com) wrote:
>     > > This is where I feel like I'm trying to make decisions in a vacuum. If
>     > > we had a few more people weighing in on the thread on this point, I'd
>     > > be happy to go with whatever the consensus was. If most people think
>     > > having both --no-manifest (suppressing the manifest completely) and
>     > > --manifest-checksums=none (suppressing only the checksums) is useless
>     > > and confusing, then sure, let's rip the latter one out. If most people
>     > > like the flexibility, let's keep it: it's already implemented and
>     > > tested. But I hate to base the decision on what one or two people
>     > > think.
>     >
>     > I'm frustrated at the lack of involvement from others also.
> 
>     Well, the topic of backup manifests feels like it has generated a lot of
>     bickering emails, and people don't want to spend their time dealing with
>     that.
> 
> 
> I’d like to not also.  I suppose it’s just an area that I’m particularly
> concerned with that allows me to overcome that. Backups are important to me.

The big question is whether the discussion _needs_ to be that way.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: backup manifests

From
Noah Misch
Date:
On Fri, Mar 27, 2020 at 01:53:54PM -0400, Robert Haas wrote:
> - Replace a doc paragraph about the advantages and disadvantages of
> CRC-32C with one by Stephen Frost, with a slight change by me that I
> thought made it sound more grammatical.

Defaulting to CRC-32C seems prudent to me:

- As Andres Freund said, SHA-512 is slow relative to storage now available.
  Since gzip is a needlessly-slow choice for backups (or any application that
  copies the compressed data just a few times), comparison to "gzip -6" speed
  is immaterial.

- While I'm sure some other fast hash would be a superior default, introducing
  a new algorithm is a bikeshed, as you said.  This design makes it easy,
  technically, for someone to introduce a new algorithm later.  CRC-32C is not
  catastrophically unfit for 1GiB files.

- Defaulting to SHA-512 would, in the absence of a WAL archive that also uses
  a cryptographic hash function, give a false sense of having achieved some
  coherent cryptographic goal.  With the CRC-32C default, WAL and the rest get
  similar protection.  I'm discounting the case of using BASE_BACKUP without a
  WAL archive, because I expect little intersection between sites "worried
  enough to hash everything" and those "not worried enough to use an archive".
  (On the other hand, the program that manages the WAL archive can reasonably
  own hashing base backups; putting ownership in the server isn't achieving
  much extra.)

> + <refnamediv>
> +  <refname>pg_validatebackup</refname>
> +  <refpurpose>verify the integrity of a base backup of a
> +  <productname>PostgreSQL</productname> cluster</refpurpose>
> + </refnamediv>

> +    <listitem>
> +      <para>
> +        <literal>pg_wal</literal> is ignored because WAL files are sent
> +        separately from the backup, and are therefore not described by the
> +        backup manifest.
> +      </para>
> +    </listitem>

Stephen Frost mentioned that a backup could pass validation even if
pg_basebackup were killed after writing the base backup and before finishing
the writing of pg_wal.  One might avoid that by simply writing the manifest to
a temporary name and renaming it to the final name after populating pg_wal.
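
Roughly (names invented; exactly where this would live in pg_basebackup
will differ), the client would do:

static void
write_manifest_atomically(const char *backupdir,
                          const char *contents, size_t len)
{
    char        tmppath[MAXPGPATH];
    char        finalpath[MAXPGPATH];
    FILE       *fp;

    snprintf(tmppath, sizeof(tmppath), "%s/backup_manifest.tmp", backupdir);
    snprintf(finalpath, sizeof(finalpath), "%s/backup_manifest", backupdir);

    fp = fopen(tmppath, "wb");
    if (fp == NULL ||
        fwrite(contents, 1, len, fp) != len ||
        fflush(fp) != 0 ||
        fsync(fileno(fp)) != 0 ||
        fclose(fp) != 0)
    {
        fprintf(stderr, "could not write \"%s\": %s\n",
                tmppath, strerror(errno));
        exit(1);
    }

    /*
     * Call this only after pg_wal has been fully populated, so that a file
     * named backup_manifest only ever exists for a complete backup.
     */
    if (rename(tmppath, finalpath) != 0)
    {
        fprintf(stderr, "could not rename \"%s\" to \"%s\": %s\n",
                tmppath, finalpath, strerror(errno));
        exit(1);
    }
}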

What do you think of having the verification process also call pg_waldump to
validate the WAL CRCs (shown upthread)?  That looked helpful and simple.

I think this functionality doesn't belong in its own program.  If you suspect
pg_basebackup or pg_restore will eventually gain the ability to merge
incremental backups into a recovery-ready base backup, I would put the
functionality in that program.  Otherwise, I would put it in pg_checksums.
For me, part of the friction here is that the program description indicates
general verification, but the actual functionality merely checks hashes on a
directory tree that happens to represent a PostgreSQL base backup.

> +        parse->pathname = palloc(raw_length + 1);

I don't see this freed anywhere; is it?  (It's useful to make peak memory
consumption not grow in proportion to the number of files backed up.)

[This message is not a full code review.]



Re: backup manifests

From
Robert Haas
Date:
On Fri, Mar 27, 2020 at 4:32 PM Andres Freund <andres@anarazel.de> wrote:
> I don't like having a file format that's intended to be used by external
> tools too, but that's undocumented except for code that assembles it in a
> piecemeal fashion.  Do you mean in a follow-on patch this release, or
> later? I don't have a problem with the former.

This release. I'm happy to work on that as soon as this gets
committed, assuming it gets committed.

> I do found it to be circular. I think we mostly need a paragraph or two
> somewhere that explains on a higher level what the point of verifying
> base backups is and what is verified.

Fair enough.

> FWIW, I was thinking of backup_manifest.checksum potentially being
> desirable for another reason: The need to embed the checksum inside the
> document imo adds a fair bit of rigidity to the file format. See

Well, David Steele suggested this approach. I didn't particularly like
it, but nobody showed up to agree with me or propose anything
different, so here we are. I don't think it's the end of the world.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Robert Haas
Date:
On Sat, Mar 28, 2020 at 11:40 PM Noah Misch <noah@leadboat.com> wrote:
> Stephen Frost mentioned that a backup could pass validation even if
> pg_basebackup were killed after writing the base backup and before finishing
> the writing of pg_wal.  One might avoid that by simply writing the manifest to
> a temporary name and renaming it to the final name after populating pg_wal.

Huh, that's an idea. I'll have a look at the code and see what would
be involved.

> What do you think of having the verification process also call pg_waldump to
> validate the WAL CRCs (shown upthread)?  That looked helpful and simple.

I don't love calls to external binaries, but I think the thing that
really bothers me is that pg_waldump is practically bound to terminate
with an error, because the last WAL segment will end with a partial
record. For the same reason, I think there's really no such thing as
validating a single WAL file. I suppose you'd need to know the exact
start and end locations for a minimal WAL replay and check that all
records between those LSNs appear OK, ignoring any apparent problems
after the minimum ending point, or at least ignoring any problems due
to an incomplete record in the last file. We don't have a tool for
that currently, and I don't think I can write one this week. Or at
least, not a good one.

> I think this functionality doesn't belong in its own program.  If you suspect
> pg_basebackup or pg_restore will eventually gain the ability to merge
> incremental backups into a recovery-ready base backup, I would put the
> functionality in that program.  Otherwise, I would put it in pg_checksums.
> For me, part of the friction here is that the program description indicates
> general verification, but the actual functionality merely checks hashes on a
> directory tree that happens to represent a PostgreSQL base backup.

Suraj's original patch made this part of pg_basebackup, but I didn't
really like that, because I wanted it to have its own set of options.
I still think all the options I've added are pretty useful ones, and I
can think of other things somebody might want to do. It feels very
uncomfortable to make pg_basebackup, or pg_checksums, take either
options from set A and do thing X, or options from set B and do thing
Y. But it feels clear that the name pg_validatebackup is not going
over very well with anyone. I think I should rename it to
pg_validatemanifest.

> > +             parse->pathname = palloc(raw_length + 1);
>
> I don't see this freed anywhere; is it?  (It's useful to make peak memory
> consumption not grow in proportion to the number of files backed up.)

We need the hash table to remain populated for the whole run time of
the tool, because we're essentially doing a full join of the actual
directory contents against the manifest contents. That's a bit
unfortunate but it doesn't seem simple to improve. I think the only
people who are really going to suffer are people who have an enormous
pile of empty or nearly-empty relations. People who have large
databases for the normal reason - i.e. a reasonable number of tables
that hold a lot of data - will have manifests of very manageable size.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Robert Haas
Date:
On Fri, Mar 27, 2020 at 4:02 PM David Steele <david@pgmasters.net> wrote:
> I prefer to validate the size and checksum in the same pass, but I'm not
> sure it's that big a deal.  If the backup is being corrupted under the
> validate process that would also apply to files that had already been
> validated.

I did it like this because I thought that in typical scenarios it
would be likely to produce useful results more quickly. For instance,
suppose that you forget to restore the tablespace directories, and
just get the main $PGDATA directory. Well, if you do it all in one
pass, you might spend a long time checksumming things before you
realize that some files are completely missing. I thought it would be
useful to complain about files that are extra or missing or the wrong
size FIRST, because that only requires us to stat() each file, and
only after that do the comparatively extensive checksumming step that
requires us to read the entire contents of each file. Granted, unless
you use --exit-on-error, you're going to get all the complaints
eventually anyway, but you might use that option, or you might hit ^C
when you start to see a slew of complaints popping out.
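
In outline, the structure is something like this (simplified; the struct
and the helpers here are placeholders rather than the real code):

typedef struct
{
    char       *pathname;       /* path relative to the backup directory */
    uint64      expected_size;  /* size recorded in the manifest */
    bool        size_ok;        /* passed the cheap checks? */
} manifest_file;

static void
verify_backup(manifest_file *files, int nfiles, const char *backupdir)
{
    char        fullpath[MAXPGPATH];
    struct stat st;
    int         i;

    /* Pass 1: cheap checks -- existence and size, via stat() only. */
    for (i = 0; i < nfiles; i++)
    {
        snprintf(fullpath, sizeof(fullpath), "%s/%s",
                 backupdir, files[i].pathname);
        if (stat(fullpath, &st) != 0)
            fprintf(stderr, "error: \"%s\" is in the manifest but missing\n",
                    files[i].pathname);
        else if ((uint64) st.st_size != files[i].expected_size)
            fprintf(stderr, "error: \"%s\" has size %llu, expected %llu\n",
                    files[i].pathname,
                    (unsigned long long) st.st_size,
                    (unsigned long long) files[i].expected_size);
        else
            files[i].size_ok = true;
        /* (extra files on disk get reported in this pass as well) */
    }

    /* Pass 2: expensive checks -- read the contents, compare checksums. */
    for (i = 0; i < nfiles; i++)
    {
        if (files[i].size_ok)
            verify_file_checksum(&files[i], backupdir);  /* placeholder */
    }
}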

Maybe that was the wrong idea, but I thought people would like the
idea of running cheaper checks first. I wasn't worried about
concurrent modification of the backup because then you're super-hosed
no matter what.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
David Steele
Date:
On 3/29/20 8:33 PM, Robert Haas wrote:
> On Fri, Mar 27, 2020 at 4:32 PM Andres Freund <andres@anarazel.de> wrote:
> 
>> FWIW, I was thinking of backup_manifest.checksum potentially being
>> desirable for another reason: The need to embed the checksum inside the
>> document imo adds a fair bit of rigidity to the file format. See
> 
> Well, David Steele suggested this approach. I didn't particularly like
> it, but nobody showed up to agree with me or propose anything
> different, so here we are. I don't think it's the end of the world.

I prefer the embedded checksum even though it is a pain. It's a lot less 
likely to go missing.

-- 
-David
david@pgmasters.net



Re: backup manifests

From
David Steele
Date:
On 3/29/20 8:42 PM, Robert Haas wrote:
> On Sat, Mar 28, 2020 at 11:40 PM Noah Misch <noah@leadboat.com> wrote:
>> I don't see this freed anywhere; is it?  (It's useful to make peak memory
>> consumption not grow in proportion to the number of files backed up.)
> 
> We need the hash table to remain populated for the whole run time of
> the tool, because we're essentially doing a full join of the actual
> directory contents against the manifest contents. That's a bit
> unfortunate but it doesn't seem simple to improve. I think the only
> people who are really going to suffer are people who have an enormous
> pile of empty or nearly-empty relations. People who have large
> databases for the normal reason - i.e. a reasonable number of tables
> that hold a lot of data - will have manifests of very manageable size.

+1

-- 
-David
david@pgmasters.net



Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-03-29 20:47:40 -0400, Robert Haas wrote:
> Maybe that was the wrong idea, but I thought people would like the
> idea of running cheaper checks first. I wasn't worried about
> concurrent modification of the backup because then you're super-hosed
> no matter what.

I do like that approach.

To be clear: I'm suggesting the additional crosscheck not because I'm
concerned with concurrent modifications, but because I've seen
filesystem per-inode metadata and the actual data / extent tree differ,
leading to EOF being reported while reading at a different place than
what the size via stat() would indicate.

Greetings,

Andres Freund



Re: backup manifests

From
David Steele
Date:
On 3/29/20 8:47 PM, Robert Haas wrote:
> On Fri, Mar 27, 2020 at 4:02 PM David Steele <david@pgmasters.net> wrote:
>> I prefer to validate the size and checksum in the same pass, but I'm not
>> sure it's that big a deal.  If the backup is being corrupted under the
>> validate process that would also apply to files that had already been
>> validated.
> 
> I did it like this because I thought that in typical scenarios it
> would be likely to produce useful results more quickly. For instance,
> suppose that you forget to restore the tablespace directories, and
> just get the main $PGDATA directory. Well, if you do it all in one
> pass, you might spend a long time checksumming things before you
> realize that some files are completely missing. I thought it would be
> useful to complain about files that are extra or missing or the wrong
> size FIRST, because that only requires us to stat() each file, and
> only after that do the comparatively extensive checksumming step that
> requires us to read the entire contents of each file. Granted, unless
> you use --exit-on-error, you're going to get all the complaints
> eventually anyway, but you might use that option, or you might hit ^C
> when you start to see a slew of complaints popping out.

Yeah, that seems reasonable.

In our case backups are nearly always compressed and/or encrypted so 
even checking the original size is a bit of work. Getting the checksum 
at the same time seems like an obvious win.

Currently we don't have a separate validate command outside of restore 
but when we do we'll consider doing a pass to check for file presence 
(and size when possible) first. Thanks!

> I wasn't worried about
> concurrent modification of the backup because then you're super-hosed
> no matter what.

Really, really, super-hosed.

Regards,
-- 
-David
david@pgmasters.net



Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-03-29 20:42:35 -0400, Robert Haas wrote:
> > What do you think of having the verification process also call pg_waldump to
> > validate the WAL CRCs (shown upthread)?  That looked helpful and simple.
> 
> I don't love calls to external binaries, but I think the thing that
> really bothers me is that pg_waldump is practically bound to terminate
> with an error, because the last WAL segment will end with a partial
> record.

I don't think that's the case here. You should know the last required
record, which should allow specifying the precise end for pg_waldump. If
it errors out reading to that point, we'd be in trouble.


> For the same reason, I think there's really no such thing as
> validating a single WAL file. I suppose you'd need to know the exact
> start and end locations for a minimal WAL replay and check that all
> records between those LSNs appear OK, ignoring any apparent problems
> after the minimum ending point, or at least ignoring any problems due
> to an incomplete record in the last file. We don't have a tool for
> that currently, and I don't think I can write one this week. Or at
> least, not a good one.

pg_waldump -s / -e?


> > > +             parse->pathname = palloc(raw_length + 1);
> >
> > I don't see this freed anywhere; is it?  (It's useful to make peak memory
> > consumption not grow in proportion to the number of files backed up.)
> 
> We need the hash table to remain populated for the whole run time of
> the tool, because we're essentially doing a full join of the actual
> directory contents against the manifest contents. That's a bit
> unfortunate but it doesn't seem simple to improve. I think the only
> people who are really going to suffer are people who have an enormous
> pile of empty or nearly-empty relations. People who have large
> databases for the normal reason - i.e. a reasonable number of tables
> that hold a lot of data - will have manifests of very manageable size.

Given that that's a pre-existing issue - at a significantly larger scale
imo - e.g. for pg_dump (even in the --schema-only case), and that there
are tons of backend side issues with lots of relations too, I think
that's fine.

You could of course implement something merge-join like, and implement
the sorted input via a disk-based sort. But that's a lot of work (good
luck making tuplesort work in the frontend...). So I'd not go there
unless there's a lot of evidence this is a serious practical issue.

If we find this uses too much memory, I think we'd be better off
condensing pathnames into either fewer allocations, or a RelFileNode as
part of the struct (with a fallback to string for other types of
files). But I'd also not go there for now.
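
I.e. something vaguely like (field names made up, and the code that
parses a pathname into a relation file is omitted):

typedef struct manifest_entry
{
    bool        is_relfile;     /* could the path be parsed as a relation file? */
    union
    {
        struct
        {
            RelFileNode rnode;  /* tablespace / database / relfilenode */
            ForkNumber  forknum;
            int         segno;
        }           rel;        /* compact form for relation files */
        char       *pathname;   /* fallback for everything else */
    }           u;
    uint64      size;
    uint8       checksum[PG_SHA512_DIGEST_LENGTH]; /* sized for the largest digest */
} manifest_entry;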

Greetings,

Andres Freund



Re: backup manifests

From
David Steele
Date:
On 3/29/20 9:07 PM, Andres Freund wrote:
> On 2020-03-29 20:42:35 -0400, Robert Haas wrote:
>>> What do you think of having the verification process also call pg_waldump to
>>> validate the WAL CRCs (shown upthread)?  That looked helpful and simple.
>>
>> I don't love calls to external binaries, but I think the thing that
>> really bothers me is that pg_waldump is practically bound to terminate
>> with an error, because the last WAL segment will end with a partial
>> record.
> 
> I don't think that's the case here. You should know the last required
> record, which should allow specifying the precise end for pg_waldump. If
> it errors out reading to that point, we'd be in trouble.

Exactly. All WAL generated during the backup should read fine with 
pg_waldump or there is a problem.

-- 
-David
david@pgmasters.net



Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-03-29 21:23:06 -0400, David Steele wrote:
> On 3/29/20 9:07 PM, Andres Freund wrote:
> > On 2020-03-29 20:42:35 -0400, Robert Haas wrote:
> > > > What do you think of having the verification process also call pg_waldump to
> > > > validate the WAL CRCs (shown upthread)?  That looked helpful and simple.
> > > 
> > > I don't love calls to external binaries, but I think the thing that
> > > really bothers me is that pg_waldump is practically bound to terminate
> > > with an error, because the last WAL segment will end with a partial
> > > record.
> > 
> > I don't think that's the case here. You should know the last required
> > record, which should allow specifying the precise end for pg_waldump. If
> > it errors out reading to that point, we'd be in trouble.
> 
> Exactly. All WAL generated during the backup should read fine with
> pg_waldump or there is a problem.

See the attached minimal prototype for what I am thinking of.

This would not correctly handle the case where the timeline changes
while taking a base backup. But I'm not sure that'd be all that serious
a limitation for now?

I'd personally not want to use a base backup that included a timeline
switch...

Greetings,

Andres Freund

Attachment

Re: backup manifests

From
Noah Misch
Date:
On Sun, Mar 29, 2020 at 08:42:35PM -0400, Robert Haas wrote:
> On Sat, Mar 28, 2020 at 11:40 PM Noah Misch <noah@leadboat.com> wrote:
> > I think this functionality doesn't belong in its own program.  If you suspect
> > pg_basebackup or pg_restore will eventually gain the ability to merge
> > incremental backups into a recovery-ready base backup, I would put the
> > functionality in that program.  Otherwise, I would put it in pg_checksums.
> > For me, part of the friction here is that the program description indicates
> > general verification, but the actual functionality merely checks hashes on a
> > directory tree that happens to represent a PostgreSQL base backup.
> 
> Suraj's original patch made this part of pg_basebackup, but I didn't
> really like that, because I wanted it to have its own set of options.
> I still think all the options I've added are pretty useful ones, and I
> can think of other things somebody might want to do. It feels very
> uncomfortable to make pg_basebackup, or pg_checksums, take either
> options from set A and do thing X, or options from set B and do thing
> Y.

pg_checksums does already have that property, for what it's worth.  (More
specifically, certain options dictate the mode, and it reports an error if
another option is incompatible with the mode.)

> But it feels clear that the name pg_validatebackup is not going
> over very well with anyone. I think I should rename it to
> pg_validatemanifest.

Between those two, I would use "pg_validatebackup" if there's a fair chance it
will end up doing the pg_waldump check.  Otherwise, I would use
"pg_validatemanifest".  I still most prefer delivering this as a mode of an
existing program.

> > > +             parse->pathname = palloc(raw_length + 1);
> >
> > I don't see this freed anywhere; is it?  (It's useful to make peak memory
> > consumption not grow in proportion to the number of files backed up.)
> 
> We need the hash table to remain populated for the whole run time of
> the tool, because we're essentially doing a full join of the actual
> directory contents against the manifest contents. That's a bit
> unfortunate but it doesn't seem simple to improve. I think the only
> people who are really going to suffer are people who have an enormous
> pile of empty or nearly-empty relations. People who have large
> databases for the normal reason - i.e. a reasonable number of tables
> that hold a lot of data - will have manifests of very manageable size.

Okay.



Re: backup manifests

From
Amit Kapila
Date:
On Mon, Mar 30, 2020 at 11:28 AM Noah Misch <noah@leadboat.com> wrote:
>
> On Sun, Mar 29, 2020 at 08:42:35PM -0400, Robert Haas wrote:
>
> > But it feels clear that the name pg_validatebackup is not going
> > over very well with anyone. I think I should rename it to
> > pg_validatemanifest.
>
> Between those two, I would use "pg_validatebackup" if there's a fair chance it
> will end up doing the pg_waldump check.  Otherwise, I would use
> "pg_validatemanifest".
>

+1.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: backup manifests

From
Robert Haas
Date:
On Sun, Mar 29, 2020 at 10:08 PM Andres Freund <andres@anarazel.de> wrote:
> See the attached minimal prototype for what I am thinking of.
>
> This would not correctly handle the case where the timeline changes
> while taking a base backup. But I'm not sure that'd be all that serious
> a limitation for now?
>
> I'd personally not want to use a base backup that included a timeline
> switch...

Interesting concept. I've never (or almost never) used the -s and -e
options to pg_waldump, so I didn't think about using those. I think
having a --just-parse option to pg_waldump is a good idea, though
maybe not with that name e.g. we could call it --quiet.

It is less obvious to me what to do about all that as it pertains to
the current patch. If we want pg_validatebackup to run pg_waldump in
that mode or print out a hint about how to run pg_waldump in that
mode, it would need to obtain the relevant LSNs. I guess that would
require reading the backup_label file. It's not clear to me what we
would do if the backup crosses a timeline switch, assuming that's even
a case pg_basebackup allows. If we don't want to do anything in
pg_validatebackup automatically but just want to document this as a
possible technique, we could finesse that problem with some
weasel-wording.
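
For what it's worth, pulling the start LSN and timeline back out of
backup_label would be easy enough -- something like the sketch below
(error handling omitted, names invented) -- but the end LSN isn't in
that file at all, which is part of the problem:

static void
read_backup_label_start(const char *backupdir,
                        XLogRecPtr *startptr, TimeLineID *starttli)
{
    char        path[MAXPGPATH];
    char        line[1024];
    unsigned int hi = 0;
    unsigned int lo = 0;
    unsigned int tli = 0;
    FILE       *fp;

    snprintf(path, sizeof(path), "%s/backup_label", backupdir);
    fp = fopen(path, "r");
    if (fp == NULL)
        exit(1);                /* real code would report the error */

    while (fgets(line, sizeof(line), fp) != NULL)
    {
        if (sscanf(line, "START WAL LOCATION: %X/%X", &hi, &lo) == 2)
            *startptr = ((XLogRecPtr) hi << 32) | lo;
        else if (sscanf(line, "START TIMELINE: %u", &tli) == 1)
            *starttli = (TimeLineID) tli;
    }
    fclose(fp);
}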

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-03-30 14:35:40 -0400, Robert Haas wrote:
> On Sun, Mar 29, 2020 at 10:08 PM Andres Freund <andres@anarazel.de> wrote:
> > See the attached minimal prototype for what I am thinking of.
> >
> > This would not correctly handle the case where the timeline changes
> > while taking a base backup. But I'm not sure that'd be all that serious
> > a limitation for now?
> >
> > I'd personally not want to use a base backup that included a timeline
> > switch...
>
> Interesting concept. I've never (or almost never) used the -s and -e
> options to pg_waldump, so I didn't think about using those.

Oh - it's how I use it most of the time when investigating a specific
problem. I just about always use -s, and often -e. Besides just reducing
the logging output, and avoiding spurious errors, it makes it a lot
easier to iteratively expand the logging for records that are
problematic for the case at hand.


> I think
> having a --just-parse option to pg_waldump is a good idea, though
> maybe not with that name e.g. we could call it --quiet.

Yea, I didn't like the option's name. It's just the first thing that
came to mind.


> It is less obvious to me what to do about all that as it pertains to
> the current patch.

FWIW, I personally think we can live with this not validating WAL in the
first release. But I also think it'd be within reach to do better and
allow for WAL verification.


> If we want pg_validatebackup to run pg_waldump in that mode or print
> out a hint about how to run pg_waldump in that mode, it would need to
> obtain the relevant LSNs.

We could just include those in the manifest. Seems like good information
to have in there to me, as it allows building the complete list of files
needed for a restore.


> It's not clear to me what we would do if the backup crosses a timeline
> switch, assuming that's even a case pg_basebackup allows.

I've not tested it, but it sure looks like it's possible. Both by having
a standby replaying from a node that promotes (multiple timeline
switches possible too, I think, if the WAL source follows timelines),
and by backing up from a standby that's being promoted.


> If we don't want to do anything in pg_validatebackup automatically but
> just want to document this as a a possible technique, we could finesse
> that problem with some weasel-wording.

It'd probably not be too hard to simply emit multiple commands, one for
each timeline "segment".

I wonder if it'd not be best, independent of whether we build in this
verification, to include that metadata in the manifest file. That's for
sure better than having to build a separate tool to parse timeline
history files.

I think it wouldn't be too hard to compute that information while taking
the base backup. We know the end timeline (ThisTimeLineID), so we can
just call readTimeLineHistory(ThisTimeLineID). Which should then allow
for something pretty trivial along the lines of

timelines = readTimeLineHistory(ThisTimeLineID);
last_start = InvalidXLogRecPtr;
foreach(lc, timelines)
{
    TimeLineHistoryEntry *he = lfirst(lc);

    /* skip timelines that ended before the backup started */
    if (he->end < startptr)
        continue;

    /* emit the part of this timeline that the backup actually needs */
    manifest_emit_wal_range(Max(he->begin, startptr), he->end);
    last_start = he->end;
}

/* and finally the range on the timeline the backup ends on */
if (last_start == InvalidXLogRecPtr)
   start = startptr;
else
   start = last_start;

manifest_emit_wal_range(start, endptr);


Btw, just in case somebody suggests it: I don't think it's possible to
compute the WAL checksums at this point. In stream mode WAL very well
might already have been removed.

Greetings,

Andres Freund



Re: backup manifests

From
Robert Haas
Date:
On Mon, Mar 30, 2020 at 2:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Between those two, I would use "pg_validatebackup" if there's a fair chance it
> > will end up doing the pg_waldump check.  Otherwise, I would use
> > "pg_validatemanifest".
>
> +1.

I guess I'd like to be clear here that I have no fundamental
disagreement with taking this tool in any direction that people would
like it to go. For me it's just a question of timing. Feature freeze
is now a week or so away, and nothing complicated is going to get done
in that time. If we can all agree on something simple based on
Andres's recent proposal, cool, but I'm not yet sure that will be the
case, so what's plan B? We could decide that what I have here is just
too little to be a viable facility on its own, but I think Stephen is
the only one taking that position. We could release it as
pg_validatemanifest with a plan to rename it if other backup-related
checks are added later. We could release it as pg_validatebackup with
the idea to avoid having to rename it when more backup-related checks
are added later, but with a greater possibility of confusion in the
meantime and no hard guarantee that anyone will actually develop such
checks. We could put it into pg_checksums, but I think that's really
backing ourselves into a corner: if backup validation develops other
checks that are not checksum-related, what then? I'd much rather
gamble on keeping things together by topic (backup) than technology
used internally (checksum). Putting it into pg_basebackup is another
option, and would avoid that problem, but it's not my preferred
option, because as I noted before, I think the command-line options
will get confusing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-03-30 15:04:55 -0400, Robert Haas wrote:
> I guess I'd like to be clear here that I have no fundamental
> disagreement with taking this tool in any direction that people would
> like it to go. For me it's just a question of timing. Feature freeze
> is now a week or so away, and nothing complicated is going to get done
> in that time. If we can all agree on something simple based on
> Andres's recent proposal, cool, but I'm not yet sure that will be the
> case, so what's plan B? We could decide that what I have here is just
> too little to be a viable facility on its own, but I think Stephen is
> the only one taking that position. We could release it as
> pg_validatemanifest with a plan to rename it if other backup-related
> checks are added later. We could release it as pg_validatebackup with
> the idea to avoid having to rename it when more backup-related checks
> are added later, but with a greater possibility of confusion in the
> meantime and no hard guarantee that anyone will actually develop such
> checks. We could put it into pg_checksums, but I think that's really
> backing ourselves into a corner: if backup validation develops other
> checks that are not checksum-related, what then? I'd much rather
> gamble on keeping things together by topic (backup) than technology
> used internally (checksum). Putting it into pg_basebackup is another
> option, and would avoid that problem, but it's not my preferred
> option, because as I noted before, I think the command-line options
> will get confusing.

I'm mildly inclined to name it pg_validate, pg_validate_dbdir or
such. And eventually (definitely not this release) subsume pg_checksums
in it. That way we can add other checkers too.

I don't really see a point in ending up with lots of different commands
over time. Partially because there are probably plenty of checks where the
overall cost can be drastically reduced by combining IO. Partially
because there's probably plenty of shareable infrastructure. And partially
because I think it makes discovery for users a lot easier.

Greetings,

Andres Freund



Re: backup manifests

From
Robert Haas
Date:
On Mon, Mar 30, 2020 at 2:59 PM Andres Freund <andres@anarazel.de> wrote:
> I wonder if it'd not be best, independent of whether we build in this
> verification, to include that metadata in the manifest file. That's for
> sure better than having to build a separate tool to parse timeline
> history files.

I don't think that's better, or at least not "for sure better". The
backup_label is going to include the START TIMELINE, and if -Xfetch is
used, we're also going to have all the timeline history files. If the
backup manifest includes those same pieces of information, then we've
got two sources of truth: one copy in the files the server's actually
going to read, and another copy in the backup_manifest which we're
going to potentially use for validation but ignore at runtime. That
seems not great.

> Btw, just in case somebody suggests it: I don't think it's possible to
> compute the WAL checksums at this point. In stream mode WAL very well
> might already have been removed.

Right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Robert Haas
Date:
On Fri, Mar 27, 2020 at 3:51 PM David Steele <david@pgmasters.net> wrote:
> There appear to be conflicts with 67e0adfb3f98:

Rebased.

>  > +          Specifies the algorithm that should be used to checksum
> each file
>  > +          for purposes of the backup manifest. Currently, the available
>
> perhaps "for inclusion in the backup manifest"?  Anyway, I think this
> sentence is awkward.

I changed it to "Specifies the checksum algorithm that should be
applied to each file included in the backup manifest." I hope that's
better. I also added, in both of the places where this text occurs, an
explanation a little higher up of what a backup manifest actually is.

>  > +        because the files themselves do not need to read.
>
> should be "need to be read".

Fixed.

>  > +        the manifest itself will always contain a
> <literal>SHA256</literal>
>
> I think just "the manifest will always contain" is fine.

OK.

>  > +        manifeste itself, and is therefore ignored. Note that the
> manifest
>
> typo "manifeste", perhaps remove itself.

OK, fixed.

>  > { "Path": "backup_label", "Size": 224, "Last-Modified": "2020-03-27
> 18:33:18 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "b914bec9" },
>
> Storing the checksum type with each file seems pretty redundant.
> Perhaps that could go in the header?  You could always override if a
> specific file had a different checksum type, though that seems unlikely.
>
> In general it might be good to go with shorter keys: "mod", "chk", etc.
> Manifests can get pretty big and that's a lot of extra bytes.
>
> I'm also partial to using epoch time in the manifest because it is
> generally easier for programs to work with.  But, human-readable doesn't
> suck, either.

It doesn't seem impossible for it to come up; for example, consider a
file-level incremental backup facility. You might retain whatever
checksums you have for the unchanged files (to avoid rereading them)
and add checksums for modified or added files.

I am not convinced that minimizing the size of the file here is a
particularly important goal, because I don't think it's going to get
that big in normal cases. I also think having the keys and values be
easily understandable by human beings is a plus. If we really want a
minimal format without redundancy, we should've gone with what I
proposed before (though admittedly that could've been tamped down even
further if we'd cared to squeeze, which I didn't think was important
then either).

>
>  >      if (maxrate > 0)
>  >              maxrate_clause = psprintf("MAX_RATE %u", maxrate);
>  > +    if (manifest)
>
> A linefeed here would be nice.

Added.

>  > +    manifestfile *tabent;
>
> This is an odd name.  A holdover from the tab-delimited version?

No, it was meant to stand for table entry. (Now we find out what
happens when I break my own rule against using abbreviated words.)

>  > +    printf(_("Usage:\n  %s [OPTION]... BACKUPDIR\n\n"), progname);
>
> When I ran pg_validatebackup I expected to use -D to specify the backup
> dir since pg_basebackup does.  On the other hand -D is weird because I
> *really* expect that to be the pg data dir.
>
> But, do we want this to be different from pg_basebackup?

I think it's pretty distinguishable, because pg_basebackup needs an
input (server) and an output (directory), whereas pg_validatebackup
only needs one. I don't really care if we want to change it, but I was
thinking of this as being more analogous to, say, pg_resetwal.
Granted, that's a danger-don't-use-this tool and this isn't, but I
don't think we want the -D-is-optional behavior that tools like pg_ctl
have, because having a tool that isn't supposed to be used on a
running cluster default to $PGDATA seems inadvisable. And if the
argument is mandatory then it's not clear to me why we should make
people type -D in front of it.

>  > +            checksum_length = checksum_string_length / 2;
>
> This check is defeated if a single character is added the to checksum.
>
> Not too big a deal since you still get an error, but still.

I don't see what the problem is here. We speculatively divide by two
and allocate memory assuming that the value was even, but then,
before doing anything critical, we bail out if it was actually odd.
That's harmless. We could get around it by saying:

if (checksum_string_length % 2 != 0)
    context->error_cb(...);
checksum_length = checksum_string_length / 2;
checksum_payload = palloc(checksum_length);
if (!hexdecode_string(...))
    context->error_cb(...);

...but that would be adding additional code, and error messages, for
what's basically a can't-happen-unless-the-user-is-messing-with-us
case.
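
(For illustration only, here is a self-contained sketch of the
parity-check-then-decode approach described above; the helper names are
made up and this is not the patch's code.)

/* Illustrative sketch: reject odd-length checksum strings before
 * allocating, then decode two hex digits per output byte. */
#include <stdbool.h>
#include <stdlib.h>

static int
hexval(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

static bool
decode_checksum(const char *hex, size_t hexlen,
                unsigned char **payload, size_t *payload_len)
{
    if (hexlen % 2 != 0)
        return false;           /* caller reports a parse error */
    *payload_len = hexlen / 2;
    *payload = malloc(*payload_len);
    if (*payload == NULL)
        return false;
    for (size_t i = 0; i < *payload_len; i++)
    {
        int     hi = hexval(hex[2 * i]);
        int     lo = hexval(hex[2 * i + 1]);

        if (hi < 0 || lo < 0)
        {
            free(*payload);
            return false;       /* caller reports a parse error */
        }
        (*payload)[i] = (unsigned char) ((hi << 4) | lo);
    }
    return true;
}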

>  > + * Verify that the manifest checksum is correct.
>
> This is not working the way I would expect -- I could freely modify the
> manifest without getting a checksum error on the manifest.  For example:
>
> $ /home/vagrant/test/pg/bin/pg_validatebackup test/backup3
> pg_validatebackup: fatal: invalid checksum for file "backup_label":
> "408901e0814f40f8ceb7796309a59c7248458325a21941e7c55568e381f53831?"
>
> So, if I deleted the entry above, I got a manifest checksum error.  But
> if I just modified the checksum I get a file checksum error with no
> manifest checksum error.
>
> I would prefer a manifest checksum error in all cases where it is wrong,
> unless --exit-on-error is specified.

I think I would too, but I'm confused as to what you're doing, because
if I just modified the manifest -- by deleting a file, for example, or
changing the checksum of a file, I just get:

pg_validatebackup: fatal: manifest checksum mismatch

I'm confused as to why you're not seeing that. What's the exact
sequence of steps?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: backup manifests

From
Robert Haas
Date:
On Sun, Mar 29, 2020 at 9:05 PM David Steele <david@pgmasters.net> wrote:
> Yeah, that seems reasonable.
>
> In our case backups are nearly always compressed and/or encrypted so
> even checking the original size is a bit of work. Getting the checksum
> at the same time seems like an obvious win.

Makes sense. If this ever got extended so it could read from tar files
instead of the filesystem directly, we'd surely want to take the
opposite approach and just make a single pass. I'm not sure whether
it's worth doing that at some point in the future, but it might be. If
we're going to add the capability to compress or encrypt backups to
pg_basebackup, we might want to do that first, and then make this tool
handle all of those formats in one go.

(As always, I don't have the ability to control how arbitrary
developers spend their development time... so this is just a thought.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-03-30 15:23:08 -0400, Robert Haas wrote:
> On Mon, Mar 30, 2020 at 2:59 PM Andres Freund <andres@anarazel.de> wrote:
> > I wonder if it'd not be best, independent of whether we build in this
> > verification, to include that metadata in the manifest file. That's for
> > sure better than having to build a separate tool to parse timeline
> > history files.
> 
> I don't think that's better, or at least not "for sure better". The
> backup_label going to include the START TIMELINE, and if -Xfetch is
> used, we're also going to have all the timeline history files. If the
> backup manifest includes those same pieces of information, then we've
> got two sources of truth: one copy in the files the server's actually
> going to read, and another copy in the backup_manifest which we're
> going to potentially use for validation but ignore at runtime. That
> seems not great.

The data in the backup label isn't sufficient though. Without having
parsed the timeline file there's no way to verify that the correct WAL
is present. I guess we can also add client-side tools to parse
timelines, add a command to fetch all of the required files, and then
interpret that somehow.

But that seems much more complicated.

Imo it makes sense to want to be able to verify that the WAL looks correct
even when transporting WAL using another method (say archiving) and thus
using pg_basebackup's -Xnone.

For the manifest to actually list what's required for the base backup
doesn't seem redundant to me. Imo it makes the manifest file a good
bit more meaningful, since afterwards it actually describes the whole base
backup.

Taking the redundancy argument a bit further, you can argue that we
don't need a list of relation files at all, since they're in the catalog
:P. Obviously going to that extreme doesn't make all that much
sense... But I do think it's a second source of truth that's independent
of what the backends actually are going to read.

Greetings,

Andres Freund



Re: backup manifests

From
David Steele
Date:
On 3/30/20 5:08 PM, Andres Freund wrote:
> 
> The data in the backup label isn't sufficient though. Without having
> parsed the timeline file there's no way to verify that the correct WAL
> is present. I guess we can also add client side tools to parse
> timelines, add command the fetch all of the required files, and then
> interpret that somehow.
> 
> But that seems much more complicated.
> 
> Imo it makes sense to want to be able verify that WAL looks correct even
> transporting WAL using another method (say archiving) and thus using
> pg_basebackup's -Xnone.
> 
> For the manifest to actually list what's required for the base backup
> doesn't seem redundant to me. Imo it makes the manifest file make a good
> bit more sense, since afterwards it actually describes the whole base
> backup.

FWIW, pgBackRest stores the backup WAL stop/start in the manifest. To 
get this information after the backup is complete requires parsing the 
.backup file which doesn't get stored in the backup directory by 
pg_basebackup. As far as I know, this is only accessible to solutions
that implement archive_command. So, pgBackRest could do that but it 
seems far more trouble than it is worth.

Regards,
-- 
-David
david@pgmasters.net



Re: backup manifests

From
David Steele
Date:
On 3/30/20 4:16 PM, Robert Haas wrote:
> On Fri, Mar 27, 2020 at 3:51 PM David Steele <david@pgmasters.net> wrote:
> 
>>   > { "Path": "backup_label", "Size": 224, "Last-Modified": "2020-03-27
>> 18:33:18 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "b914bec9" },
>>
>> Storing the checksum type with each file seems pretty redundant.
>> Perhaps that could go in the header?  You could always override if a
>> specific file had a different checksum type, though that seems unlikely.
>>
>> In general it might be good to go with shorter keys: "mod", "chk", etc.
>> Manifests can get pretty big and that's a lot of extra bytes.
>>
>> I'm also partial to using epoch time in the manifest because it is
>> generally easier for programs to work with.  But, human-readable doesn't
>> suck, either.
> 
> It doesn't seem impossible for it to come up; for example, consider a
> file-level incremental backup facility. You might retain whatever
> checksums you have for the unchanged files (to avoid rereading them)
> and add checksums for modified or added files.

OK.

> I am not convinced that minimizing the size of the file here is a
> particularly important goal, because I don't think it's going to get
> that big in normal cases. I also think having the keys and values be
> easily understandable by human being is a plus. If we really want a
> minimal format without redundancy, we should've gone with what I
> proposed before (though admittedly that could've been tamped down even
> further if we'd cared to squeeze, which I didn't think was important
> then either).

Well, "normal cases" is the key.  But fine; in general we have found that
the in-memory representation is more important in terms of supporting
clusters with very large numbers of files.

>> When I ran pg_validatebackup I expected to use -D to specify the backup
>> dir since pg_basebackup does.  On the other hand -D is weird because I
>> *really* expect that to be the pg data dir.
>>
>> But, do we want this to be different from pg_basebackup?
> 
> I think it's pretty distinguishable, because pg_basebackup needs an
> input (server) and an output (directory), whereas pg_validatebackup
> only needs one. I don't really care if we want to change it, but I was
> thinking of this as being more analogous to, say, pg_resetwal.
> Granted, that's a danger-don't-use-this tool and this isn't, but I
> don't think we want the -D-is-optional behavior that tools like pg_ctl
> have, because having a tool that isn't supposed to be used on a
> running cluster default to $PGDATA seems inadvisable. And if the
> argument is mandatory then it's not clear to me why we should make
> people type -D in front of it.

Honestly I think pg_basebackup is the confusing one, because in most 
cases -D points at the running cluster dir. So, OK.

>>   > +            checksum_length = checksum_string_length / 2;
>>
>> This check is defeated if a single character is added the to checksum.
>>
>> Not too big a deal since you still get an error, but still.
> 
> I don't see what the problem is here. We speculatively divide by two
> and allocate memory assuming the value that it was even, but then
> before doing anything critical we bail out if it was actually odd.
> That's harmless. We could get around it by saying:
> 
> if (checksum_string_length % 2 != 0)
>      context->error_cb(...);
> checksum_length = checksum_string_length / 2;
> checksum_payload = palloc(checksum_length);
> if (!hexdecode_string(...))
>      context->error_cb(...);
> 
> ...but that would be adding additional code, and error messages, for
> what's basically a can't-happen-unless-the-user-is-messing-with-us
> case.

Sorry, pasted the wrong code and even then still didn't get it quite 
right.

The problem:

If I remove an even number of characters from a checksum, it appears the
checksum passes but the manifest checksum fails:

$ pg_basebackup -D test/backup5 --manifest-checksums=SHA256

$ vi test/backup5/backup_manifest
     * Remove two characters from the checksum of backup_label

$ pg_validatebackup test/backup5

pg_validatebackup: fatal: manifest checksum mismatch

But if I add any number of characters or remove an odd number of 
characters I get:

pg_validatebackup: fatal: invalid checksum for file "backup_label": 
"a98e9164fd59d498d14cfdf19c67d1c2208a30e7b939d1b4a09f524c7adfc11fXX"

and no manifest checksum failure.

>>   > + * Verify that the manifest checksum is correct.
>>
>> This is not working the way I would expect -- I could freely modify the
>> manifest without getting a checksum error on the manifest.  For example:
>>
>> $ /home/vagrant/test/pg/bin/pg_validatebackup test/backup3
>> pg_validatebackup: fatal: invalid checksum for file "backup_label":
>> "408901e0814f40f8ceb7796309a59c7248458325a21941e7c55568e381f53831?"
>>
>> So, if I deleted the entry above, I got a manifest checksum error.  But
>> if I just modified the checksum I get a file checksum error with no
>> manifest checksum error.
>>
>> I would prefer a manifest checksum error in all cases where it is wrong,
>> unless --exit-on-error is specified.
> 
> I think I would too, but I'm confused as to what you're doing, because
> if I just modified the manifest -- by deleting a file, for example, or
> changing the checksum of a file, I just get:
> 
> pg_validatebackup: fatal: manifest checksum mismatch
> 
> I'm confused as to why you're not seeing that. What's the exact
> sequence of steps?

$ pg_basebackup -D test/backup5 --manifest-checksums=SHA256

$ vi test/backup5/backup_manifest
     * Add 'X' to the checksum of backup_label

$ pg_validatebackup test/backup5
pg_validatebackup: fatal: invalid checksum for file "backup_label": 
"a98e9164fd59d498d14cfdf19c67d1c2208a30e7b939d1b4a09f524c7adfc11fX"

No mention of the manifest checksum being invalid.  But if I remove the 
backup label file from the manifest:

pg_validatebackup: fatal: manifest checksum mismatch

-- 
-David
david@pgmasters.net



Re: backup manifests

From
Noah Misch
Date:
On Mon, Mar 30, 2020 at 12:16:31PM -0700, Andres Freund wrote:
> On 2020-03-30 15:04:55 -0400, Robert Haas wrote:
> > I guess I'd like to be clear here that I have no fundamental
> > disagreement with taking this tool in any direction that people would
> > like it to go. For me it's just a question of timing. Feature freeze
> > is now a week or so away, and nothing complicated is going to get done
> > in that time. If we can all agree on something simple based on
> > Andres's recent proposal, cool, but I'm not yet sure that will be the
> > case, so what's plan B? We could decide that what I have here is just
> > too little to be a viable facility on its own, but I think Stephen is
> > the only one taking that position. We could release it as
> > pg_validatemanifest with a plan to rename it if other backup-related
> > checks are added later. We could release it as pg_validatebackup with
> > the idea to avoid having to rename it when more backup-related checks
> > are added later, but with a greater possibility of confusion in the
> > meantime and no hard guarantee that anyone will actually develop such
> > checks. We could put it in to pg_checksums, but I think that's really
> > backing ourselves into a corner: if backup validation develops other
> > checks that are not checksum-related, what then? I'd much rather
> > gamble on keeping things together by topic (backup) than technology
> > used internally (checksum). Putting it into pg_basebackup is another
> > option, and would avoid that problem, but it's not my preferred
> > option, because as I noted before, I think the command-line options
> > will get confusing.
> 
> I'm mildly inclined to name it pg_validate, pg_validate_dbdir or
> such. And eventually (definitely not this release) subsume pg_checksums
> in it. That way we can add other checkers too.

Works for me; of those two, I prefer pg_validate.



Re: backup manifests

From
Amit Kapila
Date:
On Tue, Mar 31, 2020 at 11:10 AM Noah Misch <noah@leadboat.com> wrote:
>
> On Mon, Mar 30, 2020 at 12:16:31PM -0700, Andres Freund wrote:
> > On 2020-03-30 15:04:55 -0400, Robert Haas wrote:
> > > I guess I'd like to be clear here that I have no fundamental
> > > disagreement with taking this tool in any direction that people would
> > > like it to go. For me it's just a question of timing. Feature freeze
> > > is now a week or so away, and nothing complicated is going to get done
> > > in that time. If we can all agree on something simple based on
> > > Andres's recent proposal, cool, but I'm not yet sure that will be the
> > > case, so what's plan B? We could decide that what I have here is just
> > > too little to be a viable facility on its own, but I think Stephen is
> > > the only one taking that position. We could release it as
> > > pg_validatemanifest with a plan to rename it if other backup-related
> > > checks are added later. We could release it as pg_validatebackup with
> > > the idea to avoid having to rename it when more backup-related checks
> > > are added later, but with a greater possibility of confusion in the
> > > meantime and no hard guarantee that anyone will actually develop such
> > > checks. We could put it in to pg_checksums, but I think that's really
> > > backing ourselves into a corner: if backup validation develops other
> > > checks that are not checksum-related, what then? I'd much rather
> > > gamble on keeping things together by topic (backup) than technology
> > > used internally (checksum). Putting it into pg_basebackup is another
> > > option, and would avoid that problem, but it's not my preferred
> > > option, because as I noted before, I think the command-line options
> > > will get confusing.
> >
> > I'm mildly inclined to name it pg_validate, pg_validate_dbdir or
> > such. And eventually (definitely not this release) subsume pg_checksums
> > in it. That way we can add other checkers too.
>
> Works for me; of those two, I prefer pg_validate.
>

pg_validate sounds like a tool with a much bigger purpose.  I think
even things like amcheck could also fall under it.

This patch has two parts (a) Generate backup manifests for base
backups, and (b) Validate backup (manifest).  It seems to me that
there are not many things pending for (a), can't we commit that first
or is it the case that (a) depends on (b)?  This is *not* a suggestion
to leave pg_validatebackup out of this release, but rather to commit
whatever is ready and meaningful on its own.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: backup manifests

From
Robert Haas
Date:
On Mon, Mar 30, 2020 at 7:24 PM David Steele <david@pgmasters.net> wrote:
> > I'm confused as to why you're not seeing that. What's the exact
> > sequence of steps?
>
> $ pg_basebackup -D test/backup5 --manifest-checksums=SHA256
>
> $ vi test/backup5/backup_manifest
>      * Add 'X' to the checksum of backup_label
>
> $ pg_validatebackup test/backup5
> pg_validatebackup: fatal: invalid checksum for file "backup_label":
> "a98e9164fd59d498d14cfdf19c67d1c2208a30e7b939d1b4a09f524c7adfc11fX"
>
> No mention of the manifest checksum being invalid.  But if I remove the
> backup label file from the manifest:
>
> pg_validatebackup: fatal: manifest checksum mismatch

Oh, I see what's happening now. If the checksum is not an even-length
string of hexadecimal characters, it's treated as a syntax error, so
it bails out at that point. Generally, a syntax error in the manifest
file is treated as a fatal error, and you just die right there. You'd
get the same behavior if you had malformed JSON, like a stray { or }
or [ or ] someplace that it doesn't belong according to the rules of
JSON. On the other hand, if you corrupt the checksum by adding AA or
EE or 54 or some other even-length string of hex characters, then you
have (in this code's view) a semantic error rather than a syntax
error, so it will finish loading all the manifest data and then bail
because the checksum doesn't match.

We really can't avoid bailing out early sometimes, because if the file
is totally malformed at the JSON level, there's just no way to
continue. We could cause this particular error to get treated as a
semantic error rather than a syntax error, but I don't really see much
advantage in so doing. This way was easier to code, and I don't think
it really matters which error we find first.
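
(To put the distinction being drawn here in one place -- purely
illustrative, names made up, not the patch's API:)

/* Illustrative summary of the two error classes described above. */
typedef enum
{
    MANIFEST_SYNTAX_ERROR,      /* malformed JSON, odd-length or non-hex
                                 * checksum string: bail out immediately, so
                                 * the manifest checksum is never verified */
    MANIFEST_SEMANTIC_ERROR     /* well-formed but wrong value: finish loading
                                 * the manifest, verify its own checksum, then
                                 * complain */
} manifest_error_class;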

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Stephen Frost
Date:
Greetings,

* Amit Kapila (amit.kapila16@gmail.com) wrote:
> On Tue, Mar 31, 2020 at 11:10 AM Noah Misch <noah@leadboat.com> wrote:
> > On Mon, Mar 30, 2020 at 12:16:31PM -0700, Andres Freund wrote:
> > > On 2020-03-30 15:04:55 -0400, Robert Haas wrote:
> > > > I guess I'd like to be clear here that I have no fundamental
> > > > disagreement with taking this tool in any direction that people would
> > > > like it to go. For me it's just a question of timing. Feature freeze
> > > > is now a week or so away, and nothing complicated is going to get done
> > > > in that time. If we can all agree on something simple based on
> > > > Andres's recent proposal, cool, but I'm not yet sure that will be the
> > > > case, so what's plan B? We could decide that what I have here is just
> > > > too little to be a viable facility on its own, but I think Stephen is
> > > > the only one taking that position. We could release it as
> > > > pg_validatemanifest with a plan to rename it if other backup-related
> > > > checks are added later. We could release it as pg_validatebackup with
> > > > the idea to avoid having to rename it when more backup-related checks
> > > > are added later, but with a greater possibility of confusion in the
> > > > meantime and no hard guarantee that anyone will actually develop such
> > > > checks. We could put it in to pg_checksums, but I think that's really
> > > > backing ourselves into a corner: if backup validation develops other
> > > > checks that are not checksum-related, what then? I'd much rather
> > > > gamble on keeping things together by topic (backup) than technology
> > > > used internally (checksum). Putting it into pg_basebackup is another
> > > > option, and would avoid that problem, but it's not my preferred
> > > > option, because as I noted before, I think the command-line options
> > > > will get confusing.
> > >
> > > I'm mildly inclined to name it pg_validate, pg_validate_dbdir or
> > > such. And eventually (definitely not this release) subsume pg_checksums
> > > in it. That way we can add other checkers too.
> >
> > Works for me; of those two, I prefer pg_validate.
>
> pg_validate sounds like a tool with a much bigger purpose.  I think
> even things like amcheck could also fall under it.

Yeah, I tend to agree with this.

> This patch has two parts (a) Generate backup manifests for base
> backups, and (b) Validate backup (manifest).  It seems to me that
> there are not many things pending for (a), can't we commit that first
> or is it the case that (a) depends on (b)?  This is *not* a suggestion
> to leave pg_validatebackup from this release rather just to commit if
> something is ready and meaningful on its own.

I suspect the idea here is that we don't really want to commit something
that nothing is actually using, and that's understandable and justified
here: consider that even in this recent discussion there was talk that
maybe we should have included permissions and ownership in the manifest,
or starting and ending WAL positions, so that they'd be able to be
checked by this tool more easily (and because it's just useful to have
all that info in one place...  I don't really agree with the concerns
that it's an issue for static information like that to be duplicated).

In other words, while the manifest creation code might be something we
could commit, without a tool to use it (which does all the things that
we think it needs to, to perform some high-level task, such as "validate
a backup") we don't know that the manifest that's actually generated is
really up to snuff and has what it needs to have to perform that task.

I had been hoping that the discussion Andres was leading regarding
leveraging pg_waldump (or maybe just code from it..) would get us to a
point where pg_validatebackup would check that we have all of the WAL
needed for the backup to be consistent and that it would then verify the
internal checksums of the WAL.  That would certainly be a good solution
for this time around, in my view, and is already all existing
client-side code.  I do think we'd want to have a note about how we
verify pg_wal differently from the other files which are in the
manifest.

Thanks,

Stephen

Attachment

Re: backup manifests

From
Robert Haas
Date:
On Mon, Mar 30, 2020 at 2:59 PM Andres Freund <andres@anarazel.de> wrote:
> I think it wouldn't be too hard to compute that information while taking
> the base backup. We know the end timeline (ThisTimeLineID), so we can
> just call readTimeLineHistory(ThisTimeLineID). Which should then allow
> for something pretty trivial along the lines of
>
> timelines = readTimeLineHistory(ThisTimeLineID);
> last_start = InvalidXLogRecPtr;
> foreach(lc, timelines)
> {
>     TimeLineHistoryEntry *he = lfirst(lc);
>
>     if (he->end < startptr)
>         continue;
>
>     //
>     manifest_emit_wal_range(Min(he->begin, startptr), he->end);
>     last_start = he->end;
> }
>
> if (last_start == InvalidXLogRecPtr)
>    start = startptr;
> else
>    start = last_start;
>
> manifest_emit_wal_range(start, endptr);

I made an attempt to implement this. In the attached patch set, 0001
and 0002 are (I think) unmodified from the last version. 0003 is a
slightly-rejiggered version of your new pg_waldump option. 0004 whacks
0002 around so that the WAL ranges are included in the manifest and
pg_validatebackup tries to run pg_waldump for each WAL range. It
appears to work in light testing, but I haven't yet (1) tested it
extensively, (2) written good regression tests for it above and beyond
what pg_validatebackup had already, or (3) updated the documentation.
I'm going to work on those things. I would appreciate *very timely*
feedback on anything people do or do not like about this, because I
want to commit this patch set by the end of the work week and that
isn't very far away. I would also appreciate if people would bear in
mind the principle that half a loaf is better than none, and further
improvements can be made in future releases.

As part of my light testing, I tried promoting a standby that was
running pg_basebackup, and found that pg_basebackup failed like this:

pg_basebackup: error: could not get COPY data stream: ERROR:  the
standby was promoted during online backup
HINT:  This means that the backup being taken is corrupt and should
not be used. Try taking another online backup.
pg_basebackup: removing data directory "/Users/rhaas/pgslave2"

My first thought was that this error message is hard to reconcile with
this comment:

        /*
         * Send timeline history files too. Only the latest timeline history
         * file is required for recovery, and even that only if there happens
         * to be a timeline switch in the first WAL segment that contains the
         * checkpoint record, or if we're taking a base backup from a standby
         * server and the target timeline changes while the backup is taken.
         * But they are small and highly useful for debugging purposes, so
         * better include them all, always.
         */

But then it occurred to me that this might be a cascading standby.
Maybe the original master died and this machine's master got promoted,
so it has to follow a timeline switch but doesn't itself get promoted.
I think I might try to test out that scenario and see what happens,
but I haven't done so as of this writing. Regardless, it seems like a
really good idea to store a list of WAL ranges rather than a single
start/end/timeline, because even if it's impossible today it might
become possible in the future. Still, unless there's an easy way to
set up a test scenario where multiple WAL ranges need to be verified,
it may be hard to test that this code actually behaves properly.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-03-31 14:10:34 -0400, Robert Haas wrote:
> I made an attempt to implement this.

Awesome!


> [...] I'm going to work on those things. I
> would appreciate *very timely* feedback on anything people do or do
> not like about this, because I want to commit this patch set by the
> end of the work week and that isn't very far away. I would also
> appreciate if people would bear in mind the principle that half a loaf
> is better than none, and further improvements can be made in future
> releases.
> 
> As part of my light testing, I tried promoting a standby that was
> running pg_basebackup, and found that pg_basebackup failed like this:
> 
> pg_basebackup: error: could not get COPY data stream: ERROR:  the
> standby was promoted during online backup
> HINT:  This means that the backup being taken is corrupt and should
> not be used. Try taking another online backup.
> pg_basebackup: removing data directory "/Users/rhaas/pgslave2"
> 
> My first thought was that this error message is hard to reconcile with
> this comment:
> 
>         /*
>          * Send timeline history files too. Only the latest timeline history
>          * file is required for recovery, and even that only if there happens
>          * to be a timeline switch in the first WAL segment that contains the
>          * checkpoint record, or if we're taking a base backup from a standby
>          * server and the target timeline changes while the backup is taken.
>          * But they are small and highly useful for debugging purposes, so
>          * better include them all, always.
>          */
> 
> But then it occurred to me that this might be a cascading standby.

Yea. The check just prevents the walsender's database from being
promoted:

        /*
         * Check if the postmaster has signaled us to exit, and abort with an
         * error in that case. The error handler further up will call
         * do_pg_abort_backup() for us. Also check that if the backup was
         * started while still in recovery, the server wasn't promoted.
         * do_pg_stop_backup() will check that too, but it's better to stop
         * the backup early than continue to the end and fail there.
         */
        CHECK_FOR_INTERRUPTS();
        if (RecoveryInProgress() != backup_started_in_recovery)
            ereport(ERROR,
                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                     errmsg("the standby was promoted during online backup"),
                     errhint("This means that the backup being taken is corrupt "
                             "and should not be used. "
                             "Try taking another online backup.")));
and

    if (strcmp(backupfrom, "standby") == 0 && !backup_started_in_recovery)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("the standby was promoted during online backup"),
                 errhint("This means that the backup being taken is corrupt "
                         "and should not be used. "
                         "Try taking another online backup.")));

So that just prevents promotions of the current node, afaict.



> Regardless, it seems like a really good idea to store a list of WAL
> ranges rather than a single start/end/timeline, because even if it's
> impossible today it might become possible in the future.

Indeed.


> Still, unless there's an easy way to set up a test scenario where
> multiple WAL ranges need to be verified, it may be hard to test that
> this code actually behaves properly.

I think it'd be possible to test without a fully cascading setup, by
creating an initial base backup, then doing some work to create a bunch of
new timelines, and then starting from the initial base backup. That'd have to
follow all those timelines.  Not sure that's better than a cascading
setup though.


> +/*
> + * Add information about the WAL that will need to be replayed when restoring
> + * this backup to the manifest.
> + */
> +static void
> +AddWALInfoToManifest(manifest_info *manifest, XLogRecPtr startptr,
> +                     TimeLineID starttli, XLogRecPtr endptr, TimeLineID endtli)
> +{
> +    List *timelines = readTimeLineHistory(endtli);

should probably happen after the manifest->buffile check.


> +    ListCell *lc;
> +    bool    first_wal_range = true;
> +    bool    found_ending_tli = false;
> +
> +    /* If there is no buffile, then the user doesn't want a manifest. */
> +    if (manifest->buffile == NULL)
> +        return;

Not really about this patch/function specifically: I wonder if this'd
look better if you added a ManifestEnabled() macro instead of repeating
the comment everywhere.



> +    /* Unless --no-parse-wal was specified, we will need pg_waldump. */
> +    if (!no_parse_wal)
> +    {
> +        int        ret;
> +
> +        pg_waldump_path = pg_malloc(MAXPGPATH);
> +        ret = find_other_exec(argv[0], "pg_waldump",
> +                              "pg_waldump (PostgreSQL) " PG_VERSION "\n",
> +                             pg_waldump_path);
> +        if (ret < 0)
> +        {
> +            char    full_path[MAXPGPATH];
> +
> +            if (find_my_exec(argv[0], full_path) < 0)
> +                strlcpy(full_path, progname, sizeof(full_path));
> +            if (ret == -1)
> +                pg_log_fatal("The program \"%s\" is needed by %s but was\n"
> +                             "not found in the same directory as \"%s\".\n"
> +                             "Check your installation.",
> +                             "pg_waldump", "pg_validatebackup", full_path);
> +            else
> +                pg_log_fatal("The program \"%s\" was found by \"%s\" but was\n"
> +                             "not the same version as %s.\n"
> +                             "Check your installation.",
> +                             "pg_waldump", full_path, "pg_validatebackup");
> +        }
> +    }

ISTM, and this can definitely wait for another time, that we should have
one wrapper doing all of this, instead of having quite a few copies of
very similar logic to the above.


> +/*
> + * Attempt to parse the WAL files required to restore from backup using
> + * pg_waldump.
> + */
> +static void
> +parse_required_wal(validator_context *context, char *pg_waldump_path,
> +                   char *wal_directory, manifest_wal_range *first_wal_range)
> +{
> +    manifest_wal_range *this_wal_range = first_wal_range;
> +
> +    while (this_wal_range != NULL)
> +    {
> +        char *pg_waldump_cmd;
> +
> +        pg_waldump_cmd = psprintf("\"%s\" --quiet --path=\"%s\" --timeline=%u --start=%X/%X --end=%X/%X\n",
> +               pg_waldump_path, wal_directory, this_wal_range->tli,
> +               (uint32) (this_wal_range->start_lsn >> 32),
> +               (uint32) this_wal_range->start_lsn,
> +               (uint32) (this_wal_range->end_lsn >> 32),
> +               (uint32) this_wal_range->end_lsn);
> +        if (system(pg_waldump_cmd) != 0)
> +            report_backup_error(context,
> +                                "WAL parsing failed for timeline %u",
> +                                this_wal_range->tli);
> +
> +        this_wal_range = this_wal_range->next;
> +    }
> +}

Should we have a function to properly escape paths in cases like this?
Not that it's likely or really problematic, but the quoting for path
could be "circumvented".


Greetings,

Andres Freund



Re: backup manifests

From
Noah Misch
Date:
On Tue, Mar 31, 2020 at 03:50:34PM -0700, Andres Freund wrote:
> On 2020-03-31 14:10:34 -0400, Robert Haas wrote:
> > +/*
> > + * Attempt to parse the WAL files required to restore from backup using
> > + * pg_waldump.
> > + */
> > +static void
> > +parse_required_wal(validator_context *context, char *pg_waldump_path,
> > +                   char *wal_directory, manifest_wal_range *first_wal_range)
> > +{
> > +    manifest_wal_range *this_wal_range = first_wal_range;
> > +
> > +    while (this_wal_range != NULL)
> > +    {
> > +        char *pg_waldump_cmd;
> > +
> > +        pg_waldump_cmd = psprintf("\"%s\" --quiet --path=\"%s\" --timeline=%u --start=%X/%X --end=%X/%X\n",
> > +               pg_waldump_path, wal_directory, this_wal_range->tli,
> > +               (uint32) (this_wal_range->start_lsn >> 32),
> > +               (uint32) this_wal_range->start_lsn,
> > +               (uint32) (this_wal_range->end_lsn >> 32),
> > +               (uint32) this_wal_range->end_lsn);
> > +        if (system(pg_waldump_cmd) != 0)
> > +            report_backup_error(context,
> > +                                "WAL parsing failed for timeline %u",
> > +                                this_wal_range->tli);
> > +
> > +        this_wal_range = this_wal_range->next;
> > +    }
> > +}
> 
> Should we have a function to properly escape paths in cases like this?
> Not that it's likely or really problematic, but the quoting for path
> could be "circumvented".

Are you looking for appendShellString(), or something different?
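
(For reference, a minimal sketch of how the pg_waldump command above
might be assembled with appendShellString(); this is illustrative only,
not code from the patch, and assumes the frontend fe_utils/string_utils.h
and pqexpbuffer.h APIs.)

/* Illustrative only: build the pg_waldump command with appendShellString()
 * so the executable path and WAL directory are safely quoted. */
#include "postgres_fe.h"
#include "access/xlogdefs.h"            /* XLogRecPtr, TimeLineID */
#include "fe_utils/string_utils.h"      /* appendShellString() */
#include "pqexpbuffer.h"

static PQExpBuffer
build_pg_waldump_command(const char *pg_waldump_path,
                         const char *wal_directory,
                         TimeLineID tli,
                         XLogRecPtr start_lsn, XLogRecPtr end_lsn)
{
    PQExpBuffer cmd = createPQExpBuffer();

    appendShellString(cmd, pg_waldump_path);
    appendPQExpBufferStr(cmd, " --quiet --path=");
    appendShellString(cmd, wal_directory);
    appendPQExpBuffer(cmd, " --timeline=%u --start=%X/%X --end=%X/%X",
                      tli,
                      (uint32) (start_lsn >> 32), (uint32) start_lsn,
                      (uint32) (end_lsn >> 32), (uint32) end_lsn);

    /* caller runs cmd->data, then calls destroyPQExpBuffer(cmd) */
    return cmd;
}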



Re: backup manifests

From
Robert Haas
Date:
On Tue, Mar 31, 2020 at 6:50 PM Andres Freund <andres@anarazel.de> wrote:
> On 2020-03-31 14:10:34 -0400, Robert Haas wrote:
> > I made an attempt to implement this.
>
> Awesome!

Here's a new patch set. I haven't fixed the things in your latest
round of review comments yet, but I did rewrite the documentation for
pg_validatebackup, add documentation for the new pg_waldump option,
and add regression tests for the new WAL-checking facility of
pg_validatebackup.

0001 - add pg_waldump -q
0002 - add checksum helpers
0003 - core backup manifest patch, now with WAL verification included

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-03-31 22:15:04 -0700, Noah Misch wrote:
> On Tue, Mar 31, 2020 at 03:50:34PM -0700, Andres Freund wrote:
> > On 2020-03-31 14:10:34 -0400, Robert Haas wrote:
> > > +/*
> > > + * Attempt to parse the WAL files required to restore from backup using
> > > + * pg_waldump.
> > > + */
> > > +static void
> > > +parse_required_wal(validator_context *context, char *pg_waldump_path,
> > > +                   char *wal_directory, manifest_wal_range *first_wal_range)
> > > +{
> > > +    manifest_wal_range *this_wal_range = first_wal_range;
> > > +
> > > +    while (this_wal_range != NULL)
> > > +    {
> > > +        char *pg_waldump_cmd;
> > > +
> > > +        pg_waldump_cmd = psprintf("\"%s\" --quiet --path=\"%s\" --timeline=%u --start=%X/%X --end=%X/%X\n",
> > > +               pg_waldump_path, wal_directory, this_wal_range->tli,
> > > +               (uint32) (this_wal_range->start_lsn >> 32),
> > > +               (uint32) this_wal_range->start_lsn,
> > > +               (uint32) (this_wal_range->end_lsn >> 32),
> > > +               (uint32) this_wal_range->end_lsn);
> > > +        if (system(pg_waldump_cmd) != 0)
> > > +            report_backup_error(context,
> > > +                                "WAL parsing failed for timeline %u",
> > > +                                this_wal_range->tli);
> > > +
> > > +        this_wal_range = this_wal_range->next;
> > > +    }
> > > +}
> > 
> > Should we have a function to properly escape paths in cases like this?
> > Not that it's likely or really problematic, but the quoting for path
> > could be "circumvented".
> 
> Are you looking for appendShellString(), or something different?

Looks like that'd be it. Thanks.

Greetings,

Andres Freund



Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-03-31 14:56:07 +0530, Amit Kapila wrote:
> On Tue, Mar 31, 2020 at 11:10 AM Noah Misch <noah@leadboat.com> wrote:
> > On Mon, Mar 30, 2020 at 12:16:31PM -0700, Andres Freund wrote:
> > > On 2020-03-30 15:04:55 -0400, Robert Haas wrote:
> > > I'm mildly inclined to name it pg_validate, pg_validate_dbdir or
> > > such. And eventually (definitely not this release) subsume pg_checksums
> > > in it. That way we can add other checkers too.
> >
> > Works for me; of those two, I prefer pg_validate.
> >
> 
> pg_validate sounds like a tool with a much bigger purpose.  I think
> even things like amcheck could also fall under it.

Intentionally so. We don't serve our users by collecting a lot of
differently named commands to work with data directories. As I wrote
above, the point would be to eventually have that tool also perform
checksum validation etc.  Potentially even in a single pass over the
data directory.


> This patch has two parts (a) Generate backup manifests for base
> backups, and (b) Validate backup (manifest).  It seems to me that
> there are not many things pending for (a), can't we commit that first
> or is it the case that (a) depends on (b)?  This is *not* a suggestion
> to leave pg_validatebackup from this release rather just to commit if
> something is ready and meaningful on its own.

IDK, it seems easier to be able to modify both at the same time.

Greetings,

Andres Freund



Re: backup manifests

From
David Steele
Date:
On 3/31/20 7:57 AM, Robert Haas wrote:
> On Mon, Mar 30, 2020 at 7:24 PM David Steele <david@pgmasters.net> wrote:
>>> I'm confused as to why you're not seeing that. What's the exact
>>> sequence of steps?
>>
>> $ pg_basebackup -D test/backup5 --manifest-checksums=SHA256
>>
>> $ vi test/backup5/backup_manifest
>>       * Add 'X' to the checksum of backup_label
>>
>> $ pg_validatebackup test/backup5
>> pg_validatebackup: fatal: invalid checksum for file "backup_label":
>> "a98e9164fd59d498d14cfdf19c67d1c2208a30e7b939d1b4a09f524c7adfc11fX"
>>
>> No mention of the manifest checksum being invalid.  But if I remove the
>> backup label file from the manifest:
>>
>> pg_validatebackup: fatal: manifest checksum mismatch
> 
> Oh, I see what's happening now. If the checksum is not an even-length
> string of hexademical characters, it's treated as a syntax error, so
> it bails out at that point. Generally, a syntax error in the manifest
> file is treated as a fatal error, and you just die right there. You'd
> get the same behavior if you had malformed JSON, like a stray { or }
> or [ or ] someplace that it doesn't belong according to the rules of
> JSON. On the other hand, if you corrupt the checksum by adding AA or
> EE or 54 or some other even-length string of hex characters, then you
> have (in this code's view) a semantic error rather than a syntax
> error, so it will finish loading all the manifest data and then bail
> because the checksum doesn't match.
> 
> We really can't avoid bailing out early sometimes, because if the file
> is totally malformed at the JSON level, there's just no way to
> continue. We could cause this particular error to get treated as a
> semantic error rather than a syntax error, but I don't really see much
> advantage in so doing. This way was easier to code, and I don't think
> it really matters which error we find first.

I think it would be good to know that the manifest checksum is bad in 
all cases because that may well inform other errors.

That said, I know you have a lot on your plate with this patch so I'm 
not going to make a fuss about such a minor gripe. Perhaps this can be 
considered for future improvement.

Regards,
-- 
-David
david@pgmasters.net



Re: backup manifests

From
Robert Haas
Date:
On Wed, Apr 1, 2020 at 4:47 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Here's a new patch set. I haven't fixed the things in your latest
> round of review comments yet, but I did rewrite the documentation for
> pg_validatebackup, add documentation for the new pg_waldump option,
> and add regression tests for the new WAL-checking facility of
> pg_validatebackup.
>
> 0001 - add pg_waldump -q
> 0002 - add checksum helpers
> 0003 - core backup manifest patch, now with WAL verification included

And here's another new patch set. After some experimentation, I was
able to manually test the timeline-switch-during-a-base-backup case
and found that it had bugs in both pg_validatebackup and the code I
added to the backend's basebackup.c. So I fixed those. It would be
nice to have automated tests, but you need a large database (so that
backing it up takes non-trivial time) and a load on the primary (so
that WAL is being replayed during the backup) and there's a race
condition (because the backup has to not finish before the cascading
standby learns that the upstream has been promoted), so I don't at
present see a practical way to automate that. I did verify, in manual
testing, that a problem with WAL files on either timeline caused a
validation failure. I also verified that the LSNs at which the standby
began replay and reached consistency matched what was stored in the
manifest.

I also implemented Noah's suggestion that we should write the backup
manifest under a temporary name and then rename it afterward.
Stephen's original complaint that you could end up with a backup that
validates successfully even though we died before we got the WAL is,
at this point, moot, because pg_validatebackup is now capable of
noticing that the WAL is missing. Nevertheless, this seems like a nice
belt-and-suspenders check. I was able to position the rename *after*
we fsync() the backup directory, as well as after we get all of the
WAL, so unless those steps complete you'll have backup_manifest.tmp
rather than backup_manifest. It's true that, if we suffered an OS
crash before the fsync() completed and lost some files or some file
data, pg_validatebackup ought to fail anyway, but this way it is
absolutely certain to fail, and to do so immediately. Likewise for a
failure while fetching WAL that manages to leave the output directory
behind.
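
(As a rough standalone sketch of the write-temp-then-rename pattern being
described -- not the actual pg_basebackup code; error handling and the
surrounding steps are abbreviated.)

/* Rough sketch: the final backup_manifest only appears under its real name
 * once everything before it has succeeded. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int
publish_manifest(const char *dir, const char *contents, size_t len)
{
    char        tmppath[1024];
    char        finalpath[1024];
    int         fd;

    snprintf(tmppath, sizeof(tmppath), "%s/backup_manifest.tmp", dir);
    snprintf(finalpath, sizeof(finalpath), "%s/backup_manifest", dir);

    fd = open(tmppath, O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, contents, len) != (ssize_t) len || fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    close(fd);

    /* ... fetching WAL and fsync()ing the backup directory happen here ... */

    /* Only now does the manifest appear under its final name. */
    return rename(tmppath, finalpath);
}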

This version has also had a visit from the pgindent police.

I think this responds to pretty much all of the complaints that I know
about and upon which we have a reasonable degree of consensus. There
are still some things that not everybody is happy about. In
particular, Stephen and David are unhappy about using CRC-32C as the
default algorithm, but Andres and Noah both think it's a reasonable
choice, even if not as robust as everybody will want. As I agree, I'm
going to stick with that choice.

Also, there is still some debate about what the tool ought to be
called. My previous suggestion to rename this from pg_validatebackup
to pg_validatemanifest seems wrong now that WAL validation has been
added; in fact, given that we now have two independent sanity checks
on a backup, I'm going to argue that it would be reasonable to extend
that by adding more kinds of backup validation, perhaps even including
the permissions check that Andres suggested before. I don't plan to
pursue that at present, though. There remains the idea of merging this
with some other tool, but I still don't like that. On the one hand,
it's been suggested that it could be merged into pg_checksums, but I
think that is less appealing now that it seems to be growing into a
general-purpose backup validation tool. It may do things that have
nothing to do with checksums. On the other hand, it's been suggested
that it ought to be called pg_validate and that pg_checksums ought to
eventually be merged into it, but I don't think we have sufficient
consensus here to commit the project to such a plan. Nobody
responsible for the pg_checksums work has endorsed it, for example.
Moreover, pg_checksums does things other than validation, such as
enabling and disabling checksums. Therefore, I think it's unclear that
such a plan would achieve a sufficient degree of consensus.

For my part, I think this is a general issue that is not really this
patch's problem to solve. We have had multiple discussions over the
years about reducing the number of binaries that we ship. We could
have a general binary called "pg" or similar and use subcommands: pg
createdb, pg basebackup, pg validatebackup, etc. I think such an
approach is worth considering, though it would certainly be an
adjustment for everyone. Or we might do something else. But I don't
want to deal with that in this patch.

A couple of other minor suggestions have been made: (1) rejigger
things to avoid message duplication related to launching external
binaries, (2) maybe use appendShellString, and (3) change some details
of error-reporting related to manifest parsing. I don't believe anyone
views these as blockers; (1) and (2) are preexisting issues that this
patch extends to one new case.

Considering all the foregoing, I would like to go ahead and commit this stuff.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-04-02 13:04:45 -0400, Robert Haas wrote:
> And here's another new patch set. After some experimentation, I was
> able to manually test the timeline-switch-during-a-base-backup case
> and found that it had bugs in both pg_validatebackup and the code I
> added to the backend's basebackup.c. So I fixed those.

Cool.


> It would be
> nice to have automated tests, but you need a large database (so that
> backing it up takes non-trivial time) and a load on the primary (so
> that WAL is being replayed during the backup) and there's a race
> condition (because the backup has to not finish before the cascading
> standby learns that the upstream has been promoted), so I don't at
> present see a practical way to automate that. I did verify, in manual
> testing, that a problem with WAL files on either timeline caused a
> validation failure. I also verified that the LSNs at which the standby
> began replay and reached consistency matched what was stored in the
> manifest.

I suspect it's possible to control the timing by preventing the
checkpoint at the end of recovery from completing within a relevant
timeframe. I think configuring a large checkpoint_timeout and using a
non-fast base backup ought to do the trick. The state can be advanced by
separately triggering an immediate checkpoint? Or by changing the
checkpoint_timeout?



> I also implemented Noah's suggestion that we should write the backup
> manifest under a temporary name and then rename it afterward.
> Stephen's original complaint that you could end up with a backup that
> validates successfully even though we died before we got the WAL is,
> at this point, moot, because pg_validatebackup is now capable of
> noticing that the WAL is missing. Nevertheless, this seems like a nice
> belt-and-suspenders check.

Yea, it's imo generally a good idea.


> I think this responds to pretty much all of the complaints that I know
> about and upon which we have a reasonable degree of consensus. There
> are still some things that not everybody is happy about. In
> particular, Stephen and David are unhappy about using CRC-32C as the
> default algorithm, but Andres and Noah both think it's a reasonable
> choice, even if not as robust as everybody will want. As I agree, I'm
> going to stick with that choice.

I think it might be worth looking, in a later release, at something like
blake3 for a fast cryptographic checksum. By allowing for instruction
parallelism (by independently checksumming different blocks of data, and
only advancing the "shared" checksum separately) it achieves
considerably higher throughput rates.

I suspect we should also look at a better non-crypto hash. xxhash or
whatever. Not just for these checksums, but also for in-memory use.


> Also, there is still some debate about what the tool ought to be
> called. My previous suggestion to rename this from pg_validatebackup
> to pg_validatemanifest seems wrong now that WAL validation has been
> added; in fact, given that we now have two independent sanity checks
> on a backup, I'm going to argue that it would be reasonable to extend
> that by adding more kinds of backup validation, perhaps even including
> the permissions check that Andres suggested before.

FWIW, the only check I'd really like to see in this release is the
crosscheck of the files' lengths against the data actually read (to be able
to diagnose FS issues).


Greetings,

Andres Freund



Re: backup manifests

From
Robert Haas
Date:
On Thu, Apr 2, 2020 at 1:23 PM Andres Freund <andres@anarazel.de> wrote:
> I suspect its possible to control the timing by preventing the
> checkpoint at the end of recovery from completing within a relevant
> timeframe. I think configuring a large checkpoint_timeout and using a
> non-fast base backup ought to do the trick. The state can be advanced by
> separately triggering an immediate checkpoint? Or by changing the
> checkpoint_timeout?

That might make the window fairly wide on normal systems, but I'm not
sure about Raspberry Pi BF members or things running
CLOBBER_CACHE_ALWAYS/RECURSIVELY. I guess I could try it.

> I think it might be worth looking, in a later release, at something like
> blake3 for a fast cryptographic checksum. By allowing for instruction
> parallelism (by independently checksumming different blocks of data, and
> only advancing the "shared" checksum separately) it achieves
> considerably higher throughput rates.
>
> I suspect we should also look at a better non-crypto hash. xxhash or
> whatever. Not just for these checksums, but also for in-memory.

I have no problem with that. I don't feel that I am well-placed to
recommend for or against specific algorithms. Speed is easy to
measure, but there's also code stability, the license under which
something is released, the quality of the hashes it produces, and the
extent to which it is cryptographically secure. I'm not an expert in
any of that stuff, but if we get consensus on something it should be
easy enough to plug it into this framework. Even changing the default
would be no big deal.

> FWIW, the only check I'd really like to see in this release is the
> crosscheck between the file's length and the data actually read (to be
> able to diagnose FS issues).

Not sure I understand this comment. Isn't that a subset of what the
patch already does? Are you asking for something to be changed?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-04-02 14:16:27 -0400, Robert Haas wrote:
> On Thu, Apr 2, 2020 at 1:23 PM Andres Freund <andres@anarazel.de> wrote:
> > I suspect it's possible to control the timing by preventing the
> > checkpoint at the end of recovery from completing within a relevant
> > timeframe. I think configuring a large checkpoint_timeout and using a
> > non-fast base backup ought to do the trick. The state can be advanced by
> > separately triggering an immediate checkpoint? Or by changing the
> > checkpoint_timeout?
> 
> That might make the window fairly wide on normal systems, but I'm not
> sure about Raspberry Pi BF members or things running
> CLOBBER_CACHE_ALWAYS/RECURSIVELY. I guess I could try it.

You can set checkpoint_timeout to be a day. If that's not enough, well,
then I think we have other problems.


> > FWIW, the only check I'd really like to see in this release is the
> > crosscheck between the file's length and the data actually read (to be
> > able to diagnose FS issues).
> 
> Not sure I understand this comment. Isn't that a subset of what the
> patch already does? Are you asking for something to be changed?

Yes, I am asking for something to be changed: I'd like the code that
read()s the file when computing the checksum to add up how many bytes
were read, and compare that to the size in the manifest. And if there's
a difference, report an error about that, instead of a checksum failure.

I've repeatedly seen filesystem issues lead to earlier EOFs when
read()ing than what stat() returns. It'll be pretty annoying to have to
debug a general "checksum failure", rather than just knowing that
reading stopped after 100MB of 1GB.
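
To make that concrete, here's a rough sketch of the kind of check I
mean; the function name and message wording are made up for
illustration, not taken from the patch:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Sketch only: while reading a file to compute its checksum, count the
 * bytes actually returned by read() and compare them with the size
 * recorded in the manifest, so that a short read is reported as a size
 * mismatch rather than surfacing later as a confusing checksum failure.
 */
static int
check_file_size_while_reading(const char *path, uint64_t manifest_size)
{
	char		buf[65536];
	uint64_t	bytes_read = 0;
	ssize_t		rc;
	int			fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
	{
		fprintf(stderr, "could not open \"%s\"\n", path);
		return -1;
	}

	while ((rc = read(fd, buf, sizeof(buf))) > 0)
	{
		bytes_read += (uint64_t) rc;
		/* feed buf into the checksum context here */
	}
	close(fd);

	if (bytes_read != manifest_size)
	{
		fprintf(stderr,
				"\"%s\": manifest expects %llu bytes, but only %llu bytes could be read\n",
				path,
				(unsigned long long) manifest_size,
				(unsigned long long) bytes_read);
		return -1;
	}

	/* sizes agree; the caller can go on to compare the checksum */
	return 0;
}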

Greetings,

Andres Freund



Re: backup manifests

From
Robert Haas
Date:
On Thu, Apr 2, 2020 at 2:23 PM Andres Freund <andres@anarazel.de> wrote:
> > That might make the window fairly wide on normal systems, but I'm not
> > sure about Raspberry Pi BF members or things running
> > CLOBBER_CACHE_ALWAYS/RECURSIVELY. I guess I could try it.
>
> You can set checkpoint_timeout to be a day. If that's not enough, well,
> then I think we have other problems.

I'm not sure that's the only issue here, but I'll try it.

> Yes, I am asking for something to be changed: I'd like the code that
> read()s the file when computing the checksum to add up how many bytes
> were read, and compare that to the size in the manifest. And if there's
> a difference, report an error about that, instead of a checksum failure.
>
> I've repeatedly seen filesystem issues lead to earlier EOFs when
> read()ing than what stat() returns. It'll be pretty annoying to have to
> debug a general "checksum failure", rather than just knowing that
> reading stopped after 100MB of 1GB.

Is 0004 attached like what you have in mind?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: backup manifests

From
Andres Freund
Date:
On 2020-04-02 14:55:19 -0400, Robert Haas wrote:
> > Yes, I am asking for something to be changed: I'd like the code that
> > read()s the file when computing the checksum to add up how many bytes
> > were read, and compare that to the size in the manifest. And if there's
> > a difference, report an error about that, instead of a checksum failure.
> >
> > I've repeatedly seen filesystem issues lead to earlier EOFs when
> > read()ing than what stat() returns. It'll be pretty annoying to have to
> > debug a general "checksum failure", rather than just knowing that
> > reading stopped after 100MB of 1GB.
> 
> Is 0004 attached like what you have in mind?

Yes. Thanks!

- Andres



Re: backup manifests

From
David Steele
Date:
On 4/2/20 1:04 PM, Robert Haas wrote:
 >
> There
> are still some things that not everybody is happy about. In
> particular, Stephen and David are unhappy about using CRC-32C as the
> default algorithm, but Andres and Noah both think it's a reasonable
> choice, even if not as robust as everybody will want. As I agree, I'm
> going to stick with that choice.

Yeah, I seem to be on the losing side of this argument, at least for 
now, so I don't think it should block the commit of this patch. It's an 
easy enough tweak if we change our minds.

> For my part, I think this is a general issue that is not really this
> patch's problem to solve. We have had multiple discussions over the
> years about reducing the number of binaries that we ship. We could
> have a general binary called "pg" or similar and use subcommands: pg
> createdb, pg basebackup, pg validatebackup, etc. I think such an
> approach is worth considering, though it would certainly be an
> adjustment for everyone. Or we might do something else. But I don't
> want to deal with that in this patch.

I'm fine with the current name, especially now that WAL is validated.

> A couple of other minor suggestions have been made: (1) rejigger
> things to avoid message duplication related to launching external
> binaries, 

That'd be nice to have, but I think we can live without it for now.

> (2) maybe use appendShellString

Seems like this would be good to have but I'm not going to make a fuss 
about it.

> and (3) change some details
> of error-reporting related to manifest parsing. I don't believe anyone
> views these as blockers

I'd view this as later refinement once we see how the tool is being used 
and/or get gripes from the field.

So, with the addition of the 0004 patch down-thread this looks 
committable to me.

Regards,
-- 
-David
david@pgmasters.net



Re: backup manifests

From
Robert Haas
Date:
On Thu, Apr 2, 2020 at 2:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Apr 2, 2020 at 2:23 PM Andres Freund <andres@anarazel.de> wrote:
> > > That might make the window fairly wide on normal systems, but I'm not
> > > sure about Raspberry Pi BF members or things running
> > > CLOBBER_CACHE_ALWAYS/RECURSIVELY. I guess I could try it.
> >
> > You can set checkpoint_timeout to be a day. If that's not enough, well,
> > then I think we have other problems.
>
> I'm not sure that's the only issue here, but I'll try it.

I ran into a few problems here. In trying to set this up manually, I
always began with the following steps:

====
# (1) create cluster
initdb

# (2) add to configuration file
log_checkpoints=on
checkpoint_timeout=1d
checkpoint_completion_target=0.99

# (3) fire it up
postgres
createdb
====

If at this point I do "pg_basebackup -D pgslave -R -c spread", it
completes within a few seconds anyway, because there's basically
nothing dirty, and no matter how slowly you write out no data, it's
still pretty quick. If I run "pgbench -i" first, and then
"pg_basebackup -D pgslave -R -c spread", it hangs, apparently
essentially forever, because now the checkpoint has something to do,
and it does it super-slowly, and "psql -c checkpoint" makes it finish
immediately. However, this experiment isn't testing quite the right
thing, because what I actually need is a slow backup off of a
cascading standby, so that I have time to promote the parent standby
before the backup completes. I tried continuing like this:

====
# (4) set up standby
pg_basebackup -D pgslave -R
postgres -D pgslave -c port=5433

# (5) set up cascading standby
pg_basebackup -D pgslave2 -d port=5433 -R
postgres -c port=5434 -D pgslave2

# (6) dirty some pages on the master
pgbench -i

# (7) start a backup of the cascading standby
pg_basebackup -D pgslave3 -d port=5434 -R -c spread
====

However, the pg_basebackup in the last step completes after only a few
seconds. If it were hanging, then I could continue with "pg_ctl
promote -D pgslave" and that might give me what I need, but that's not
what happens.

I suspect I'm not doing quite what you had in mind here... thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Robert Haas
Date:
On Thu, Apr 2, 2020 at 3:26 PM David Steele <david@pgmasters.net> wrote:
> So, with the addition of the 0004 patch down-thread this looks
> committable to me.

Glad to hear it. Thank you.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Andres Freund
Date:
On 2020-04-02 15:42:48 -0400, Robert Haas wrote:
> I suspect I'm not doing quite what you had in mind here... thoughts?

I have some ideas, but I think it's complicated enough that I'd not put
it in the "pre commit path" for now.



Re: backup manifests

From
David Steele
Date:
On 4/2/20 3:47 PM, Andres Freund wrote:
> On 2020-04-02 15:42:48 -0400, Robert Haas wrote:
>> I suspect I'm not doing quite what you had in mind here... thoughts?
> 
> I have some ideas, but I think it's complicated enough that I'd not put
> it in the "pre commit path" for now.

+1. These would be great tests to have and a win for pg_basebackup 
overall but I don't think they should be a prerequisite for this commit.

Regards,
-- 
-David
david@pgmasters.net



Re: backup manifests

From
Robert Haas
Date:
On Thu, Apr 2, 2020 at 4:34 PM David Steele <david@pgmasters.net> wrote:
> +1. These would be great tests to have and a win for pg_basebackup
> overall but I don't think they should be a prerequisite for this commit.

Not to mention the server. I can't say that I have a lot of confidence
that all of the server behavior in this area is well-understood and
sane.

I've pushed all the patches. Hopefully everyone is happy now, or at
least not so unhappy that they're going to break quarantine to beat me
up. I hope I acknowledged all of the relevant people in the commit
message, but it's possible that I missed somebody; if so, my
apologies. As is my usual custom, I added entries in roughly the order
that people chimed in on the thread, so the ordering should not be
taken as a reflection of magnitude of contribution or, well, anything
other than the approximate order in which they chimed in.

It looks like the buildfarm is unhappy though, so I guess I'd better
go look at that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Robert Haas
Date:
On Fri, Apr 3, 2020 at 3:22 PM Robert Haas <robertmhaas@gmail.com> wrote:
> It looks like the buildfarm is unhappy though, so I guess I'd better
> go look at that.

I fixed two things so far, and there seems to be at least one more
possible issue that I don't understand.

1. Apparently, we have an automated perlcritic run built in to the
build farm, and apparently, it really hates Perl subroutines that
don't end with an explicit return statement. We have that overridden
to severity 5 in our Perl critic configuration. I guess I should've
known this, but didn't. I've pushed a fix adding return statements. I
believe I'm on record as thinking that perlcritic is a tool for
complaining about a lot of things that don't really matter and very
few that actually do -- but it's project style, so I'll suck it up!

2. Also, a bunch of machines were super-unhappy with
003_corruption.pl, failing with this sort of thing:

pg_basebackup: error: could not get COPY data stream: ERROR:  symbolic
link target too long for tar format: file name "pg_tblspc/16387",
target "/home/fabien/pg/build-farm-11/buildroot/HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/tmp_test_7w0w"

Apparently, this is a known problem and the solution is to use
TestLib::tempdir_short instead of TestLib::tempdir, so I pushed a fix
to make it do that.

3. spurfowl has failed its last two runs like this:

sh: 1: ./configure: not found

I am not sure how this patch could've caused that to happen, but the
timing of the failures is certainly suspicious.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Robert Haas
Date:
On Fri, Apr 3, 2020 at 3:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
> 2. Also, a bunch of machines were super-unhappy with
> 003_corruption.pl, failing with this sort of thing:
>
> pg_basebackup: error: could not get COPY data stream: ERROR:  symbolic
> link target too long for tar format: file name "pg_tblspc/16387",
> target "/home/fabien/pg/build-farm-11/buildroot/HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/tmp_test_7w0w"
>
> Apparently, this is a known problem and the solution is to use
> TestLib::tempdir_short instead of TestLib::tempdir, so I pushed a fix
> to make it do that.

By and large, the buildfarm is a lot happier now, but fairywren
(Windows / Msys Server 2019 / 2 gcc 7.3.0 x86_64) failed like this:

# Postmaster PID for node "master" is 198420
error running SQL: 'psql:<stdin>:3: ERROR:  directory
"/tmp/9peoZHrEia" does not exist'
while running 'psql -XAtq -d port=51493 host=127.0.0.1
dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'CREATE TABLE x1
(a int);
INSERT INTO x1 VALUES (111);
CREATE TABLESPACE ts1 LOCATION '/tmp/9peoZHrEia';
CREATE TABLE x2 (a int) TABLESPACE ts1;
INSERT INTO x1 VALUES (222);
' at /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/perl/PostgresNode.pm
line 1531.
### Stopping node "master" using mode immediate

I wondered why this should be failing on this machine when none of the
other places where tempdir_short is used are similarly failing. The
answer appears to be that most of the TAP tests that use tempdir_short
just do this:

my $tempdir_short = TestLib::tempdir_short;

...and then ignore that variable completely for the rest of the
script.  That's not ideal, and we should probably remove those calls
to avoid giving the impression that it's actually used for something.
The two TAP tests that actually do something with it - apart from the
one I just added - are pg_basebackup's 010_pg_basebackup.pl and
pg_ctl's 001_start_stop.pl. However, both of those are skipped on
Windows. Also, PostgresNode.pm itself uses it, but only when UNIX
sockets are used, so again not on Windows. So it sorta looks to me
like we have no preexisting tests that meaningfully exercise
TestLib::tempdir_short on Windows.

Given that, I suppose I should consider myself lucky if this ends up
working on *any* of the Windows critters, but given the implementation
I'm kinda surprised we have a problem. That function is just:

sub tempdir_short
{

        return File::Temp::tempdir(CLEANUP => 1);
}

And File::Temp's documentation says that the temporary directory is
picked using File::Spec's tmpdir(), which says that it knows about
different operating systems and will DTRT on Unix, Mac, OS2, Win32,
and VMS. Yet on fairywren it is apparently DTWT. I'm not sure why.

Any ideas?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Alvaro Herrera
Date:
On 2020-Apr-03, Robert Haas wrote:

> sub tempdir_short
> {
> 
>         return File::Temp::tempdir(CLEANUP => 1);
> }
> 
> And File::Temp's documentation says that the temporary directory is
> picked using File::Spec's tmpdir(), which says that it knows about
> different operating systems and will DTRT on Unix, Mac, OS2, Win32,
> and VMS. Yet on fairywren it is apparently DTWT. I'm not sure why.

Maybe it needs perl2host?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: backup manifests

From
Robert Haas
Date:
On Fri, Apr 3, 2020 at 4:54 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Maybe it needs perl2host?

*jaw drops*

Wow, OK, yeah, that looks like the thing.  Thanks for the suggestion;
I didn't know that existed (and I kinda wish I still didn't).

I'll go see about adding that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Justin Pryzby
Date:
On Fri, Apr 03, 2020 at 03:22:23PM -0400, Robert Haas wrote:
> I've pushed all the patches.

I didn't manage to look at this in advance but have some doc fixes.

word-diff:

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 536de9a698..d84afb7b18 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2586,7 +2586,7 @@ The commands accepted in replication mode are:
          and sent along with the backup.  The manifest is a list of every
          file present in the backup with the exception of any WAL files that
          may be included. It also stores the size, last modification time, and
          [-an optional-]{+optionally a+} checksum for each file.
          A value of <literal>force-escape</literal> forces all filenames
          to be hex-encoded; otherwise, this type of encoding is performed only
          for files whose names are non-UTF8 octet sequences.
@@ -2602,7 +2602,7 @@ The commands accepted in replication mode are:
        <term><literal>MANIFEST_CHECKSUMS</literal></term>
        <listitem>
         <para>
          Specifies the {+checksum+} algorithm that should be applied to each file included
          in the backup manifest. Currently, the available
          algorithms are <literal>NONE</literal>, <literal>CRC32C</literal>,
          <literal>SHA224</literal>, <literal>SHA256</literal>,
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index c778e061f3..922688e227 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -604,7 +604,7 @@ PostgreSQL documentation
        not contain any checksums. Otherwise, it will contain a checksum
        of each file in the backup using the specified algorithm. In addition,
        the manifest will always contain a <literal>SHA256</literal>
        checksum of its own [-contents.-]{+content.+} The <literal>SHA</literal> algorithms
        are significantly more CPU-intensive than <literal>CRC32C</literal>,
        so selecting one of them may increase the time required to complete
        the backup.
@@ -614,7 +614,7 @@ PostgreSQL documentation
        of each file for users who wish to verify that the backup has not been
        tampered with, while the CRC32C algorithm provides a checksum which is
        much faster to calculate and good at catching errors due to accidental
        changes but is not resistant to [-targeted-]{+malicious+} modifications.  Note that, to
        be useful against an adversary who has access to the backup, the backup
        manifest would need to be stored securely elsewhere or otherwise
        verified not to have been modified since the backup was taken.
diff --git a/doc/src/sgml/ref/pg_validatebackup.sgml b/doc/src/sgml/ref/pg_validatebackup.sgml
index 19888dc196..748ac439a6 100644
--- a/doc/src/sgml/ref/pg_validatebackup.sgml
+++ b/doc/src/sgml/ref/pg_validatebackup.sgml
@@ -41,12 +41,12 @@ PostgreSQL documentation
  </para>

  <para>
   It is important to note that[-that-] the validation which is performed by
   <application>pg_validatebackup</application> does not and [-can not-]{+cannot+} include
   every check which will be performed by a running server when attempting
   to make use of the backup. Even if you use this tool, you should still
   perform test restores and verify that the resulting databases work as
   expected and that they[-appear to-] contain the correct data. However,
   <application>pg_validatebackup</application> can detect many problems
   that commonly occur due to storage problems or user error.
  </para>
@@ -73,7 +73,7 @@ PostgreSQL documentation
   a <literal>backup_manifest</literal> file in the target directory or
   about anything inside <literal>pg_wal</literal>, even though these
   files won't be listed in the backup manifest. Only files are checked;
   the presence or absence [-or-]{+of+} directories is not verified, except
   indirectly: if a directory is missing, any files it should have contained
   will necessarily also be missing. 
  </para>
@@ -84,7 +84,7 @@ PostgreSQL documentation
   for any files for which the computed checksum does not match the
   checksum stored in the manifest. This step is not performed for any files
   which produced errors in the previous step, since they are already known
   to have problems. [-Also, files-]{+Files+} which were ignored in the previous step are
   also ignored in this step.
  </para>

@@ -123,7 +123,7 @@ PostgreSQL documentation
  <title>Options</title>

   <para>
    The following command-line options control the [-behavior.-]{+behavior of this program.+}

    <variablelist>
     <varlistentry>
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 3b18e733cd..aa72a6ff10 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1148,7 +1148,7 @@ AddFileToManifest(manifest_info *manifest, const char *spcoid,
    }

    /*
     * Each file's entry [-need-]{+needs+} to be separated from any entry that follows by a
     * comma, but there's no comma before the first one or after the last one.
     * To make that work, adding a file to the manifest starts by terminating
     * the most recently added line, with a comma if appropriate, but does not

-- 
Justin

Attachment

backup manifests and contemporaneous buildfarm failures

From
Robert Haas
Date:
[ splitting this off into a separate thread ]

On Fri, Apr 3, 2020 at 5:07 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I'll go see about adding that.

Done now. Meanwhile, two more machines have reported the mysterious message:

sh: ./configure: not found

...that first appeared on spurfowl a few hours ago. The other two
machines are eelpout and elver, both of which list Thomas Munro as a
maintainer. spurfowl lists Stephen Frost. Thomas, Stephen, can one of
you check and see what's going on? spurfowl has failed this way four
times now, and eelpout and elver have each failed the last two runs,
but since there's no helpful information in the logs, it's hard to
guess what went wrong.

I'm sort of afraid that something in the new TAP tests accidentally
removed way too many files during the cleanup phase - e.g. it decided
the temporary directory was / and removed every file it could access,
or something like that. It doesn't do that here, or I, uh, would've
noticed by now. But sometimes strange things happen on other people's
machines. Hopefully one of those strange things is not that my test
code is single-handedly destroying the entire buildfarm, but it's
possible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests and contemporaneous buildfarm failures

From
Fabien COELHO
Date:
Hello Robert,

> Done now. Meanwhile, two more machines have reported the mysterious message:
>
> sh: ./configure: not found
>
> ...that first appeared on spurfowl a few hours ago. The other two
> machines are eelpout and elver, both of which list Thomas Munro as a
> maintainer. spurfowl lists Stephen Frost. Thomas, Stephen, can one of
> you check and see what's going on? spurfowl has failed this way four
> times now, and eelpout and elver have each failed the last two runs,
> but since there's no helpful information in the logs, it's hard to
> guess what went wrong.
>
> I'm sort of afraid that something in the new TAP tests accidentally
> removed way too many files during the cleanup phase - e.g. it decided
> the temporary directory was / and removed every file it could access,
> or something like that. It doesn't do that here, or I, uh, would've
> noticed by now. But sometimes strange things happen on other people's
> machines. Hopefully one of those strange things is not that my test
> code is single-handedly destroying the entire buildfarm, but it's
> possible.

seawasp just failed the same way. Good news, I can see "configure" under 
"HEAD/pgsql".

The only strange thing under buildroot I found is:


HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans/

this last directory perms are d--------- which seems to break cleanup.

It may be a leftover from a previous run which failed (possibly
21dc488?). I cannot see how this would be related to configure, though. Maybe
something else fails silently and the message is about a consequence of 
the prior silent failure.

I commented out the cron job and will try to look into it tomorrow if 
the status has not changed by then.

-- 
Fabien.



Re: backup manifests and contemporaneous buildfarm failures

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
> The only strange thing under buildroot I found is:

>
HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans/

> this last directory perms are d--------- which seems to break cleanup.

Locally, I observe that "make clean" in src/bin/pg_validatebackup fails
to clean up the tmp_check directory left behind by "make check".
So the new makefile is not fully plugged into its standard
responsibilities.  I don't see any unreadable subdirectories though.

I wonder if VPATH versus not-VPATH might be a relevant factor ...

            regards, tom lane



Re: backup manifests and contemporaneous buildfarm failures

From
Alvaro Herrera
Date:
On 2020-Apr-03, Tom Lane wrote:

> I wonder if VPATH versus not-VPATH might be a relevant factor ...

Oh, absolutely.  The ones that failed show, in the last successful run,
the configure line invoked as "./configure", while the animals that are
still running are invoking configure from some other directory.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: backup manifests and contemporaneous buildfarm failures

From
Thomas Munro
Date:
On Sat, Apr 4, 2020 at 11:13 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Fabien COELHO <coelho@cri.ensmp.fr> writes:
> > The only strange thing under buildroot I found is:
>
> >
HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans/
>
> > this last directory perms are d--------- which seems to break cleanup.

Same here, on elver.  I see pg_subtrans has been chmod(0)'d,
presumably by the perl subroutine mutilate_open_directory_fails.  I
see this in my inbox (the build farm wrote it to stderr or stdout
rather than the log file):

cannot chdir to child for
pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans:
Permission denied at ./run_build.pl line 1013.
cannot remove directory for
pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails:
Directory not empty at ./run_build.pl line 1013.
cannot remove directory for
pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup:
Directory not empty at ./run_build.pl line 1013.
cannot remove directory for
pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data:
Directory not empty at ./run_build.pl line 1013.
cannot remove directory for
pgsql.build/src/bin/pg_validatebackup/tmp_check: Directory not empty
at ./run_build.pl line 1013.
cannot remove directory for pgsql.build/src/bin/pg_validatebackup:
Directory not empty at ./run_build.pl line 1013.
cannot remove directory for pgsql.build/src/bin: Directory not empty
at ./run_build.pl line 1013.
cannot remove directory for pgsql.build/src: Directory not empty at
./run_build.pl line 1013.
cannot remove directory for pgsql.build: Directory not empty at
./run_build.pl line 1013.
cannot chdir to child for
pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans:
Permission denied at ./run_build.pl line 589.
cannot remove directory for
pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails:
Directory not empty at ./run_build.pl line 589.
cannot remove directory for
pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup:
Directory not empty at ./run_build.pl line 589.
cannot remove directory for
pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data:
Directory not empty at ./run_build.pl line 589.
cannot remove directory for
pgsql.build/src/bin/pg_validatebackup/tmp_check: Directory not empty
at ./run_build.pl line 589.
cannot remove directory for pgsql.build/src/bin/pg_validatebackup:
Directory not empty at ./run_build.pl line 589.
cannot remove directory for pgsql.build/src/bin: Directory not empty
at ./run_build.pl line 589.
cannot remove directory for pgsql.build/src: Directory not empty at
./run_build.pl line 589.
cannot remove directory for pgsql.build: Directory not empty at
./run_build.pl line 589.



Re: backup manifests and contemporaneous buildfarm failures

From
Stephen Frost
Date:
Greetings,

* Thomas Munro (thomas.munro@gmail.com) wrote:
> On Sat, Apr 4, 2020 at 11:13 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Fabien COELHO <coelho@cri.ensmp.fr> writes:
> > > The only strange thing under buildroot I found is:
> >
> > >
HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans/
> >
> > > this last directory perms are d--------- which seems to break cleanup.
>
> Same here, on elver.  I see pg_subtrans has been chmod(0)'d,
> presumably by the perl subroutine mutilate_open_directory_fails.  I
> see this in my inbox (the build farm wrote it to stderr or stdout
> rather than the log file):

Yup, saw the same here.

chmod'ing it to 755 seemed to result in the next run cleaning it up, at
least.  Not sure how things will go on the next actual build tho.

Thanks,

Stephen

Attachment

Re: backup manifests and contemporaneous buildfarm failures

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> Same here, on elver.  I see pg_subtrans has been chmod(0)'d,
> presumably by the perl subroutine mutilate_open_directory_fails.  I
> see this in my inbox (the build farm wrote it to stderr or stdout
> rather than the log file):

> cannot chdir to child for
> pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans:
> Permission denied at ./run_build.pl line 1013.
> cannot remove directory for
> pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails:
> Directory not empty at ./run_build.pl line 1013.

I'm guessing that we're looking at a platform-specific difference in
whether "rm -rf" fails outright on an unreadable subdirectory, or
just tries to carry on by unlinking it anyway.

A partial fix would be to have the test script put back normal
permissions on that directory before it exits ... but any failure
partway through the script would leave a time bomb requiring manual
cleanup.

On the whole, I'd argue that testing that behavior is not valuable
enough to take risks of periodically breaking buildfarm members
in a way that will require manual recovery --- to say nothing of
annoying developers who trip over it.  So my vote is to remove
that part of the test and be satisfied with checking the behavior
for an unreadable file.

This doesn't directly explain the failure-at-next-configure behavior
that we're seeing in the buildfarm, but it wouldn't be too surprising
if it ends up being that the buildfarm client script doesn't manage
to fully recover from the situation.

            regards, tom lane



Re: backup manifests and contemporaneous buildfarm failures

From
Tom Lane
Date:
I wrote:
> I'm guessing that we're looking at a platform-specific difference in
> whether "rm -rf" fails outright on an unreadable subdirectory, or
> just tries to carry on by unlinking it anyway.

Yeah... on my RHEL6 box, "make check" cleans up the working directories
under tmp_check, but on a FreeBSD 12.1 box, not so much: I'm left with

$ ls tmp_check/
log/                            t_003_corruption_master_data/
tgl@oldmini$ ls -R tmp_check/t_003_corruption_master_data/
backup/

tmp_check/t_003_corruption_master_data/backup:
open_directory_fails/

tmp_check/t_003_corruption_master_data/backup/open_directory_fails:
pg_subtrans/

tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans:
ls: tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans: Permission denied

I did not see any complaints printed to the terminal, but in
regress_log_003_corruption there's

...
ok 40 - corrupt backup fails validation: open_directory_fails: matches
cannot chdir to child for
/usr/home/tgl/pgsql/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans:
Permission denied at t/003_corruption.pl line 126.
cannot remove directory for
/usr/home/tgl/pgsql/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails:
Directory not empty at t/003_corruption.pl line 126.
# Running: pg_basebackup -D
/usr/home/tgl/pgsql/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/search_directory_fails
--no-sync -T /tmp/lxaL_sLcnr=/tmp/_fegwVjoDR
ok 41 - base backup ok
...

This may be more of a Perl version issue than a platform issue,
but either way it's a problem.

Also, on the FreeBSD box, "rm -rf" isn't happy either:

$ rm -rf tmp_check
rm: tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans: Permission denied
rm: tmp_check/t_003_corruption_master_data/backup/open_directory_fails: Directory not empty
rm: tmp_check/t_003_corruption_master_data/backup: Directory not empty
rm: tmp_check/t_003_corruption_master_data: Directory not empty
rm: tmp_check: Directory not empty


            regards, tom lane



Re: backup manifests and contemporaneous buildfarm failures

From
Robert Haas
Date:
On Fri, Apr 3, 2020 at 6:48 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm guessing that we're looking at a platform-specific difference in
> whether "rm -rf" fails outright on an unreadable subdirectory, or
> just tries to carry on by unlinking it anyway.

My intention was that it would be cleaned by the TAP framework itself,
since the temporary directories it creates are marked for cleanup. But
it may be that there's a platform dependency in the behavior of Perl's
File::Path::rmtree, too.

> A partial fix would be to have the test script put back normal
> permissions on that directory before it exits ... but any failure
> partway through the script would leave a time bomb requiring manual
> cleanup.

Yeah. I've pushed that fix for now, but as you say, it may not survive
contact with the enemy. That's kind of disappointing, because I put a
lot of work into trying to make the tests cover every line of code
that they possibly could, and there's no reason to suppose that
pg_validatebackup is the only tool that could benefit from having code
coverage of those kinds of scenarios. It's probably not even the tool
that is most in need of such testing; it must be far worse if, say,
pg_rewind can't cope with it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests and contemporaneous buildfarm failures

From
Robert Haas
Date:
On Fri, Apr 3, 2020 at 5:58 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> seawasp just failed the same way. Good news, I can see "configure" under
> "HEAD/pgsql".

Ah, good.

> The only strange thing under buildroot I found is:
>
>
HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans/

Huh. I wonder how that got left behind ... it should've been cleaned
up by the TAP test framework. But I pushed a commit to change the
permissions back explicitly before exiting. As Tom says, I probably
need to remove that entire test, but I'm going to try this first.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests and contemporaneous buildfarm failures

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Apr 3, 2020 at 6:48 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm guessing that we're looking at a platform-specific difference in
>> whether "rm -rf" fails outright on an unreadable subdirectory, or
>> just tries to carry on by unlinking it anyway.

> My intention was that it would be cleaned by the TAP framework itself,
> since the temporary directories it creates are marked for cleanup. But
> it may be that there's a platform dependency in the behavior of Perl's
> File::Path::rmtree, too.

Yeah, so it would seem.  The buildfarm script uses rmtree to clean out
the old build tree.  The man page for File::Path suggests (but can't
quite bring itself to say in so many words) that by default, rmtree
will adjust the permissions on target directories to allow the deletion
to succeed.  But that's very clearly not happening on some platforms.
(Maybe that represents a local patch on the part of some packagers
who thought it was too unsafe?)

Anyway, the end state presumably is that the pgsql.build directory
is still there at the end of the buildfarm run, and the next run's
attempt to also rmtree it fares no better.  Then look what it does
to set up the new build:

        system("cp -R -p $target $build_path 2>&1");

Of course, if $build_path already exists, then cp copies to a subdirectory
of the target not the target itself.  So that explains the symptom
"./configure does not exist" --- it exists all right, but in a
subdirectory below the one where the buildfarm expects it to be.

It looks to me like the same problem would occur with VPATH or no.
The lack of failures among the VPATH-using critters probably has
more to do with whether their rmtree is willing to deal with this
case than with VPATH.

Anyway, it's evident that the buildfarm critters that are busted
will need manual cleanup, because the script is not going to be
able to get out of this by itself.  I remain of the opinion that
the hazard of that happening again in the future (eg, if a buildfarm
animal loses power during the test) is sufficient reason to remove
this test case.

            regards, tom lane



Re: backup manifests

From
Tom Lane
Date:
BTW, some of the buildfarm is showing a simpler portability problem:
they think you were too cavalier about the difference between time_t
and pg_time_t.  (On a platform with 32-bit time_t, that's an actual
bug, probably.)  lapwing is actually failing:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing&dt=2020-04-03%2021%3A41%3A49

ccache gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla
-Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g
-O2 -Werror -I. -I. -I../../../src/include  -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS -D_GNU_SOURCE
-I/usr/include/libxml2 -I/usr/include/et  -c -o basebackup.o basebackup.c 
basebackup.c: In function 'AddFileToManifest':
basebackup.c:1199:10: error: passing argument 1 of 'pg_gmtime' from incompatible pointer type [-Werror]
In file included from ../../../src/include/access/xlog_internal.h:26:0,
                 from basebackup.c:20:
../../../src/include/pgtime.h:49:22: note: expected 'const pg_time_t *' but argument is of type 'time_t *'
cc1: all warnings being treated as errors
make[3]: *** [basebackup.o] Error 1

but some others are showing it as a warning.

I suppose that judicious s/time_t/pg_time_t/ would fix this.
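
That is, something along these lines (illustrative only, not the actual
code in basebackup.c), copying the value into a pg_time_t local and
handing that to pg_gmtime() rather than pointing it at a time_t:

#include <time.h>

#include "pgtime.h"			/* pg_time_t, pg_gmtime(), pg_strftime() */

/*
 * Sketch only: on a platform with 32-bit time_t, passing a time_t *
 * where pg_gmtime() expects const pg_time_t * reads 64 bits from a
 * 32-bit object.  Copying into a pg_time_t first avoids both the
 * warning and the overwide access.
 */
static void
format_mtime(time_t file_mtime, char *buf, size_t buflen)
{
	pg_time_t	mtime = (pg_time_t) file_mtime;

	pg_strftime(buf, buflen, "%Y-%m-%d %H:%M:%S %Z", pg_gmtime(&mtime));
}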

            regards, tom lane



Re: backup manifests and contemporaneous buildfarm failures

From
Robert Haas
Date:
On Fri, Apr 3, 2020 at 6:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Locally, I observe that "make clean" in src/bin/pg_validatebackup fails
> to clean up the tmp_check directory left behind by "make check".

Fixed.

I also tried to fix 'lapwing', which was complaining about about a
call to pg_gmtime, saying that it "expected 'const pg_time_t *' but
argument is of type 'time_t *'". I was thinking that the problem had
something to do with const, but Thomas pointed out to me that
pg_time_t != time_t, so I pushed a fix which assumes that was the
issue. (It was certainly *an* issue.)

'prairiedog' is also unhappy, and it looks related:

/bin/sh ../../../../config/install-sh -c -d
'/Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/modules/commit_ts'/tmp_check
cd . && TESTDIR='/Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/modules/commit_ts'
PATH="/Users/buildfarm/bf-data/HEAD/pgsql.build/tmp_install/Users/buildfarm/bf-data/HEAD/inst/bin:$PATH"

DYLD_LIBRARY_PATH="/Users/buildfarm/bf-data/HEAD/pgsql.build/tmp_install/Users/buildfarm/bf-data/HEAD/inst/lib:$DYLD_LIBRARY_PATH"
 PGPORT='65678'

PG_REGRESS='/Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/modules/commit_ts/../../../../src/test/regress/pg_regress'
REGRESS_SHLIB='/Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/regress/regress.so'
/usr/local/perl5.8.3/bin/prove -I ../../../../src/test/perl/ -I .
t/*.pl
t/001_base.........ok
t/002_standby......FAILED--Further testing stopped: system pg_basebackup failed
make: *** [check] Error 25

Unfortunately, that error message is not very informative and for some
reason the TAP logs don't seem to be included in the buildfarm output
in this case, so it's hard to tell exactly what went wrong. This
appears to be another 32-bit critter, which may be related somehow,
but I don't know how exactly.

'serinus' is also failing. This is less obviously related:

[02:08:55] t/003_constraints.pl .. ok     2048 ms ( 0.01 usr  0.00 sys
+  1.28 cusr  0.38 csys =  1.67 CPU)
# poll_query_until timed out executing this query:
# SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN
('r', 's');
# expecting this output:
# t
# last actual query output:
# f
# with stderr:

But there's also this:

2020-04-04 02:08:57.297 CEST [5e87d019.506c1:1] LOG:  connection
received: host=[local]
2020-04-04 02:08:57.298 CEST [5e87d019.506c1:2] LOG:  replication
connection authorized: user=bf
application_name=tap_sub_16390_sync_16384
2020-04-04 02:08:57.299 CEST [5e87d019.506c1:3] LOG:  statement: BEGIN
READ ONLY ISOLATION LEVEL REPEATABLE READ
2020-04-04 02:08:57.299 CEST [5e87d019.506c1:4] LOG:  received
replication command: CREATE_REPLICATION_SLOT
"tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
2020-04-04 02:08:57.299 CEST [5e87d019.506c1:5] ERROR:  replication
slot "tap_sub_16390_sync_16384" already exists
TRAP: FailedAssertion("owner->bufferarr.nitems == 0", File:
"/home/bf/build/buildfarm-serinus/HEAD/pgsql.build/../pgsql/src/backend/utils/resowner/resowner.c",
Line: 718)
postgres: publisher: walsender bf [local] idle in transaction
(aborted)(ExceptionalCondition+0x5c)[0x9a13ac]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)(ResourceOwnerDelete+0x295)[0x9db8e5]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)[0x54c61f]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)(AbortOutOfAnyTransaction+0x122)[0x550e32]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)[0x9b3bc9]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)(shmem_exit+0x35)[0x80db45]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)[0x80dc77]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)(proc_exit+0x8)[0x80dd08]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)(PostgresMain+0x59f)[0x83bd0f]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)[0x7a0264]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)(PostmasterMain+0xbfc)[0x7a2b8c]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)(main+0x6fb)[0x49749b]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7fc52d83bbbb]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)(_start+0x2a)[0x49753a]
2020-04-04 02:08:57.302 CEST [5e87d018.5066b:4] LOG:  server process
(PID 329409) was terminated by signal 6: Aborted
2020-04-04 02:08:57.302 CEST [5e87d018.5066b:5] DETAIL:  Failed
process was running: BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ

That might well be related. I note in passing that the DETAIL emitted
by the postmaster shows the previous SQL command rather than the
more-recent replication command, which seems like something to fix. (I
still really dislike the fact that we have this evil hack allowing one
connection to mix and match those sets of commands...)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests and contemporaneous buildfarm failures

From
Robert Haas
Date:
On Fri, Apr 3, 2020 at 8:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah, so it would seem.  The buildfarm script uses rmtree to clean out
> the old build tree.  The man page for File::Path suggests (but can't
> quite bring itself to say in so many words) that by default, rmtree
> will adjust the permissions on target directories to allow the deletion
> to succeed.  But that's very clearly not happening on some platforms.
> (Maybe that represents a local patch on the part of some packagers
> who thought it was too unsafe?)

Interestingly, on my machine, rmtree coped with a mode 0 directory
just fine, but mode 0400 was more than its tiny brain could handle, so
the originally committed fix had code to revert 0400 back to 0700, but
I didn't add similar code to revert from 0 back to 0700 because that
was working fine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests and contemporaneous buildfarm failures

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> 'prairiedog' is also unhappy, and it looks related:

Yeah, gaur also failed in the same place.  Both of those are
alignment-picky 32-bit hardware, so I'm thinking the problem is
pg_gmtime() trying to fetch a 64-bit pg_time_t from an insufficiently
aligned address.  I'm trying to confirm that on gaur's host right now,
but it's a slow machine ...

> 'serinus' is also failing. This is less obviously related:

Dunno about this one.

            regards, tom lane



Re: backup manifests and contemporaneous buildfarm failures

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Interestingly, on my machine, rmtree coped with a mode 0 directory
> just fine, but mode 0400 was more than its tiny brain could handle, so
> the originally committed fix had code to revert 0400 back to 0700, but
> I didn't add similar code to revert from 0 back to 0700 because that
> was working fine.

It seems really odd that an implementation could cope with mode-0
but not mode-400.  Not sure I care enough to dig into the Perl
library code, though.

            regards, tom lane



Re: backup manifests and contemporaneous buildfarm failures

From
Robert Haas
Date:
On Fri, Apr 3, 2020 at 9:52 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > 'prairiedog' is also unhappy, and it looks related:
>
> Yeah, gaur also failed in the same place.  Both of those are
> alignment-picky 32-bit hardware, so I'm thinking the problem is
> pg_gmtime() trying to fetch a 64-bit pg_time_t from an insufficiently
> aligned address.  I'm trying to confirm that on gaur's host right now,
> but it's a slow machine ...

You might just want to wait until tomorrow and see whether it clears
up in newer runs. I just pushed yet another fix that might be
relevant.

I think I've done about as much as I can do for tonight, though. Most
things are green now, and the ones that aren't are failing because of
stuff that is at least plausibly fixed. By morning it should be
clearer how much broken stuff is left, although that will be somewhat
complicated by at least sidewinder and seawasp needing manual
intervention to get back on track.

I apologize to everyone who has been or will be inconvenienced by all
of this. So far I've pushed 4 test case fixes, 2 bug fixes, and 1
makefile fix, which I'm pretty sure is over quota for one patch. :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests and contemporaneous buildfarm failures

From
Andres Freund
Date:
Hi,

Peter, Petr, CCed you because it's probably a bug somewhere around the
initial copy code for logical replication.


On 2020-04-03 20:48:09 -0400, Robert Haas wrote:
> 'serinus' is also failing. This is less obviously related:

Hm. Tests passed once since then.


> 2020-04-04 02:08:57.299 CEST [5e87d019.506c1:4] LOG:  received
> replication command: CREATE_REPLICATION_SLOT
> "tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
> 2020-04-04 02:08:57.299 CEST [5e87d019.506c1:5] ERROR:  replication
> slot "tap_sub_16390_sync_16384" already exists

That already seems suspicious. I checked the following (successful) run
and I did not see that in the stage's logs.

Looking at the failing log, it fails because for some reason there are
several rounds (once due to a refresh, once due to an intentional
replication failure) of copying the relation. Each creates its own
temporary slot.

first time:
2020-04-04 02:08:57.276 CEST [5e87d019.506bd:1] LOG:  connection received: host=[local]
2020-04-04 02:08:57.278 CEST [5e87d019.506bd:4] LOG:  received replication command: CREATE_REPLICATION_SLOT "tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
2020-04-04 02:08:57.282 CEST [5e87d019.506bd:9] LOG:  statement: COPY public.tab_rep TO STDOUT
2020-04-04 02:08:57.284 CEST [5e87d019.506bd:10] LOG:  disconnection: session time: 0:00:00.007 user=bf database=postgres host=[local]

second time:
2020-04-04 02:08:57.288 CEST [5e87d019.506bf:1] LOG:  connection received: host=[local]
2020-04-04 02:08:57.289 CEST [5e87d019.506bf:4] LOG:  received replication command: CREATE_REPLICATION_SLOT "tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
2020-04-04 02:08:57.293 CEST [5e87d019.506bf:9] LOG:  statement: COPY public.tab_rep TO STDOUT

third time:
2020-04-04 02:08:57.297 CEST [5e87d019.506c1:1] LOG:  connection received: host=[local]
2020-04-04 02:08:57.299 CEST [5e87d019.506c1:4] LOG:  received replication command: CREATE_REPLICATION_SLOT "tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
2020-04-04 02:08:57.299 CEST [5e87d019.506c1:5] ERROR:  replication slot "tap_sub_16390_sync_16384" already exists

Note that the connection from the second attempt has not yet
disconnected. Hence the error about the replication slot already
existing - it's a temporary replication slot that'd otherwise already
have been dropped.


Seems the logical rep code needs to do something about this race?


About the assertion failure:

TRAP: FailedAssertion("owner->bufferarr.nitems == 0", File:
"/home/bf/build/buildfarm-serinus/HEAD/pgsql.build/../pgsql/src/backend/utils/resowner/resowner.c",Line: 718)
 
postgres: publisher: walsender bf [local] idle in transaction (aborted)(ExceptionalCondition+0x5c)[0x9a13ac]
postgres: publisher: walsender bf [local] idle in transaction (aborted)(ResourceOwnerDelete+0x295)[0x9db8e5]
postgres: publisher: walsender bf [local] idle in transaction (aborted)[0x54c61f]
postgres: publisher: walsender bf [local] idle in transaction (aborted)(AbortOutOfAnyTransaction+0x122)[0x550e32]
postgres: publisher: walsender bf [local] idle in transaction (aborted)[0x9b3bc9]
postgres: publisher: walsender bf [local] idle in transaction (aborted)(shmem_exit+0x35)[0x80db45]
postgres: publisher: walsender bf [local] idle in transaction (aborted)[0x80dc77]
postgres: publisher: walsender bf [local] idle in transaction (aborted)(proc_exit+0x8)[0x80dd08]
postgres: publisher: walsender bf [local] idle in transaction (aborted)(PostgresMain+0x59f)[0x83bd0f]
postgres: publisher: walsender bf [local] idle in transaction (aborted)[0x7a0264]
postgres: publisher: walsender bf [local] idle in transaction (aborted)(PostmasterMain+0xbfc)[0x7a2b8c]
postgres: publisher: walsender bf [local] idle in transaction (aborted)(main+0x6fb)[0x49749b]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7fc52d83bbbb]
postgres: publisher: walsender bf [local] idle in transaction (aborted)(_start+0x2a)[0x49753a]
2020-04-04 02:08:57.302 CEST [5e87d018.5066b:4] LOG:  server process (PID 329409) was terminated by signal 6: Aborted

Due to the log_line_prefix used, I was at first not entirely sure the
backend that crashed was the one with the ERROR. But it appears we print
the pid as hex for '%c' (why?), so it indeed is the one.


I, again, have to say that the amount of stuff that was done as part of

commit 7c4f52409a8c7d85ed169bbbc1f6092274d03920
Author: Peter Eisentraut <peter_e@gmx.net>
Date:   2017-03-23 08:36:36 -0400

    Logical replication support for initial data copy

is insane. Adding support for running sql over replication connections
and extending CREATE_REPLICATION_SLOT with new options (without even
mentioning that in the commit message!) as part of a commit described as
"Logical replication support for initial data copy" shouldn't happen.


It's not obvious to me what buffer pins could be held at this point. I
wonder if this could be somehow related to

commit 3cb646264e8ced9f25557ce271284da512d92043
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   2018-07-18 12:15:16 -0400

    Use a ResourceOwner to track buffer pins in all cases.
...
    In passing, remove some other ad-hoc resource owner creations that had
    gotten cargo-culted into various other places.  As far as I can tell
    that was all unnecessary, and if it had been necessary it was incomplete,
    due to lacking any provision for clearing those resowners later.
    (Also worth noting in this connection is that a process that hasn't called
    InitBufferPoolBackend has no business accessing buffers; so there's more
    to do than just add the resowner if we want to touch buffers in processes
    not covered by this patch.)

which removed the resowner previously used in walsender. At the very
least we should remove the SavedResourceOwnerDuringExport dance that's
still done in snapbuild.c.  But it can't really be at fault here,
because the crashing backend won't have used that.


So I'm a bit confused here. The best approach is probably to try to
reproduce this by adding an artificial delay into backend shutdown.


> (I still really dislike the fact that we have this evil hack allowing
> one connection to mix and match those sets of commands...)

FWIW, I think the opposite. We should get rid of the difference as much
as possible.

Greetings,

Andres Freund



Re: backup manifests and contemporaneous buildfarm failures

From
Petr Jelinek
Date:
On 04/04/2020 05:06, Andres Freund wrote:
> Hi,
> 
> Peter, Petr, CCed you because it's probably a bug somewhere around the
> initial copy code for logical replication.
> 
> 
> On 2020-04-03 20:48:09 -0400, Robert Haas wrote:
>> 'serinus' is also failing. This is less obviously related:
> 
> Hm. Tests passed once since then.
> 
> 
>> 2020-04-04 02:08:57.299 CEST [5e87d019.506c1:4] LOG:  received
>> replication command: CREATE_REPLICATION_SLOT
>> "tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
>> 2020-04-04 02:08:57.299 CEST [5e87d019.506c1:5] ERROR:  replication
>> slot "tap_sub_16390_sync_16384" already exists
> 
> That already seems suspicious. I checked the following (successful) run
> and I did not see that in the stage's logs.
> 
> Looking at the failing log, it fails because for some reason there are
> multiple rounds of copying the relation (once due to a refresh, once due
> to an intentional replication failure). Each creates its own temporary slot.
> 
> first time:
> 2020-04-04 02:08:57.276 CEST [5e87d019.506bd:1] LOG:  connection received: host=[local]
> 2020-04-04 02:08:57.278 CEST [5e87d019.506bd:4] LOG:  received replication command: CREATE_REPLICATION_SLOT "tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
> 2020-04-04 02:08:57.282 CEST [5e87d019.506bd:9] LOG:  statement: COPY public.tab_rep TO STDOUT
> 2020-04-04 02:08:57.284 CEST [5e87d019.506bd:10] LOG:  disconnection: session time: 0:00:00.007 user=bf database=postgres host=[local]
> 
> second time:
> 2020-04-04 02:08:57.288 CEST [5e87d019.506bf:1] LOG:  connection received: host=[local]
> 2020-04-04 02:08:57.289 CEST [5e87d019.506bf:4] LOG:  received replication command: CREATE_REPLICATION_SLOT "tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
> 2020-04-04 02:08:57.293 CEST [5e87d019.506bf:9] LOG:  statement: COPY public.tab_rep TO STDOUT
> 
> third time:
> 2020-04-04 02:08:57.297 CEST [5e87d019.506c1:1] LOG:  connection received: host=[local]
> 2020-04-04 02:08:57.299 CEST [5e87d019.506c1:4] LOG:  received replication command: CREATE_REPLICATION_SLOT "tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
> 2020-04-04 02:08:57.299 CEST [5e87d019.506c1:5] ERROR:  replication slot "tap_sub_16390_sync_16384" already exists
> 
> Note that the connection from the second attempt has not yet
> disconnected. Hence the error about the replication slot already
> existing - it's a temporary replication slot that'd otherwise already
> have been dropped.
> 
> 
> Seems the logical rep code needs to do something about this race?
> 

The downstream:

> 2020-04-04 02:08:57.275 CEST [5e87d019.506bc:1] LOG:  logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started
> 2020-04-04 02:08:57.282 CEST [5e87d019.506bc:2] ERROR:  duplicate key value violates unique constraint "tab_rep_pkey"
> 2020-04-04 02:08:57.282 CEST [5e87d019.506bc:3] DETAIL:  Key (a)=(1) already exists.
> 2020-04-04 02:08:57.282 CEST [5e87d019.506bc:4] CONTEXT:  COPY tab_rep, line 1
> 2020-04-04 02:08:57.283 CEST [5e87d018.50689:5] LOG:  background worker "logical replication worker" (PID 329404) exited with exit code 1
> 2020-04-04 02:08:57.287 CEST [5e87d019.506be:1] LOG:  logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started
> 2020-04-04 02:08:57.293 CEST [5e87d019.506be:2] ERROR:  duplicate key value violates unique constraint "tab_rep_pkey"
> 2020-04-04 02:08:57.293 CEST [5e87d019.506be:3] DETAIL:  Key (a)=(1) already exists.
> 2020-04-04 02:08:57.293 CEST [5e87d019.506be:4] CONTEXT:  COPY tab_rep, line 1
> 2020-04-04 02:08:57.295 CEST [5e87d018.50689:6] LOG:  background worker "logical replication worker" (PID 329406) exited with exit code 1
> 2020-04-04 02:08:57.297 CEST [5e87d019.506c0:1] LOG:  logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started
> 2020-04-04 02:08:57.299 CEST [5e87d019.506c0:2] ERROR:  could not create replication slot "tap_sub_16390_sync_16384": ERROR: replication slot "tap_sub_16390_sync_16384" already exists
> 2020-04-04 02:08:57.300 CEST [5e87d018.50689:7] LOG:  background worker "logical replication worker" (PID 329408) exited with exit code

Looks like we are simply retrying so fast that the upstream will not have 
finished cleanup after the second try by the time we run the third one.

The last_start_times is supposed to protect against that so I guess 
there is some issue with how that works.

-- 
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/



Re: backup manifests and contemporaneous buildfarm failures

From
Robert Haas
Date:
On Fri, Apr 3, 2020 at 11:06 PM Andres Freund <andres@anarazel.de> wrote:
> On 2020-04-03 20:48:09 -0400, Robert Haas wrote:
> > 'serinus' is also failing. This is less obviously related:
>
> Hm. Tests passed once since then.

Yeah, but conchuela also failed once in what I think was a similar
way. I suspect the fix I pushed last night
(3e0d80fd8d3dd4f999e0d3aa3e591f480d8ad1fd) may have been enough to
clear this up.

> That already seems suspicious. I checked the following (successful) run
> and I did not see that in the stage's logs.

Yeah, the behavior of the test case doesn't seem to be entirely deterministic.

> I, again, have to say that the amount of stuff that was done as part of
>
> commit 7c4f52409a8c7d85ed169bbbc1f6092274d03920
> Author: Peter Eisentraut <peter_e@gmx.net>
> Date:   2017-03-23 08:36:36 -0400
>
>     Logical replication support for initial data copy
>
> is insane. Adding support for running sql over replication connections
> and extending CREATE_REPLICATION_SLOT with new options (without even
> mentioning that in the commit message!) as part of a commit described as
> "Logical replication support for initial data copy" shouldn't happen.

I agreed then and still do.

> So I'm a bit confused here. The best approach is probably to try to
> reproduce this by adding an artificial delay into backend shutdown.

I was able to reproduce an assertion failure by starting a
transaction, running a replication command that failed, and then
exiting the backend. 3e0d80fd8d3dd4f999e0d3aa3e591f480d8ad1fd made
that go away. I had wrongly assumed that there was no other way for a
walsender to have a ResourceOwner, and in the face of SQL commands
also being executed by walsenders, that's clearly not true. I'm not
sure *precisely* how that led to the BF failures, but it was really
clear that it was wrong.

> > (I still really dislike the fact that we have this evil hack allowing
> > one connection to mix and match those sets of commands...)
>
> FWIW, I think the opposite. We should get rid of the difference as much
> as possible.

Well, that's another approach. It's OK to have one system and it's OK
to have two systems, but one and a half is not ideal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Robert Haas
Date:
On Fri, Apr 3, 2020 at 8:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> BTW, some of the buildfarm is showing a simpler portability problem:
> they think you were too cavalier about the difference between time_t
> and pg_time_t.  (On a platform with 32-bit time_t, that's an actual
> bug, probably.)  lapwing is actually failing:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing&dt=2020-04-03%2021%3A41%3A49
>
> ccache gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2 -Werror -I. -I. -I../../../src/include  -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/et  -c -o basebackup.o basebackup.c
> basebackup.c: In function 'AddFileToManifest':
> basebackup.c:1199:10: error: passing argument 1 of 'pg_gmtime' from incompatible pointer type [-Werror]
> In file included from ../../../src/include/access/xlog_internal.h:26:0,
>                  from basebackup.c:20:
> ../../../src/include/pgtime.h:49:22: note: expected 'const pg_time_t *' but argument is of type 'time_t *'
> cc1: all warnings being treated as errors
> make[3]: *** [basebackup.o] Error 1
>
> but some others are showing it as a warning.
>
> I suppose that judicious s/time_t/pg_time_t/ would fix this.

I think you sent this email just after I pushed
db1531cae00941bfe4f6321fdef1e1ef355b6bed, or maybe after I'd committed
it locally and just before I pushed it. If you prefer a different fix
than what I did there, I can certainly whack it around some more.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests and contemporaneous buildfarm failures

From
Robert Haas
Date:
On Fri, Apr 3, 2020 at 10:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I think I've done about as much as I can do for tonight, though. Most
> things are green now, and the ones that aren't are failing because of
> stuff that is at least plausibly fixed. By morning it should be
> clearer how much broken stuff is left, although that will be somewhat
> complicated by at least sidewinder and seawasp needing manual
> intervention to get back on track.

Taking stock of the situation this morning, most of the buildfarm is
now green. There are three failures, on eelpout (6 hours ago),
fairywren (17 hours ago), and hyrax (3 days, 7 hours ago).

eelpout is unhappy because:

+WARNING:  could not remove shared memory segment
"/PostgreSQL.248989127": No such file or directory
+WARNING:  could not remove shared memory segment
"/PostgreSQL.1450751626": No such file or directory
  multibatch
 ------------
  f
@@ -861,22 +863,15 @@

 select length(max(s.t))
 from wide left join (select id, coalesce(t, '') || '' as t from wide)
s using (id);
- length
---------
- 320000
-(1 row)
-
+ERROR:  could not open shared memory segment "/PostgreSQL.605707657":
No such file or directory
+CONTEXT:  parallel worker

I'm not sure what caused that exactly, but it sorta looks like
operator intervention. Thomas, any ideas?

fairywren's last run was on 21dc488, and commit
460314db08e8688e1a54a0a26657941e058e45c5 was an attempt to fix what
broke there. I guess we'll find out whether that worked the next time
it runs.

hyrax's last run was before any of this happened, so it seems to have
an unrelated problem. The last two runs, three and six days ago, both
failed like this:

-ERROR:  stack depth limit exceeded
+ERROR:  stack depth limit exceeded at character 8

Not sure what that's about.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Apr 3, 2020 at 8:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I suppose that judicious s/time_t/pg_time_t/ would fix this.

> I think you sent this email just after I pushed
> db1531cae00941bfe4f6321fdef1e1ef355b6bed, or maybe after I'd committed
> it locally and just before I pushed it. If you prefer a different fix
> than what I did there, I can certainly whack it around some more.

Yeah, that commit showed up moments after I sent this.  Your fix
seems fine -- at least prairiedog and gaur are OK with it.
(I did verify that gaur was reproducibly crashing at that new
pg_strftime call, so we know it was that and not some on-again-
off-again issue.)

            regards, tom lane



Re: backup manifests and contemporaneous buildfarm failures

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> hyrax's last run was before any of this happened, so it seems to have
> an unrelated problem. The last two runs, three and six days ago, both
> failed like this:

> -ERROR:  stack depth limit exceeded
> +ERROR:  stack depth limit exceeded at character 8

> Not sure what that's about.

What it looks like is that hyrax is managing to detect stack overflow
at a point where an errcontext callback is active that adds an error
cursor to the failure.

It's not so surprising that we could get a different result that way
from a CLOBBER_CACHE_ALWAYS animal like hyrax, since CCA-forced
cache reloads would cause extra stack expenditure at a lot of places.
And it could vary depending on totally random details, like the number
of local variables in seemingly unrelated code.  What is odd is that
(AFAIR) we've never seen this before.  Maybe somebody recently added
an error cursor callback in a place that didn't have it before, and
is involved in SQL-function processing?  None of the commits leading
up to the earlier failure look promising for that, though.

            regards, tom lane



Re: backup manifests and contemporaneous buildfarm failures

From
Robert Haas
Date:
On Sat, Apr 4, 2020 at 10:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It's not so surprising that we could get a different result that way
> from a CLOBBER_CACHE_ALWAYS animal like hyrax, since CCA-forced
> cache reloads would cause extra stack expenditure at a lot of places.
> And it could vary depending on totally random details, like the number
> of local variables in seemingly unrelated code.

Oh, yeah. That's unfortunate.

> What is odd is that
> (AFAIR) we've never seen this before.  Maybe somebody recently added
> an error cursor callback in a place that didn't have it before, and
> is involved in SQL-function processing?  None of the commits leading
> up to the earlier failure look promising for that, though.

The relevant range of commits (e8b1774fc2 to a7b9d24e4e) includes an
ereport change (bda6dedbea) and a couple of "simple expression"
changes (8f59f6b9c0, fbc7a71608) but I don't know exactly why they
would have caused this. It seems at least possible, though, that
changing the return type of functions involved in error reporting
would slightly change the amount of stack space used; and the others
are related to SQL-function processing. Other than experimenting on
that machine, I'm not sure how we could really determine the relevant
factors here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests and contemporaneous buildfarm failures

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Apr 4, 2020 at 10:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What is odd is that
>> (AFAIR) we've never seen this before.  Maybe somebody recently added
>> an error cursor callback in a place that didn't have it before, and
>> is involved in SQL-function processing?  None of the commits leading
>> up to the earlier failure look promising for that, though.

> The relevant range of commits (e8b1774fc2 to a7b9d24e4e) includes an
> ereport change (bda6dedbea) and a couple of "simple expression"
> changes (8f59f6b9c0, fbc7a71608) but I don't know exactly why they
> would have caused this.

When I first noticed hyrax's failure, some days ago, I immediately
thought of the "simple expression" patch.  But that should not have
affected SQL-function processing in any way: the bulk of the changes
were in plpgsql, and even the changes in plancache could not be
relevant, because functions.c does not use the plancache.

As for ereport, you'd think that that would only matter once you were
already doing an ereport.  The point at which the stack overflow
check triggers should be in normal code, not error recovery.

> It seems at least possible, though, that
> changing the return type of functions involved in error reporting
> would slightly change the amount of stack space used;

Right, but if it's down to that sort of phase-of-the-moon codegen
difference, you'd think this failure would have been coming and
going for years.  I still suppose that some fairly recent change
must be contributing to this, but haven't had time to investigate.

> Other than experimenting on
> that machine, I'm not sure how we could really determine the relevant
> factors here.

We don't have a lot of CCA buildfarm machines, so I'm suspecting that
it's probably not that hard to repro if you build with CCA.

            regards, tom lane



Re: backup manifests and contemporaneous buildfarm failures

From
Thomas Munro
Date:
On Sun, Apr 5, 2020 at 2:36 AM Robert Haas <robertmhaas@gmail.com> wrote:
> eelpout is unhappy because:
>
> +WARNING:  could not remove shared memory segment
> "/PostgreSQL.248989127": No such file or directory
> +WARNING:  could not remove shared memory segment
> "/PostgreSQL.1450751626": No such file or directory

Seems to have fixed itself while I was sleeping. I did happen to run
apt-get upgrade on that box some time yesterday-ish, but I don't
understand what mechanism would trash my /dev/shm in that process.
/me eyes systemd with suspicion



Re: backup manifests and contemporaneous buildfarm failures

From
Mikael Kjellström
Date:
On 2020-04-04 04:43, Robert Haas wrote:

> I think I've done about as much as I can do for tonight, though. Most
> things are green now, and the ones that aren't are failing because of
> stuff that is at least plausibly fixed. By morning it should be
> clearer how much broken stuff is left, although that will be somewhat
> complicated by at least sidewinder and seawasp needing manual
> intervention to get back on track.

I fixed sidewinder I think.  Should clear up the next time it runs.

It was the mode on the directory it couldn't handle.  A regular rm -rf 
didn't work; I had to do a chmod -R 700 on all directories to be able to 
manually remove it.

/Mikael



Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-04-03 15:22:23 -0400, Robert Haas wrote:
> I've pushed all the patches.

Seeing new warnings in an optimized build

/home/andres/src/postgresql-master/src/bin/pg_validatebackup/parse_manifest.c: In function 'json_manifest_object_end':
/home/andres/src/postgresql-master/src/bin/pg_validatebackup/parse_manifest.c:591:2: warning: 'end_lsn' may be used uninitialized in this function [-Wmaybe-uninitialized]
  591 |  context->perwalrange_cb(context, tli, start_lsn, end_lsn);
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/andres/src/postgresql-master/src/bin/pg_validatebackup/parse_manifest.c:567:5: note: 'end_lsn' was declared here
  567 |     end_lsn;
      |     ^~~~~~~
/home/andres/src/postgresql-master/src/bin/pg_validatebackup/parse_manifest.c:591:2: warning: 'start_lsn' may be used uninitialized in this function [-Wmaybe-uninitialized]
  591 |  context->perwalrange_cb(context, tli, start_lsn, end_lsn);
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/andres/src/postgresql-master/src/bin/pg_validatebackup/parse_manifest.c:566:13: note: 'start_lsn' was declared here
  566 |  XLogRecPtr start_lsn,
      |             ^~~~~~~~~

The warnings don't seem too unreasonable. The compiler can't see that
the error_cb inside json_manifest_parse_failure() is not expected to
return. Probably worth adding a wrapper around the calls to
context->error_cb and marking that as noreturn.
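
Roughly something like the following, purely as a sketch inside
parse_manifest.c -- the helper name, the context type, and the callback
signature here are assumptions, not the actual code; the point is just that
every failure path goes through one function the compiler knows cannot return:

static void report_manifest_error(JsonManifestParseContext *context,
                                  const char *msg) pg_attribute_noreturn();

static void
report_manifest_error(JsonManifestParseContext *context, const char *msg)
{
    /* Hand the message to the caller-supplied error callback. */
    context->error_cb(context, msg);

    /* The callback is required not to return; make that explicit. */
    pg_unreachable();
}

With the attribute visible, gcc can see that start_lsn/end_lsn must have been
assigned on any path that actually reaches the perwalrange_cb call at line 591.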

- Andres



Re: backup manifests and contemporaneous buildfarm failures

From
Andrew Dunstan
Date:
On 4/5/20 9:10 AM, Mikael Kjellström wrote:
> On 2020-04-04 04:43, Robert Haas wrote:
>
>> I think I've done about as much as I can do for tonight, though. Most
>> things are green now, and the ones that aren't are failing because of
>> stuff that is at least plausibly fixed. By morning it should be
>> clearer how much broken stuff is left, although that will be somewhat
>> complicated by at least sidewinder and seawasp needing manual
>> intervention to get back on track.
>
> I fixed sidewinder I think.  Should clear up the next time it runs.
>
> It was the mode on the directory it couldn't handle.  A regular rm -rf
> didn't work; I had to do a chmod -R 700 on all directories to be able
> to manually remove it.
>
>


Hmm, the buildfarm client does this at the beginning of each run to
remove anything that might be left over from a previous run:


    rmtree("inst");
    rmtree("$pgsql") unless ($from_source && !$use_vpath);


Do I need to precede those with some recursive chmod commands? Perhaps
the client should refuse to run if there is still something left after
these.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: backup manifests and contemporaneous buildfarm failures

From
Tom Lane
Date:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> Hmm, the buildfarm client does this at the beginning of each run to
> remove anything that might be left over from a previous run:

>     rmtree("inst");
>     rmtree("$pgsql") unless ($from_source && !$use_vpath);

Right, the point is precisely that some versions of rmtree() fail
to remove a mode-0 subdirectory.

> Do I need to precede those with some recursive chmod commands? Perhaps
> the client should refuse to run if there is still something left after
> these.

I think the latter would be a very good idea, just so that this sort of
failure is less obscure.  Not sure about whether a recursive chmod is
really going to be worth the cycles.  (On the other hand, the normal
case should be that there's nothing there anyway, so maybe it's not
going to be costly.)  

            regards, tom lane



Re: backup manifests and contemporaneous buildfarm failures

From
Fabien COELHO
Date:
Hello,

>> Do I need to precede those with some recursive chmod commands? Perhaps
>> the client should refuse to run if there is still something left after
>> these.
>
> I think the latter would be a very good idea, just so that this sort of
> failure is less obscure.  Not sure about whether a recursive chmod is
> really going to be worth the cycles.  (On the other hand, the normal
> case should be that there's nothing there anyway, so maybe it's not
> going to be costly.)

Could it be a two-stage process to minimize cost but still be resilient?

   rmtree
   if (-d $DIR) {
     emit warning
     chmodtree
     rmtree again
     if (-d $DIR)
       emit error
   }

-- 
Fabien.



Re: backup manifests and contemporaneous buildfarm failures

From
Robert Haas
Date:
On Sun, Apr 5, 2020 at 4:07 PM Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> Do I need to precede those with some recursive chmod commands?

+1.

> Perhaps
> the client should refuse to run if there is still something left after
> these.

+1 to that, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests and contemporaneous buildfarm failures

From
Andrew Dunstan
Date:
On 4/6/20 7:53 AM, Robert Haas wrote:
> On Sun, Apr 5, 2020 at 4:07 PM Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
>> Do I need to precede those with some recursive chmod commands?
> +1.
>
>> Perhaps
>> the client should refuse to run if there is still something left after
>> these.
> +1 to that, too.
>


See
https://github.com/PGBuildFarm/client-code/commit/0ef76bb1e2629713898631b9a3380d02d41c60ad


This will be in the next release, probably fairly soon.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: backup manifests and contemporaneous buildfarm failures

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Taking stock of the situation this morning, most of the buildfarm is
> now green. There are three failures, on eelpout (6 hours ago),
> fairywren (17 hours ago), and hyrax (3 days, 7 hours ago).

fairywren has now done this twice in the pg_validatebackupCheck step:

exec failed: Bad address at /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/perl/TestLib.pm line 340.
 at /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/perl/TestLib.pm line 340.

I'm a tad suspicious that it needs another perl2host()
somewhere, but the log isn't very clear as to where.

More generally, I wonder if we ought to be trying to
centralize those perl2host() calls instead of sticking
them into individual test cases.

            regards, tom lane



Re: backup manifests and contemporaneous buildfarm failures

From
Andrew Dunstan
Date:
On Mon, Apr 6, 2020 at 1:18 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>
> Hello,
>
> >> Do I need to precede those with some recursive chmod commands? Perhaps
> >> the client should refuse to run if there is still something left after
> >> these.
> >
> > I think the latter would be a very good idea, just so that this sort of
> > failure is less obscure.  Not sure about whether a recursive chmod is
> > really going to be worth the cycles.  (On the other hand, the normal
> > case should be that there's nothing there anyway, so maybe it's not
> > going to be costly.)
>
> Could it be a two-stage process to minimize cost but still be resilient?
>
>    rmtree
>    if (-d $DIR) {
>      emit warning
>      chmodtree
>      rmtree again
>      if (-d $DIR)
>        emit error
>    }
>


I thought about doing that. However, it's not really necessary. In the
normal course of events these directories should have been removed at
the end of the previous run, so we're only dealing with exceptional
cases here.

cheers

andrew



-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: backup manifests and contemporaneous buildfarm failures

From
Andrew Dunstan
Date:
On Tue, Apr 7, 2020 at 12:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > Taking stock of the situation this morning, most of the buildfarm is
> > now green. There are three failures, on eelpout (6 hours ago),
> > fairywren (17 hours ago), and hyrax (3 days, 7 hours ago).
>
> fairywren has now done this twice in the pg_validatebackupCheck step:
>
> exec failed: Bad address at /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/perl/TestLib.pm line 340.
>  at /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/perl/TestLib.pm line 340.
>
> I'm a tad suspicious that it needs another perl2host()
> somewhere, but the log isn't very clear as to where.
>
> More generally, I wonder if we ought to be trying to
> centralize those perl2host() calls instead of sticking
> them into individual test cases.
>
>


Not sure about that. I'll see if I can run it by hand and get some
more info. What's quite odd is that jacana (a very similar setup) is
passing this happily.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: backup manifests

From
Fujii Masao
Date:

On 2020/04/04 4:22, Robert Haas wrote:
> On Thu, Apr 2, 2020 at 4:34 PM David Steele <david@pgmasters.net> wrote:
>> +1. These would be great tests to have and a win for pg_basebackup
>> overall but I don't think they should be a prerequisite for this commit.
> 
> Not to mention the server. I can't say that I have a lot of confidence
> that all of the server behavior in this area is well-understood and
> sane.
> 
> I've pushed all the patches.

When there is a backup_manifest in the database cluster, it's included in
the backup even when --no-manifest is specified. ISTM that this is problematic
because the backup_manifest is obviously not valid for the backup.
So, isn't it better to always exclude the *existing* backup_manifest in the
cluster from the backup, like backup_label/tablespace_map? Patch attached.

Also I found the typo in the document. Patch attached.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment

Re: backup manifests

From
Robert Haas
Date:
On Wed, Apr 8, 2020 at 1:15 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> When there is a backup_manifest in the database cluster, it's included in
> the backup even when --no-manifest is specified. ISTM that this is problematic
> because the backup_manifest is obviously not valid for the backup.
> So, isn't it better to always exclude the *existing* backup_manifest in the
> cluster from the backup, like backup_label/tablespace_map? Patch attached.
>
> Also I found the typo in the document. Patch attached.

Both patches look good. The second one is definitely a mistake on my
part, and the first one seems like a totally reasonable change.
Thanks!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests and contemporaneous buildfarm failures

From
Andrew Dunstan
Date:
On 4/7/20 9:42 AM, Andrew Dunstan wrote:
> On Tue, Apr 7, 2020 at 12:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> Taking stock of the situation this morning, most of the buildfarm is
>>> now green. There are three failures, on eelpout (6 hours ago),
>>> fairywren (17 hours ago), and hyrax (3 days, 7 hours ago).
>> fairywren has now done this twice in the pg_validatebackupCheck step:
>>
>> exec failed: Bad address at /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/perl/TestLib.pm line 340.
>>  at /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/perl/TestLib.pm line 340.
>>
>> I'm a tad suspicious that it needs another perl2host()
>> somewhere, but the log isn't very clear as to where.
>>
>> More generally, I wonder if we ought to be trying to
>> centralize those perl2host() calls instead of sticking
>> them into individual test cases.
>>
>>
>
> Not sure about that. I'll see if I can run it by hand and get some
> more info. What's quite odd is that jacana (a very similar setup) is
> passing this happily.
>


OK, tricky, but here's what I did to get this working on fairywren.


First, on Msys2 there is a problem with name mangling. We've had to fix
this before by telling it to ignore certain argument prefixes.


Second, once that was fixed rmdir was failing on the tablespace. On
Windows this is a junction, so unlink is the correct thing to do, I
believe, just as it is on Unix where it's a symlink.


cheers


andrew



-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Attachment

Re: backup manifests and contemporaneous buildfarm failures

From
Tom Lane
Date:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> OK, tricky, but here's what I did to get this working on fairywren.
> First, on Msys2 there is a problem with name mangling. We've had to fix
> this before by telling it to ignore certain argument prefixes.
> Second, once that was fixed rmdir was failing on the tablespace. On
> Windows this is a junction, so unlink is the correct thing to do, I
> believe, just as it is on Unix where it's a symlink.

Hmm, no opinion about the name mangling business, but the other part
seems like it might break jacana and/or bowerbird, which are currently
happy with this test?  (AFAICS we only have four Windows animals
running the TAP tests, and the fourth (drongo) hasn't reported in
for awhile.)

I guess we could commit it and find out.  I'm all for the simpler
coding if it works.

            regards, tom lane



Re: backup manifests and contemporaneous buildfarm failures

From
Robert Haas
Date:
On Wed, Apr 8, 2020 at 1:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I guess we could commit it and find out.  I'm all for the simpler
> coding if it works.

I don't understand what the local $ENV{MSYS2_ARG_CONV_EXCL} =
$source_ts_prefix does, but the remove/unlink condition was suggested
by Amit Kapila on the basis of testing on his Windows development
environment, so I suspect that's actually needed on at least some
systems. I just work here, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests and contemporaneous buildfarm failures

From
Andrew Dunstan
Date:
On 4/8/20 3:41 PM, Robert Haas wrote:
> On Wed, Apr 8, 2020 at 1:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I guess we could commit it and find out.  I'm all for the simpler
>> coding if it works.
> I don't understand what the local $ENV{MSYS2_ARG_CONV_EXCL} =
> $source_ts_prefix does, 


You don't want to know ....


See <https://www.msys2.org/wiki/Porting/#filesystem-namespaces> for the
gory details.


It's the tablespace map parameter that is upsetting it.



> but the remove/unlink condition was suggested
> by Amit Kapila on the basis of testing on his Windows development
> environment, so I suspect that's actually needed on at least some
> systems. I just work here, though.
>

Yeah, drongo doesn't like it, so we'll have to tweak the logic.


I'll update after some more testing.


cheers


andrew



-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: backup manifests and contemporaneous buildfarm failures

From
Tom Lane
Date:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> On 4/8/20 3:41 PM, Robert Haas wrote:
>> I don't understand what the local $ENV{MSYS2_ARG_CONV_EXCL} =
>> $source_ts_prefix does, 

> You don't want to know ....
> See <https://www.msys2.org/wiki/Porting/#filesystem-namespaces> for the
> gory details.

I don't want to know either, but maybe that reference should be cited
somewhere near where we use this sort of hack.

            regards, tom lane



Re: backup manifests

From
Fujii Masao
Date:

On 2020/04/09 2:35, Robert Haas wrote:
> On Wed, Apr 8, 2020 at 1:15 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> When there is a backup_manifest in the database cluster, it's included in
>> the backup even when --no-manifest is specified. ISTM that this is problematic
>> because the backup_manifest is obviously not valid for the backup.
>> So, isn't it better to always exclude the *existing* backup_manifest in the
>> cluster from the backup, like backup_label/tablespace_map? Patch attached.
>>
>> Also I found the typo in the document. Patch attached.
> 
> Both patches look good. The second one is definitely a mistake on my
> part, and the first one seems like a totally reasonable change.
> Thanks!

Thanks for reviewing them! I pushed them.

Please note that the commit messages have not been delivered to
pgsql-committers yet.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: backup manifests

From
Stephen Frost
Date:
Greetings,

* Fujii Masao (masao.fujii@oss.nttdata.com) wrote:
> On 2020/04/09 2:35, Robert Haas wrote:
> >On Wed, Apr 8, 2020 at 1:15 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >>When there is a backup_manifest in the database cluster, it's included in
> >>the backup even when --no-manifest is specified. ISTM that this is problematic
> >>because the backup_manifest is obviously not valid for the backup.
> >>So, isn't it better to always exclude the *existing* backup_manifest in the
> >>cluster from the backup, like backup_label/tablespace_map? Patch attached.
> >>
> >>Also I found the typo in the document. Patch attached.
> >
> >Both patches look good. The second one is definitely a mistake on my
> >part, and the first one seems like a totally reasonable change.
> >Thanks!
>
> Thanks for reviewing them! I pushed them.
>
> Please note that the commit messages have not been delivered to
> pgsql-committers yet.

They've been released and your address whitelisted.

Thanks,

Stephen

Attachment

Re: backup manifests

From
Fujii Masao
Date:

On 2020/04/09 23:10, Stephen Frost wrote:
> Greetings,
> 
> * Fujii Masao (masao.fujii@oss.nttdata.com) wrote:
>> On 2020/04/09 2:35, Robert Haas wrote:
>>> On Wed, Apr 8, 2020 at 1:15 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>> When there is a backup_manifest in the database cluster, it's included in
>>>> the backup even when --no-manifest is specified. ISTM that this is problematic
>>>> because the backup_manifest is obviously not valid for the backup.
>>>> So, isn't it better to always exclude the *existing* backup_manifest in the
>>>> cluster from the backup, like backup_label/tablespace_map? Patch attached.
>>>>
>>>> Also I found the typo in the document. Patch attached.
>>>
>>> Both patches look good. The second one is definitely a mistake on my
>>> part, and the first one seems like a totally reasonable change.
>>> Thanks!
>>
>> Thanks for reviewing them! I pushed them.
>>
>> Please note that the commit messages have not been delivered to
>> pgsql-committers yet.
> 
> They've been released and your address whitelisted.

Many thanks!!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: backup manifests

From
Fujii Masao
Date:

On 2020/04/09 23:06, Fujii Masao wrote:
> 
> 
> On 2020/04/09 2:35, Robert Haas wrote:
>> On Wed, Apr 8, 2020 at 1:15 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>> When there is a backup_manifest in the database cluster, it's included in
>>> the backup even when --no-manifest is specified. ISTM that this is problematic
>>> because the backup_manifest is obviously not valid for the backup.
>>> So, isn't it better to always exclude the *existing* backup_manifest in the
>>> cluster from the backup, like backup_label/tablespace_map? Patch attached.
>>>
>>> Also I found the typo in the document. Patch attached.
>>
>> Both patches look good. The second one is definitely a mistake on my
>> part, and the first one seems like a totally reasonable change.
>> Thanks!
> 
> Thanks for reviewing them! I pushed them.

I found other minor issues.

+          When this option is specified with a value of <literal>yes</literal>
+          or <literal>force-escape</literal>, a backup manifest is created

force-escape should be force-encode.
Patch attached.

-    while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+    while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvPm:",

"m:" seems unnecessary, so should be removed?
Patch attached.

+    if (strcmp(basedir, "-") == 0)
+    {
+        char        header[512];
+        PQExpBufferData    buf;
+
+        initPQExpBuffer(&buf);
+        ReceiveBackupManifestInMemory(conn, &buf);

backup_manifest should be received only when the manifest is enabled,
so ISTM that the flag "manifest" should be checked in the above if-condition.
Thought? Patch attached.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment

Re: backup manifests

From
Michael Paquier
Date:
On Mon, Apr 13, 2020 at 11:09:34AM +0900, Fujii Masao wrote:
> -    while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
> +    while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvPm:",
>
> "m:" seems unnecessary, so should be removed?
> Patch attached.

Smells like some remnant diff from a previous version.

> +    if (strcmp(basedir, "-") == 0)
> +    {
> +        char        header[512];
> +        PQExpBufferData    buf;
> +
> +        initPQExpBuffer(&buf);
> +        ReceiveBackupManifestInMemory(conn, &buf);
>
> backup_manifest should be received only when the manifest is enabled,
> so ISTM that the flag "manifest" should be checked in the above if-condition.
> Thought? Patch attached.
>
> -    if (strcmp(basedir, "-") == 0)
> +    if (strcmp(basedir, "-") == 0 && manifest)
>      {
>          char        header[512];
>          PQExpBufferData    buf;

Indeed.  Using the tar format with --no-manifest causes a failure:
pg_basebackup -D - --format=t --wal-method=none \
    --no-manifest > /dev/null

The doc changes look right to me.  Nice catches.
--
Michael

Attachment

Re: backup manifests

From
Robert Haas
Date:
On Sun, Apr 12, 2020 at 10:09 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> I found other minor issues.

I think these are all correct fixes. Thanks for the post-commit
review, and sorry for these mistakes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



documenting the backup manifest file format

From
Robert Haas
Date:
On Fri, Mar 27, 2020 at 4:32 PM Andres Freund <andres@anarazel.de> wrote:
> I don't like having a file format that's intended to be used by external
> tools too that's undocumented except for code that assembles it in a
> piecemeal fashion.  Do you mean in a follow-on patch this release, or
> later? I don't have a problem with the former.

Here is a patch for that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: documenting the backup manifest file format

From
Justin Pryzby
Date:
On Mon, Apr 13, 2020 at 01:40:56PM -0400, Robert Haas wrote:
> On Fri, Mar 27, 2020 at 4:32 PM Andres Freund <andres@anarazel.de> wrote:
> > I don't like having a file format that's intended to be used by external
> > tools too that's undocumented except for code that assembles it in a
> > piecemeal fashion.  Do you mean in a follow-on patch this release, or
> > later? I don't have a problem with the former.
> 
> Here is a patch for that.

typos:
manifes
hexademical (twice)

-- 
Justin



Re: documenting the backup manifest file format

From
Robert Haas
Date:
On Mon, Apr 13, 2020 at 1:55 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> typos:
> manifes
> hexademical (twice)

Thanks. v2 attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: documenting the backup manifest file format

From
Erik Rijkers
Date:
On 2020-04-13 20:08, Robert Haas wrote:
> [v2-0001-Document-the-backup-manifest-file-format.patch]

Can you double check this sentence?  Seems strange to me but I don't 
know why; it may well be that my english is not good enough.  Maybe a 
comma after 'required' makes reading easier?

    The timeline from which this range of WAL records will be required in
    order to make use of this backup. The value is an integer.


One typo:

'when making using'  should be
'when making use'



Erik Rijkers




Re: documenting the backup manifest file format

From
Alvaro Herrera
Date:
+      The LSN at which replay must begin on the indicated timeline in order to
+      make use of this backup.  The LSN is stored in the format normally used
+      by <productname>PostgreSQL</productname>; that is, it is a string
+      consisting of two strings of hexademical characters, each with a length
+      of between 1 and 8, separated by a slash.

typo "hexademical"

Are these hex figures upper or lower case?  No leading zeroes?  This
would normally not matter, but the toplevel checksum will care.  Also, I
see no mention of prettification-chars such as newlines or indentation.
I suppose if I pass a manifest file through prettification (or Windows
newline conversion), the checksum may break.

As for Last-Modification, I think the spec should indicate the exact
format that's used, because it'll also be critical for checksumming.

Why is the top-level checksum only allowed to be SHA-256, if the files
can use up to SHA-512?  (Also, did we intentionally omit the dash in
hash names, so "SHA-256" to make it SHA256?  This will also be critical
for checksumming the manifest itself.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: documenting the backup manifest file format

From
Robert Haas
Date:
On Mon, Apr 13, 2020 at 2:28 PM Erik Rijkers <er@xs4all.nl> wrote:
> Can you double check this sentence?  Seems strange to me but I don't
> know why; it may well be that my english is not good enough.  Maybe a
> comma after 'required' makes reading easier?
>
>     The timeline from which this range of WAL records will be required in
>     order to make use of this backup. The value is an integer.

It sounds a little awkward to me, but not outright wrong. I'm not
exactly sure how to rephrase it, though. Maybe just shorten it to "the
timeline for this range of WAL records"?

> One typo:
>
> 'when making using'  should be
> 'when making use'

Right, thanks, fixed in my local copy.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: documenting the backup manifest file format

From
Andrew Dunstan
Date:
On 4/13/20 1:40 PM, Robert Haas wrote:
> On Fri, Mar 27, 2020 at 4:32 PM Andres Freund <andres@anarazel.de> wrote:
>> I don't like having a file format that's intended to be used by external
>> tools too that's undocumented except for code that assembles it in a
>> piecemeal fashion.  Do you mean in a follow-on patch this release, or
>> later? I don't have a problem with the former.
> Here is a patch for that.
>


Seems ok. A tiny example, or an excerpt, might be nice.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: documenting the backup manifest file format

From
Robert Haas
Date:
On Mon, Apr 13, 2020 at 3:34 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Are these hex figures upper or lower case?  No leading zeroes?  This
> would normally not matter, but the toplevel checksum will care.

Not really. You just feed the whole file except for the last line
through shasum and you get the answer.

It so happens that the server generates lower-case, but
pg_verifybackup will accept either.

Leading zeroes are not omitted. If the checksum's not the right
length, it ain't gonna work. If SHA is used, it's the same output you
would get from running shasum -a<whatever> on the file, which is
certainly a fixed length. I assumed that this followed from the
statement that there are two characters per byte in the checksum, and
from the fact that no checksum algorithm I know about drops leading
zeroes in the output.

> Also, I
> see no mention of prettification-chars such as newlines or indentation.
> I suppose if I pass a manifest file through prettification (or Windows
> newline conversion), the checksum may break.

It would indeed break. I'm not sure what you want me to say here,
though. If you're trying to parse a manifest, you shouldn't care about
how the whitespace is arranged. If you're trying to generate one, you
can arrange it any way you like, as long as you also include it in the
checksum.

> As for Last-Modification, I think the spec should indicate the exact
> format that's used, because it'll also be critical for checksumming.

Again, I don't think it really matters for checksumming, but it's
"YYYY-MM-DD HH:MM:SS TZ" format, where TZ is always GMT.

> Why is the top-level checksum only allowed to be SHA-256, if the files
> can use up to SHA-512?

If we allowed the top-level checksum to be changed to something else,
then we'd probably want to indicate which kind of checksum is being
used at the beginning of the file, so as to enable incremental parsing
with checksum verification at the end. pg_verifybackup doesn't
currently do incremental parsing, but I'd like to add that sometime,
if I get time to hash out the details. I think the use case for
varying the checksum type of the manifest itself is much less than for
varying it for the files. The big problem with checksumming the files
is that it can be slow, because the files can be big. However, unless
you have a truckload of empty files in the database, the manifest is
going to be very small compared to the sizes of all the files, so it
seemed harmless to use a stronger checksum algorithm for the manifest
itself. Maybe someone with a ton of empty or nearly-empty relations
will complain, but they can always use --no-manifest if they want.

I agree that it's a little bit weird that you can have a stronger
checksum for the files instead of the manifest itself, but I also
wonder what the use case would be for using a stronger checksum on the
manifest. David Steele argued that strong checksums on the files could
be useful to software that wants to rifle through all the backups
you've ever taken and find another copy of that file by looking for
something with a matching checksum. CRC-32C wouldn't be strong enough
for that, because eventually you could have enough files that you
start to have collisions. The SHA algorithms output enough bits to
make that quite unlikely. But this argument only makes sense for the
files, not the manifest.

Naturally, all this is arguable, though, and a good deal of arguing
about it has been done, as you have probably noticed. I am still of
the opinion that if somebody's goal is to use this facility for its
intended purpose, which is to find out whether your backup got
corrupted, any of these algorithms are fine, and are highly likely to
tell you that you have a problem if, in fact, you do. In fact, I bet
that even a checksum algorithm considerably stupider than anything I'd
actually consider using would accomplish that goal in a high
percentage of cases. But not everybody agrees with me, to the point
where I am starting to wonder if I really understand how computers
work.

> (Also, did we intentionally omit the dash in
> hash names, so "SHA-256" to make it SHA256?  This will also be critical
> for checksumming the manifest itself.)

I debated this with myself, settled on this spelling, and nobody
complained until now. It could be changed, though. I didn't have any
particular reason for choosing it except the feeling that people would
probably prefer to type --manifest-checksum=sha256 rather than
--manifest-checksum=sha-256.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: documenting the backup manifest file format

From
Robert Haas
Date:
On Mon, Apr 13, 2020 at 4:10 PM Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> Seems ok. A tiny example, or an excerpt, might be nice.

An empty database produces a manifest about 1200 lines long, so a full
example seems like too much to include in the documentation. An
excerpt could be included, I suppose.
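
For illustration only, a heavily trimmed, hand-written sketch along those
lines might look like this -- every path, size, timestamp, and checksum below
is invented, and the key names are approximate, so check them against real
pg_basebackup output rather than trusting this sketch:

{ "PostgreSQL-Backup-Manifest-Version": 1,
"Files": [
{ "Path": "backup_label", "Size": 225, "Last-Modified": "2020-04-13 18:00:00 GMT",
"Checksum-Algorithm": "CRC32C", "Checksum": "xxxxxxxx" },
{ "Path": "global/pg_control", "Size": 8192, "Last-Modified": "2020-04-13 18:00:00 GMT",
"Checksum-Algorithm": "CRC32C", "Checksum": "xxxxxxxx" },
... many more file entries ...
],
"WAL-Ranges": [
{ "Timeline": 1, "Start-LSN": "0/2000028", "End-LSN": "0/2000100" } ],
"Manifest-Checksum": "<64 lower-case hex characters: SHA-256 of everything above this line>"}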

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: documenting the backup manifest file format

From
David Steele
Date:
On 4/13/20 4:14 PM, Robert Haas wrote:
> On Mon, Apr 13, 2020 at 3:34 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> 
>> Also, I
>> see no mention of prettification-chars such as newlines or indentation.
>> I suppose if I pass a manifest file through prettification (or Windows
>> newline conversion), the checksum may break.
> 
> It would indeed break. I'm not sure what you want me to say here,
> though. If you're trying to parse a manifest, you shouldn't care about
> how the whitespace is arranged. If you're trying to generate one, you
> can arrange it any way you like, as long as you also include it in the
> checksum.

pgBackRest ignores whitespace but this is a legacy of the way Perl 
calculated checksums, not an intentional feature. This worked well when 
the manifest was loaded as a whole, converted to JSON, and checksummed, 
but it is a major pain for the streaming code we now have in C.

I guarantee that our next manifest version will do a simple 
checksum of bytes as Robert has done in this feature.

So, I'm +1 as implemented.

>> Why is the top-level checksum only allowed to be SHA-256, if the files
>> can use up to SHA-512?

<snip>

> I agree that it's a little bit weird that you can have a stronger
> checksum for the files instead of the manifest itself, but I also
> wonder what the use case would be for using a stronger checksum on the
> manifest. David Steele argued that strong checksums on the files could
> be useful to software that wants to rifle through all the backups
> you've ever taken and find another copy of that file by looking for
> something with a matching checksum. CRC-32C wouldn't be strong enough
> for that, because eventually you could have enough files that you
> start to have collisions. The SHA algorithms output enough bits to
> make that quite unlikely. But this argument only makes sense for the
> files, not the manifest.

Agreed. I think SHA-256 is *more* than enough to protect the manifest 
against corruption. That said, since the cost of SHA-256 vs. SHA-512 in 
the context of the manifest is negligible, we could just use the stronger 
algorithm to deflect a similar question going forward.

That choice might not age well, but we could always say, well, we picked 
it because it was the strongest available at the time. Allowing a choice 
of which algorithm to use for the manifest checksum seems like it will 
just make verifying the file harder with no tangible benefit.

Maybe just a comment in the docs about why SHA-256 was used would be fine.

>> (Also, did we intentionally omit the dash in
>> hash names, so "SHA-256" to make it SHA256?  This will also be critical
>> for checksumming the manifest itself.)
> 
> I debated this with myself, settled on this spelling, and nobody
> complained until now. It could be changed, though. I didn't have any
> particular reason for choosing it except the feeling that people would
> probably prefer to type --manifest-checksum=sha256 rather than
> --manifest-checksum=sha-256.

+1 for sha256 rather than sha-256.

Regards,
-- 
-David
david@pgmasters.net



Re: documenting the backup manifest file format

From
Alvaro Herrera
Date:
On 2020-Apr-13, Robert Haas wrote:

> On Mon, Apr 13, 2020 at 3:34 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > Are these hex figures upper or lower case?  No leading zeroes?  This
> > would normally not matter, but the toplevel checksum will care.
> 
> Not really. You just feed the whole file except for the last line
> through shasum and you get the answer.
> 
> It so happens that the server generates lower-case, but
> pg_verifybackup will accept either.
> 
> Leading zeroes are not omitted. If the checksum's not the right
> length, it ain't gonna work. If SHA is used, it's the same output you
> would get from running shasum -a<whatever> on the file, which is
> certainly a fixed length. I assumed that this followed from the
> statement that there are two characters per byte in the checksum, and
> from the fact that no checksum algorithm I know about drops leading
> zeroes in the output.

Eh, apologies, I was completely unclear -- I was looking at the LSN
fields when writing the above.  So the leading zeroes and letter case
comment refers to those in the LSN values.  I agree that it doesn't
matter as long as the same tool generates the json file and writes the
checksum.

> > Also, I see no mention of prettification-chars such as newlines or
> > indentation.  I suppose if I pass a manifest file through
> > prettification (or Windows newline conversion), the checksum may
> > break.
> 
> It would indeed break. I'm not sure what you want me to say here,
> though. If you're trying to parse a manifest, you shouldn't care about
> how the whitespace is arranged. If you're trying to generate one, you
> can arrange it any way you like, as long as you also include it in the
> checksum.

Yeah, I guess I'm just saying that it feels brittle to have a file
format that's supposed to be good for data exchange and then make it
itself depend on representation details such as the order that fields
appear in, the letter case, or the format of newlines.  Maybe this isn't
really of concern, but it seemed strange.

> > As for Last-Modification, I think the spec should indicate the exact
> > format that's used, because it'll also be critical for checksumming.
> 
> Again, I don't think it really matters for checksumming, but it's
> "YYYY-MM-DD HH:MM:SS TZ" format, where TZ is always GMT.

I agree that whatever format you use will work as long as it isn't
modified.

I think strict ISO 8601 might be preferable (with the T in the middle
and ending in Z instead of " GMT").

> > Why is the top-level checksum only allowed to be SHA-256, if the
> > files can use up to SHA-512?

Thanks for the discussion.  I think you mostly want to make sure that
the manifest is sensible (not corrupt) rather than defend against
somebody maliciously giving you an attacking manifest (??).  I incline
to agree that any SHA-2 hash is going to serve that purpose and have no
further comment to make.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: documenting the backup manifest file format

From
Robert Haas
Date:
On Mon, Apr 13, 2020 at 5:43 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Yeah, I guess I'm just saying that it feels brittle to have a file
> format that's supposed to be good for data exchange and then make it
> itself depend on representation details such as the order that fields
> appear in, the letter case, or the format of newlines.  Maybe this isn't
> really of concern, but it seemed strange.

I didn't want to use JSON for this at all, but I got outvoted. When I
raised this issue, it was suggested that I deal with it in this way,
so I did. I can't really defend it too far beyond that, although I do
think that one nice thing about this is that you can verify the
checksum using shell commands if you want. Just figure out the number
of lines in the file, minus one, and do head -n$LINES backup_manifest
| shasum -a256 and boom. If there were some whitespace-skipping thing,
figuring out how to reproduce the checksum calculation would be hard.

> I think strict ISO 8601 might be preferable (with the T in the middle
> and ending in Z instead of " GMT").

Hmm, did David suggest that before? I don't recall for sure. I think
he had some suggestion, but I'm not sure if it was the same one.

> > > Why is the top-level checksum only allowed to be SHA-256, if the
> > > files can use up to SHA-512?
>
> Thanks for the discussion.  I think you mostly want to make sure that
> the manifest is sensible (not corrupt) rather than defend against
> somebody maliciously giving you an attacking manifest (??).  I incline
> to agree that any SHA-2 hash is going to serve that purpose and have no
> further comment to make.

The code has other sanity checks against the manifest failing to parse
properly, so you can't (I hope) crash it or anything even if you
falsify the checksum. But suppose that there is a gremlin running
around your system flipping occasional bits. If said gremlin flips a
bit in a "0" that appears in a file's checksum string, it could become
a "1", a "3", or a "7", all of which are still valid characters for a
hex string. When you then tried to verify the backup, verification for
that file would fail, but you'd think it was a problem with the file,
rather than a problem with the manifest. The manifest checksum
prevents that: you'll get a complaint about the manifest checksum
being wrong rather than a complaint about the file not matching the
manifest checksum. A sufficiently smart gremlin could figure out the
expected checksum for the revised manifest and flip bits to make the
actual value match the expected one, but I think we're worried about
"chaotic neutral" gremlins, not "lawful evil" ones.

That having been said, there was some discussion on the original
thread about keeping your backup on regular storage and your manifest
checksum in a concrete bunker at the bottom of the ocean; in that
scenario, it should be possible to detect tampering in either the
manifest itself or in non-WAL data files, as long as the adversary
can't break SHA-256. But I'm not sure how much we should really worry
about that. For me, the design center for this feature is a user who
untars base.tar and forgets about 43965.tar. If that person runs
pg_verifybackup, it's gonna tell them that things are broken, and
that's good enough for me. It may not be good enough for everybody,
but it's good enough for me.

I think I'm going to go ahead and push this now, maybe with a small
wording tweak as discussed upthread with Andrew. The rest of this
discussion is really about whether the patch needs any design changes
rather than about whether the documentation describes what the patch
does, so it makes sense to me to commit this first and then if
somebody wants to argue for a change they certainly can.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: documenting the backup manifest file format

From
David Steele
Date:
On 4/14/20 12:56 PM, Robert Haas wrote:
> On Mon, Apr 13, 2020 at 5:43 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> Yeah, I guess I'm just saying that it feels brittle to have a file
>> format that's supposed to be good for data exchange and then make it
>> itself depend on representation details such as the order that fields
>> appear in, the letter case, or the format of newlines.  Maybe this isn't
>> really of concern, but it seemed strange.
> 
> I didn't want to use JSON for this at all, but I got outvoted. When I
> raised this issue, it was suggested that I deal with it in this way,
> so I did. I can't really defend it too far beyond that, although I do
> think that one nice thing about this is that you can verify the
> checksum using shell commands if you want. Just figure out the number
> of lines in the file, minus one, and do head -n$LINES backup_manifest
> | shasum -a256 and boom. If there were some whitespace-skipping thing
> figuring out how to reproduce the checksum calculation would be hard.
> 
>> I think strict ISO 8601 might be preferable (with the T in the middle
>> and ending in Z instead of " GMT").
> 
> Hmm, did David suggest that before? I don't recall for sure. I think
> he had some suggestion, but I'm not sure if it was the same one.

"I'm also partial to using epoch time in the manifest because it is 
generally easier for programs to work with.  But, human-readable doesn't 
suck, either."

Also, you don't need to worry about time-zone conversion errors -- even
if the source time is UTC, these can easily happen if you are not careful.
It also saves a parsing step.

The downside is that it is not human-readable, but this is intended to be
a machine-readable format, so I don't think it's a big deal (encoded
filenames will be just as opaque). If a user really needs to know the
timestamp of some file (rare, I think), they can paste the value into a
web tool to find out.

Regards,
-- 
-David
david@pgmasters.net



Re: documenting the backup manifest file format

From
Alvaro Herrera
Date:
On 2020-Apr-14, David Steele wrote:

> On 4/14/20 12:56 PM, Robert Haas wrote:
>
> > Hmm, did David suggest that before? I don't recall for sure. I think
> > he had some suggestion, but I'm not sure if it was the same one.
> 
> "I'm also partial to using epoch time in the manifest because it is
> generally easier for programs to work with.  But, human-readable doesn't
> suck, either."

Ugh.  If you go down that road, why write human-readable contents at
all?  You may as well just use a binary format.  But that's a very
slippery slope and you won't like to be in the bottom -- I don't see
what that gains you.  It's not like it's a lot of work to parse a
timestamp in a non-internationalized well-defined human-readable format.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: documenting the backup manifest file format

From
David Steele
Date:
On 4/14/20 1:27 PM, Alvaro Herrera wrote:
> On 2020-Apr-14, David Steele wrote:
> 
>> On 4/14/20 12:56 PM, Robert Haas wrote:
>>
>>> Hmm, did David suggest that before? I don't recall for sure. I think
>>> he had some suggestion, but I'm not sure if it was the same one.
>>
>> "I'm also partial to using epoch time in the manifest because it is
>> generally easier for programs to work with.  But, human-readable doesn't
>> suck, either."
> 
> Ugh.  If you go down that road, why write human-readable contents at
> all?  You may as well just use a binary format.  But that's a very
> slippery slope and you won't like to be in the bottom -- I don't see
> what that gains you.  It's not like it's a lot of work to parse a
> timestamp in a non-internationalized well-defined human-readable format.

Well, times are a special case because they are so easy to mess up. Try 
converting ISO-8601 to epoch time using the standard C functions on a 
system where TZ != UTC. Fun times.
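
To make the trap concrete, here is a minimal sketch (not pgBackRest
code; it assumes a POSIX-ish system and uses an arbitrary example
timestamp) showing that mktime() applies the local TZ setting, while
the obvious fix, timegm(), is not actually in POSIX or ISO C:

    #define _GNU_SOURCE             /* for strptime() and timegm() on glibc */
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    int
    main(void)
    {
        struct tm   tm;

        /* Parse a manifest-style UTC timestamp (example value). */
        memset(&tm, 0, sizeof(tm));
        if (strptime("2020-04-14 12:00:00", "%Y-%m-%d %H:%M:%S", &tm) == NULL)
            return 1;

        /*
         * mktime() interprets the broken-down time according to TZ, so
         * unless TZ happens to be UTC the result is shifted by the local
         * offset (and DST handling adds further fun).
         */
        printf("mktime(): %lld\n", (long long) mktime(&tm));

        /* timegm() does the right thing, but it is a glibc/BSD extension. */
        memset(&tm, 0, sizeof(tm));
        strptime("2020-04-14 12:00:00", "%Y-%m-%d %H:%M:%S", &tm);
        printf("timegm(): %lld\n", (long long) timegm(&tm));

        return 0;
    }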

Regards,
-- 
-David
david@pgmasters.net



Re: documenting the backup manifest file format

From
Andrew Dunstan
Date:
On 4/14/20 1:33 PM, David Steele wrote:
> On 4/14/20 1:27 PM, Alvaro Herrera wrote:
>> On 2020-Apr-14, David Steele wrote:
>>
>>> On 4/14/20 12:56 PM, Robert Haas wrote:
>>>
>>>> Hmm, did David suggest that before? I don't recall for sure. I think
>>>> he had some suggestion, but I'm not sure if it was the same one.
>>>
>>> "I'm also partial to using epoch time in the manifest because it is
>>> generally easier for programs to work with.  But, human-readable
>>> doesn't
>>> suck, either."
>>
>> Ugh.  If you go down that road, why write human-readable contents at
>> all?  You may as well just use a binary format.  But that's a very
>> slippery slope and you won't like to be in the bottom -- I don't see
>> what that gains you.  It's not like it's a lot of work to parse a
>> timestamp in a non-internationalized well-defined human-readable format.
>
> Well, times are a special case because they are so easy to mess up.
> Try converting ISO-8601 to epoch time using the standard C functions
> on a system where TZ != UTC. Fun times.
>
>


Even if it's a zulu time? That would be pretty damn sad.


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: documenting the backup manifest file format

From
David Steele
Date:
On 4/14/20 3:03 PM, Andrew Dunstan wrote:
> 
> On 4/14/20 1:33 PM, David Steele wrote:
>> On 4/14/20 1:27 PM, Alvaro Herrera wrote:
>>> On 2020-Apr-14, David Steele wrote:
>>>
>>>> On 4/14/20 12:56 PM, Robert Haas wrote:
>>>>
>>>>> Hmm, did David suggest that before? I don't recall for sure. I think
>>>>> he had some suggestion, but I'm not sure if it was the same one.
>>>>
>>>> "I'm also partial to using epoch time in the manifest because it is
>>>> generally easier for programs to work with.  But, human-readable
>>>> doesn't
>>>> suck, either."
>>>
>>> Ugh.  If you go down that road, why write human-readable contents at
>>> all?  You may as well just use a binary format.  But that's a very
>>> slippery slope and you won't like to be in the bottom -- I don't see
>>> what that gains you.  It's not like it's a lot of work to parse a
>>> timestamp in a non-internationalized well-defined human-readable format.
>>
>> Well, times are a special case because they are so easy to mess up.
>> Try converting ISO-8601 to epoch time using the standard C functions
>> on a system where TZ != UTC. Fun times.
> 
> Even if it's a zulu time? That would be pretty damn sad.

ZULU/GMT/UTC are all fine. But if the server timezone is EDT for example 
(not that I recommend this) you are likely to get the wrong result. 
Results vary based on your platform. For instance, we found MacOS was 
more likely to work the way you would expect and Linux was hopeless.

There are all kinds of fun tricks to get around this (sort of). One is 
to temporarily set TZ=UTC which sucks if an error happens before it gets 
set back. There are some hacks to try to determine your offset which 
have inherent race conditions around DST changes.

After some experimentation we just used the Posix definition for epoch 
time and used that to do our conversions:

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_16
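
For reference, that definition transcribes into C roughly as follows (a
sketch, not pgBackRest's actual code; it assumes the caller has already
filled in tm_year and tm_yday the way struct tm defines them):

    #include <stdio.h>
    #include <time.h>

    /*
     * Seconds since the epoch for a broken-down UTC time, per the formula
     * in POSIX XBD 4.16.  No dependence on the TZ setting at all.
     */
    static long long
    utc_tm_to_epoch(const struct tm *tm)
    {
        long long   year = tm->tm_year;     /* years since 1900 */

        return tm->tm_sec
            + tm->tm_min * 60
            + tm->tm_hour * 3600
            + tm->tm_yday * 86400LL
            + (year - 70) * 31536000LL
            + ((year - 69) / 4) * 86400LL
            - ((year - 1) / 100) * 86400LL
            + ((year + 299) / 400) * 86400LL;
    }

    int
    main(void)
    {
        struct tm   tm = {0};

        tm.tm_year = 120;       /* 2020 */
        tm.tm_yday = 104;       /* April 14 in a leap year */
        tm.tm_hour = 12;

        /* Prints 1586865600, i.e. 2020-04-14 12:00:00 UTC. */
        printf("%lld\n", utc_tm_to_epoch(&tm));
        return 0;
    }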

Regards,
-- 
-David
david@pgmasters.net



Re: documenting the backup manifest file format

From
Andrew Dunstan
Date:
On 4/14/20 3:19 PM, David Steele wrote:
> On 4/14/20 3:03 PM, Andrew Dunstan wrote:
>>
>> On 4/14/20 1:33 PM, David Steele wrote:
>>> On 4/14/20 1:27 PM, Alvaro Herrera wrote:
>>>> On 2020-Apr-14, David Steele wrote:
>>>>
>>>>> On 4/14/20 12:56 PM, Robert Haas wrote:
>>>>>
>>>>>> Hmm, did David suggest that before? I don't recall for sure. I think
>>>>>> he had some suggestion, but I'm not sure if it was the same one.
>>>>>
>>>>> "I'm also partial to using epoch time in the manifest because it is
>>>>> generally easier for programs to work with.  But, human-readable
>>>>> doesn't
>>>>> suck, either."
>>>>
>>>> Ugh.  If you go down that road, why write human-readable contents at
>>>> all?  You may as well just use a binary format.  But that's a very
>>>> slippery slope and you won't like to be in the bottom -- I don't see
>>>> what that gains you.  It's not like it's a lot of work to parse a
>>>> timestamp in a non-internationalized well-defined human-readable
>>>> format.
>>>
>>> Well, times are a special case because they are so easy to mess up.
>>> Try converting ISO-8601 to epoch time using the standard C functions
>>> on a system where TZ != UTC. Fun times.
>>
>> Even if it's a zulu time? That would be pretty damn sad.
> ZULU/GMT/UTC are all fine. But if the server timezone is EDT for
> example (not that I recommend this) you are likely to get the wrong
> result. Results vary based on your platform. For instance, we found
> MacOS was more likely to work the way you would expect and Linux was
> hopeless.
>
> There are all kinds of fun tricks to get around this (sort of). One is
> to temporarily set TZ=UTC which sucks if an error happens before it
> gets set back. There are some hacks to try to determine your offset
> which have inherent race conditions around DST changes.
>
> After some experimentation we just used the Posix definition for epoch
> time and used that to do our conversions:
>
> https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_16
>
>
>

OK, but I think if we're putting a timestamp string in ISO-8601 format
in the manifest it should be in UTC / Zulu time, precisely to avoid
these issues. If that's too much trouble then yes an epoch time will
probably do.


cheers


andrew



-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: documenting the backup manifest file format

From
David Steele
Date:
On 4/14/20 3:55 PM, Andrew Dunstan wrote:
> 
> OK, but I think if we're putting a timestamp string in ISO-8601 format
> in the manifest it should be in UTC / Zulu time, precisely to avoid
> these issues. If that's too much trouble then yes an epoch time will
> probably do.

Happily ISO-8601 is always UTC. The problem I'm referring to is the 
timezone setting on the host system when doing conversions in C.

To be fair most languages handle this well and C is C so I'm not sure we 
need to make a big deal of it. In JSON/XML it's pretty common to use 
ISO-8601 so that seems like a rational choice.

Regards,
-- 
-David
david@pgmasters.net



Re: documenting the backup manifest file format

From
Alvaro Herrera
Date:
On 2020-Apr-14, Andrew Dunstan wrote:

> OK, but I think if we're putting a timestamp string in ISO-8601 format
> in the manifest it should be in UTC / Zulu time, precisely to avoid
> these issues. If that's too much trouble then yes an epoch time will
> probably do.

The timestamp is always specified and always UTC (except the code calls
it GMT).

+   /*
+    * Convert last modification time to a string and append it to the
+    * manifest. Since it's not clear what time zone to use and since time
+    * zone definitions can change, possibly causing confusion, use GMT
+    * always.
+    */
+   appendStringInfoString(&buf, "\"Last-Modified\": \"");
+   enlargeStringInfo(&buf, 128);
+   buf.len += pg_strftime(&buf.data[buf.len], 128, "%Y-%m-%d %H:%M:%S %Z",
+                          pg_gmtime(&mtime));
+   appendStringInfoString(&buf, "\"");

I was merely saying that it's trivial to make this iso-8601 compliant as

    buf.len += pg_strftime(&buf.data[buf.len], 128, "%Y-%m-%dT%H:%M:%SZ",

ie. omit the "GMT" string and replace it with a literal Z, and remove
the space and replace it with a T.
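
Just to make the difference concrete, a standalone sketch using plain
strftime()/gmtime() rather than the pg_* wrappers (the %Z output for
gmtime() typically reads "GMT", though that can vary by platform):

    #include <stdio.h>
    #include <time.h>

    int
    main(void)
    {
        time_t      mtime = 1586865600;     /* 2020-04-14 12:00:00 UTC */
        char        buf[128];

        /* Current manifest style: "2020-04-14 12:00:00 GMT" */
        strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S %Z", gmtime(&mtime));
        puts(buf);

        /* Strict ISO 8601: "2020-04-14T12:00:00Z" */
        strftime(buf, sizeof(buf), "%Y-%m-%dT%H:%M:%SZ", gmtime(&mtime));
        puts(buf);

        return 0;
    }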

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: documenting the backup manifest file format

From
Alvaro Herrera
Date:
On 2020-Apr-14, David Steele wrote:

> Happily ISO-8601 is always UTC.

Uh, it is not --
https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: documenting the backup manifest file format

From
Andrew Dunstan
Date:
On 4/14/20 4:09 PM, Alvaro Herrera wrote:
> On 2020-Apr-14, Andrew Dunstan wrote:
>
>> OK, but I think if we're putting a timestamp string in ISO-8601 format
>> in the manifest it should be in UTC / Zulu time, precisely to avoid
>> these issues. If that's too much trouble then yes an epoch time will
>> probably do.
> The timestamp is always specified and always UTC (except the code calls
> it GMT).
>
> +   /*
> +    * Convert last modification time to a string and append it to the
> +    * manifest. Since it's not clear what time zone to use and since time
> +    * zone definitions can change, possibly causing confusion, use GMT
> +    * always.
> +    */
> +   appendStringInfoString(&buf, "\"Last-Modified\": \"");
> +   enlargeStringInfo(&buf, 128);
> +   buf.len += pg_strftime(&buf.data[buf.len], 128, "%Y-%m-%d %H:%M:%S %Z",
> +                          pg_gmtime(&mtime));
> +   appendStringInfoString(&buf, "\"");
>
> I was merely saying that it's trivial to make this iso-8601 compliant as
>
>     buf.len += pg_strftime(&buf.data[buf.len], 128, "%Y-%m-%dT%H:%M:%SZ",
>
> ie. omit the "GMT" string and replace it with a literal Z, and remove
> the space and replace it with a T.
>

+1


cheers


andrew



-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: documenting the backup manifest file format

From
David Steele
Date:
On 4/14/20 4:11 PM, Alvaro Herrera wrote:
> On 2020-Apr-14, David Steele wrote:
> 
>> Happily ISO-8601 is always UTC.
> 
> Uh, it is not --
> https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators

Whoops, you are correct. I've just never seen non-UTC in the wild yet.

-- 
-David
david@pgmasters.net



Re: backup manifests

From
Fujii Masao
Date:

On 2020/04/14 0:15, Robert Haas wrote:
> On Sun, Apr 12, 2020 at 10:09 PM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>> I found other minor issues.
> 
> I think these are all correct fixes. Thanks for the post-commit
> review, and sorry for this mistakes.

Thanks for the review, Michael and Robert. Pushed the patches!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: documenting the backup manifest file format

From
Fujii Masao
Date:

On 2020/04/14 2:40, Robert Haas wrote:
> On Fri, Mar 27, 2020 at 4:32 PM Andres Freund <andres@anarazel.de> wrote:
>> I don't like having a file format that's intended to be used by external
>> tools too that's undocumented except for code that assembles it in a
>> piecemeal fashion.  Do you mean in a follow-on patch this release, or
>> later? I don't have a problem with the former.
> 
> Here is a patch for that.

While reading the document that you pushed, I thought that it's better
to define index term for backup manifest, so that we can easily reach
this document from the index page. Thought? Patch attached.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment

Re: documenting the backup manifest file format

From
Robert Haas
Date:
On Tue, Apr 14, 2020 at 11:49 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> While reading the document that you pushed, I thought that it's better
> to define index term for backup manifest, so that we can easily reach
> this document from the index page. Thought? Patch attached.

Fine with me. I tend not to think about the index very much, so I'm
glad you are. :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: documenting the backup manifest file format

From
Jehan-Guillaume de Rorthais
Date:
On Tue, 14 Apr 2020 12:56:49 -0400
Robert Haas <robertmhaas@gmail.com> wrote:

> On Mon, Apr 13, 2020 at 5:43 PM Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
> > Yeah, I guess I'm just saying that it feels brittle to have a file
> > format that's supposed to be good for data exchange and then make it
> > itself depend on representation details such as the order that fields
> > appear in, the letter case, or the format of newlines.  Maybe this isn't
> > really of concern, but it seemed strange.  
> 
> I didn't want to use JSON for this at all, but I got outvoted. When I
> raised this issue, it was suggested that I deal with it in this way,
> so I did. I can't really defend it too far beyond that, although I do
> think that one nice thing about this is that you can verify the
> checksum using shell commands if you want. Just figure out the number
> of lines in the file, minus one, and do head -n$LINES backup_manifest
> | shasum -a256 and boom. If there were some whitespace-skipping thing
> figuring out how to reproduce the checksum calculation would be hard.

FWIW, shell commands (md5sum and sha*sum) read checksums from a separate file
with a very simple format: one file per line with format "CHECKSUM FILEPATH".

Thanks to json, it is fairly easy to extract checksums and filenames from the
current manifest file format and check them all with one command:

  jq -r '.Files|.[]|.Checksum+" "+.Path' backup_manifest > checksums.sha256
  sha256sum --check --quiet checksums.sha256

You can even pipe these commands together to avoid the intermediary file.

But for backup_manifest, it's kind of a shame we have to check the checksum
against a transformed version of the file. Did you consider creating e.g. a
separate backup_manifest.sha256 file?

I'm very sorry in advance if this has been discussed previously.

Regards,



Re: documenting the backup manifest file format

From
Robert Haas
Date:
On Wed, Apr 15, 2020 at 11:23 AM Jehan-Guillaume de Rorthais
<jgdr@dalibo.com> wrote:
> But for backup_manifest, it's kind of a shame we have to check the checksum
> against a transformed version of the file. Did you consider creating e.g. a
> separate backup_manifest.sha256 file?
>
> I'm very sorry in advance if this has been discussed previously.

It was briefly mentioned in the original (lengthy) discussion, but I
think there was one vote in favor and two votes against or something
like that, so it didn't go anywhere. I didn't realize that there were
handy command-line tools for manipulating json like that, or I
probably would have considered that idea more strongly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: documenting the backup manifest file format

From
Jehan-Guillaume de Rorthais
Date:
On Wed, 15 Apr 2020 12:03:28 -0400
Robert Haas <robertmhaas@gmail.com> wrote:

> On Wed, Apr 15, 2020 at 11:23 AM Jehan-Guillaume de Rorthais
> <jgdr@dalibo.com> wrote:
> > But for backup_manifest, it's kind of a shame we have to check the checksum
> > against a transformed version of the file. Did you consider creating e.g. a
> > separate backup_manifest.sha256 file?
> >
> > I'm very sorry in advance if this has been discussed previously.  
> 
> It was briefly mentioned in the original (lengthy) discussion, but I
> think there was one vote in favor and two votes against or something
> like that, so it didn't go anywhere.

Argh.

> I didn't realize that there were handy command-line tools for manipulating
> json like that, or I probably would have considered that idea more strongly.

That was indeed a lengthy thread with various details discussed. I'm sorry I
didn't catch the ball back then.

Regards,



Re: documenting the backup manifest file format

From
David Steele
Date:
On 4/15/20 6:43 PM, Jehan-Guillaume de Rorthais wrote:
> On Wed, 15 Apr 2020 12:03:28 -0400
> Robert Haas <robertmhaas@gmail.com> wrote:
> 
>> On Wed, Apr 15, 2020 at 11:23 AM Jehan-Guillaume de Rorthais
>> <jgdr@dalibo.com> wrote:
>>> But for backup_manifest, it's kind of a shame we have to check the checksum
>>> against a transformed version of the file. Did you consider creating e.g. a
>>> separate backup_manifest.sha256 file?
>>>
>>> I'm very sorry in advance if this has been discussed previously.
>>
>> It was briefly mentioned in the original (lengthy) discussion, but I
>> think there was one vote in favor and two votes against or something
>> like that, so it didn't go anywhere.
> 
> Argh.
> 
>> I didn't realize that there were handy command-line tools for manipulating
>> json like that, or I probably would have considered that idea more strongly.
> 
> That was indeed a lengthy thread with various details discussed. I'm sorry I
> didn't catch the ball back then.

One of the reasons to use JSON was to be able to use command line tools 
like jq to do tasks (I use it myself). But I think only the 
pg_verifybackup tool should be used to verify the internal checksum.

Two thoughts:

1) You can always generate an external checksum when you generate the 
backup if you want to do your own verification without running 
pg_verifybackup.

2) Perhaps it would be good if the pg_verifybackup command had a 
--verify-manifest-checksum option (or something) to check that the 
manifest file looks valid without checking any files. That's not going 
to happen for PG13, but it's possible for PG14.

Regards,
-- 
-David
david@pgmasters.net



Re: documenting the backup manifest file format

From
Jehan-Guillaume de Rorthais
Date:
On Wed, 15 Apr 2020 18:54:14 -0400
David Steele <david@pgmasters.net> wrote:

> On 4/15/20 6:43 PM, Jehan-Guillaume de Rorthais wrote:
> > On Wed, 15 Apr 2020 12:03:28 -0400
> > Robert Haas <robertmhaas@gmail.com> wrote:
> >   
> >> On Wed, Apr 15, 2020 at 11:23 AM Jehan-Guillaume de Rorthais
> >> <jgdr@dalibo.com> wrote:  
> >>> But for backup_manifest, it's kind of a shame we have to check the checksum
> >>> against a transformed version of the file. Did you consider creating e.g.
> >>> a separate backup_manifest.sha256 file?
> >>>
> >>> I'm very sorry in advance if this has been discussed previously.  
> >>
> >> It was briefly mentioned in the original (lengthy) discussion, but I
> >> think there was one vote in favor and two votes against or something
> >> like that, so it didn't go anywhere.  
> > 
> > Argh.
> >   
> >> I didn't realize that there were handy command-line tools for manipulating
> >> json like that, or I probably would have considered that idea more
> >> strongly.  
> > 
> > That was indeed a lengthy thread with various details discussed. I'm sorry I
> > didn't catch the ball back then.  
> 
> One of the reasons to use JSON was to be able to use command line tools 
> like jq to do tasks (I use it myself).

That's perfectly fine. I was only wondering about having the manifest checksum
outside of the manifest itself.

> But I think only the pg_verifybackup tool should be used to verify the
> internal checksum.

true.

> Two thoughts:
> 
> 1) You can always generate an external checksum when you generate the 
> backup if you want to do your own verification without running 
> pg_verifybackup.

Sure, but by the time I want to produce an external checksum, the manifest
would have traveled around quite a bit, with various dangers along the way
that could corrupt it. Checksumming it from the original process that
produced it sounds safer.

> 2) Perhaps it would be good if the pg_verifybackup command had a 
> --verify-manifest-checksum option (or something) to check that the 
> manifest file looks valid without checking any files. That's not going 
> to happen for PG13, but it's possible for PG14.

Sure.

I just liked the idea of being able to check the manifest using an external
command-line tool implementing the same standardized checksum algorithm,
without editing the manifest first. But I understand it's too late to discuss
this now.

Regards,



Re: documenting the backup manifest file format

From
Fujii Masao
Date:

On 2020/04/15 22:24, Robert Haas wrote:
> On Tue, Apr 14, 2020 at 11:49 PM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>> While reading the document that you pushed, I thought that it's better
>> to define index term for backup manifest, so that we can easily reach
>> this document from the index page. Thought? Patch attached.
> 
> Fine with me. I tend not to think about the index very much, so I'm
> glad you are. :-)

Pushed! Thanks!

Regards,
  

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: backup manifests

From
Fujii Masao
Date:

On 2020/04/15 11:18, Fujii Masao wrote:
> 
> 
> On 2020/04/14 0:15, Robert Haas wrote:
>> On Sun, Apr 12, 2020 at 10:09 PM Fujii Masao
>> <masao.fujii@oss.nttdata.com> wrote:
>>> I found other minor issues.
>>
>> I think these are all correct fixes. Thanks for the post-commit
>> review, and sorry for this mistakes.
> 
> Thanks for the review, Michael and Robert. Pushed the patches!

I found three minor issues in pg_verifybackup.

+        {"print-parse-wal", no_argument, NULL, 'p'},

This is an unused option, so this line should be removed.

+    printf(_("  -m, --manifest=PATH         use specified path for manifest\n"));

Typo: --manifest should be --manifest-path

pg_verifybackup accepts --quiet option, but its usage() doesn't
print any message for --quiet option.

Attached is the patch that fixes those issues.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment

Re: backup manifests

From
Robert Haas
Date:
On Wed, Apr 22, 2020 at 12:21 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> I found three minor issues in pg_verifybackup.
>
> +               {"print-parse-wal", no_argument, NULL, 'p'},
>
> This is an unused option, so this line should be removed.
>
> +       printf(_("  -m, --manifest=PATH         use specified path for manifest\n"));
>
> Typo: --manifest should be --manifest-path
>
> pg_verifybackup accepts --quiet option, but its usage() doesn't
> print any message for --quiet option.
>
> Attached is the patch that fixes those issues.

Thanks; LGTM.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: backup manifests

From
Fujii Masao
Date:

On 2020/04/23 1:28, Robert Haas wrote:
> On Wed, Apr 22, 2020 at 12:21 PM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>> I found three minor issues in pg_verifybackup.
>>
>> +               {"print-parse-wal", no_argument, NULL, 'p'},
>>
>> This is an unused option, so this line should be removed.
>>
>> +       printf(_("  -m, --manifest=PATH         use specified path for manifest\n"));
>>
>> Typo: --manifest should be --manifest-path
>>
>> pg_verifybackup accepts --quiet option, but its usage() doesn't
>> print any message for --quiet option.
>>
>> Attached is the patch that fixes those issues.
> 
> Thanks; LGTM.

Thanks for the review! Pushed.

Regards,  

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: backup manifests

From
Robert Haas
Date:
On Sun, Apr 5, 2020 at 3:31 PM Andres Freund <andres@anarazel.de> wrote:
> The warnings don't seem too unreasonable. The compiler can't see that
> the error_cb inside json_manifest_parse_failure() is not expected to
> return. Probably worth adding a wrapper around the calls to
> context->error_cb and mark that as noreturn.

Eh, how? The callback is declared as:

typedef void (*json_manifest_error_callback)(JsonManifestParseContext *,
                                                                 char
*fmt, ...) pg_attribute_printf(2, 3);

I don't know of a way to create a wrapper around that, because of the
variable argument list. We could change the callback to take va_list,
I guess.

Does it work for you to just add pg_attribute_noreturn() to this
typedef, as in the attached?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: backup manifests

From
Andres Freund
Date:
Hi,

On 2020-04-23 08:57:39 -0400, Robert Haas wrote:
> On Sun, Apr 5, 2020 at 3:31 PM Andres Freund <andres@anarazel.de> wrote:
> > The warnings don't seem too unreasonable. The compiler can't see that
> > the error_cb inside json_manifest_parse_failure() is not expected to
> > return. Probably worth adding a wrapper around the calls to
> > context->error_cb and mark that as noreturn.
> 
> Eh, how? The callback is declared as:
> 
> typedef void (*json_manifest_error_callback)(JsonManifestParseContext *,
>                                                                  char
> *fmt, ...) pg_attribute_printf(2, 3);
> 
> I don't know of a way to create a wrapper around that, because of the
> variable argument list.

Didn't think that far...


> We could change the callback to take va_list, I guess.

I'd argue that that'd be a good idea anyway, otherwise there's no way to
wrap the invocation anywhere in the code. But that's an independent
consideration, as:

> Does it work for you to just add pg_attribute_noreturn() to this
> typedef, as in the attached?

does fix the problem for me, cool.

Do you not see a warning when compiling with optimizations enabled?

Greetings,

Andres Freund



Re: backup manifests

From
Robert Haas
Date:
On Thu, Apr 23, 2020 at 5:16 PM Andres Freund <andres@anarazel.de> wrote:
> Do you not see a warning when compiling with optimizations enabled?

No, I don't. I tried it with -O{0,1,2,3} and I always use -Wall
-Werror. No warnings.

[rhaas pgsql]$ clang -v
clang version 5.0.2 (tags/RELEASE_502/final)
Target: x86_64-apple-darwin19.4.0
Thread model: posix
InstalledDir: /opt/local/libexec/llvm-5.0/bin

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: documenting the backup manifest file format

From
Fujii Masao
Date:

On 2020/04/15 5:33, Andrew Dunstan wrote:
> 
> On 4/14/20 4:09 PM, Alvaro Herrera wrote:
>> On 2020-Apr-14, Andrew Dunstan wrote:
>>
>>> OK, but I think if we're putting a timestamp string in ISO-8601 format
>>> in the manifest it should be in UTC / Zulu time, precisely to avoid
>>> these issues. If that's too much trouble then yes an epoch time will
>>> probably do.
>> The timestamp is always specified and always UTC (except the code calls
>> it GMT).
>>
>> +   /*
>> +    * Convert last modification time to a string and append it to the
>> +    * manifest. Since it's not clear what time zone to use and since time
>> +    * zone definitions can change, possibly causing confusion, use GMT
>> +    * always.
>> +    */
>> +   appendStringInfoString(&buf, "\"Last-Modified\": \"");
>> +   enlargeStringInfo(&buf, 128);
>> +   buf.len += pg_strftime(&buf.data[buf.len], 128, "%Y-%m-%d %H:%M:%S %Z",
>> +                          pg_gmtime(&mtime));
>> +   appendStringInfoString(&buf, "\"");
>>
>> I was merely saying that it's trivial to make this iso-8601 compliant as
>>
>>      buf.len += pg_strftime(&buf.data[buf.len], 128, "%Y-%m-%dT%H:%M:%SZ",
>>
>> ie. omit the "GMT" string and replace it with a literal Z, and remove
>> the space and replace it with a T.

I have one question related to this; Why don't we use log_timezone,
like backup_label? log_timezone is used for "START TIME" field in
backup_label. Sorry if this was already discussed.

        /* Use the log timezone here, not the session timezone */
        stamp_time = (pg_time_t) time(NULL);
        pg_strftime(strfbuf, sizeof(strfbuf),
                    "%Y-%m-%d %H:%M:%S %Z",
                    pg_localtime(&stamp_time, log_timezone));

OTOH, *if* we want to use the same timezone for backup-related files because
a backup can be used in different environments where the timezone setting
may be different, or for other reasons, shouldn't backup_label also use
GMT or something for the sake of consistency?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: documenting the backup manifest file format

From
Robert Haas
Date:
On Fri, May 15, 2020 at 2:10 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> I have one question related to this; Why don't we use log_timezone,
> like backup_label? log_timezone is used for "START TIME" field in
> backup_label. Sorry if this was already discussed.
>
>                 /* Use the log timezone here, not the session timezone */
>                 stamp_time = (pg_time_t) time(NULL);
>                 pg_strftime(strfbuf, sizeof(strfbuf),
>                                         "%Y-%m-%d %H:%M:%S %Z",
>                                         pg_localtime(&stamp_time, log_timezone));
>
> OTOH, *if* we want to use the same timezone for backup-related files because
> a backup can be used in different environments where the timezone setting
> may be different, or for other reasons, shouldn't backup_label also use
> GMT or something for the sake of consistency?

It's a good question. My inclination was to think that GMT would be
the clearest thing, but I also didn't realize that the result would
thus be inconsistent with backup_label. Not sure what's best here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: documenting the backup manifest file format

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> It's a good question. My inclination was to think that GMT would be
> the clearest thing, but I also didn't realize that the result would
> thus be inconsistent with backup_label. Not sure what's best here.

I vote for following the backup_label precedent; that's stood for quite
some years now.

            regards, tom lane



Re: documenting the backup manifest file format

From
David Steele
Date:
On 5/15/20 9:34 AM, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> It's a good question. My inclination was to think that GMT would be
>> the clearest thing, but I also didn't realize that the result would
>> thus be inconsistent with backup_label. Not sure what's best here.
> 
> I vote for following the backup_label precedent; that's stood for quite
> some years now.

I'd rather keep it GMT. The timestamps in the backup label are purely 
informational, but the timestamps in the manifest are useful, e.g. to 
set the mtime on a restore to the original value.
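
To illustrate that use case, a minimal sketch (hypothetical helper and
example file path, not pgBackRest or PostgreSQL code) that restores a
file's mtime from an already-parsed epoch value:

    #include <stdio.h>
    #include <time.h>
    #include <utime.h>

    /*
     * Restore a file's modification time from a manifest-supplied value.
     * utimensat() is the modern spelling, but utime() keeps the sketch short.
     */
    static int
    restore_mtime(const char *path, time_t manifest_mtime)
    {
        struct utimbuf times;

        times.actime = manifest_mtime;      /* access time: reuse the value */
        times.modtime = manifest_mtime;     /* modification time from manifest */

        return utime(path, &times);
    }

    int
    main(void)
    {
        if (restore_mtime("base/1/1259", 1586865600) != 0)
            perror("utime");
        return 0;
    }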

Forcing the user to do timezone conversions is prone to error. Some 
languages, like C, simply aren't good at it.

Of course, my actual preference is to use epoch time which is easy to 
work with and eliminates the possibility of conversion errors. It is 
also compact.

Regards,
-- 
-David
david@pgmasters.net



Re: documenting the backup manifest file format

From
Tom Lane
Date:
David Steele <david@pgmasters.net> writes:
> On 5/15/20 9:34 AM, Tom Lane wrote:
>> I vote for following the backup_label precedent; that's stood for quite
>> some years now.

> Of course, my actual preference is to use epoch time which is easy to 
> work with and eliminates the possibility of conversion errors. It is 
> also compact.

Well, if we did that then it'd be sufficiently different from the backup
label as to remove any risk of confusion.  But "easy to work with" is in
the eye of the beholder; do we really want a format that's basically
unreadable to the naked eye?

            regards, tom lane



Re: documenting the backup manifest file format

From
David Steele
Date:
On 5/15/20 10:17 AM, Tom Lane wrote:
> David Steele <david@pgmasters.net> writes:
>> On 5/15/20 9:34 AM, Tom Lane wrote:
>>> I vote for following the backup_label precedent; that's stood for quite
>>> some years now.
> 
>> Of course, my actual preference is to use epoch time which is easy to
>> work with and eliminates the possibility of conversion errors. It is
>> also compact.
> 
> Well, if we did that then it'd be sufficiently different from the backup
> label as to remove any risk of confusion.  But "easy to work with" is in
> the eye of the beholder; do we really want a format that's basically
> unreadable to the naked eye?

Well, I lost this argument before so it seems I'm in the minority on 
easy-to-use. We use epoch time in the pgBackRest manifests which has 
been easy to deal with in both C and Perl, so experience tells me it 
really is easy, at least for programs.

The manifest (to me, at least) is generally intended to be 
machine-processed. For instance, it contains checksums which are not all 
that useful unless they are checked programmatically -- they can't just 
be eye-balled.

Regards,
-- 
-David
david@pgmasters.net