Re: pg_verifybackup: TAR format backup verification - Mailing list pgsql-hackers

From Robert Haas
Subject Re: pg_verifybackup: TAR format backup verification
Date
Msg-id CA+Tgmobs-gbRUMGPpzVZnPPP=cgeEO7hnzV2yKmM0BQ=MGQSMg@mail.gmail.com
Whole thread Raw
In response to Re: pg_verifybackup: TAR format backup verification  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
+                       pg_log_error("pg_waldump cannot read from a tar");

"tar" isn't normally used as a noun as you do here, so I think this
should say "pg_waldump cannot read tar files".

Technically, the position of this check could lead to an unnecessary
failure, if -n wasn't specified but pg_wal.tar(.whatever) also doesn't
exist on disk. But I think it's OK to ignore that case.

However, I also notice this change to the TAP tests in a few places:

-       [ 'pg_verifybackup', $tempdir ],
+       [ 'pg_verifybackup', '-Fp', $tempdir ],

It's not the end of the world to have to make a change like this, but
it seems easy to do better. Just postpone the call to
find_backup_format() until right after we call parse_manifest_file().
That also means postponing the check mentioned above until right after
that, but that's fine: after parse_manifest_file() and then
find_backup_format(), you can do if (!no_parse_wal && context.format
== 't') { bail out }.

+       if (stat(path, &sb) == 0)
+               result = 'p';
+       else
+       {
+               if (errno != ENOENT)
+               {
+                       pg_log_error("cannot determine backup format : %m");
+                       pg_log_error_hint("Try \"%s --help\" for more
information.", progname);
+                       exit(1);
+               }
+
+               /* Otherwise, it is assumed to be a tar format backup. */
+               result = 't';
+       }

This doesn't look good, for a few reasons:

1. It would be clearer to structure this as if (stat(...) == 0) result
= 'p'; else if (errno == ENOENT) result = 't'; else { report an error;
} instead of the way you have it.

2. "cannot determine backup format" is not an appropriate way to
report the failure of stat(). The appropriate message is "could not
stat file \"%s\"".

3. It is almost never correct to put a space before a colon in an error message.

4. The hint doesn't look helpful, or necessary. I think you can just
delete that.

Regarding both point #2 and point #4, I think we should ask ourselves
how stat() could realistically fail here. On my system (macOS), the
document failed modes for stat() are EACCES (i.e. permission denied),
EFAULT (i.e. we have a bug in pg_verifybackup), EIO (I/O Error), ELOOP
(symlink loop), ENAMETOOLONG, ENOENT, ENOTDIR, and EOVERFLOW. In none
of those cases does it seem likely that specifying the format manually
will help anything. Thus, suggesting that the user look at the help,
presumably to find --format, is unlikely to solve anything, and
telling them that the error happened while trying to determine the
backup format isn't really helping anything, either. What the user
needs to know is that it was stat() that failed, and the pathname for
which it failed. Then they need to sort out whatever problem is
causing them to get one of those really weird errors.

Aside from the above, 0010 looks pretty reasonable, although I'll
probably want to do some wordsmithing on some of the comments at some
point.

-   of the backup.  The backup must be stored in the "plain"
-   format; a "tar" format backup can be checked after extracting it.
+   of the backup.  The backup must be stored in the "plain" or "tar"
+   format.  Verification is supported for <literal>gzip</literal>,
+   <literal>lz4</literal>, and  <literal>zstd</literal> compressed tar backup;
+   any other compressed format backups can be checked after decompressing them.

I don't think that we need to say that the backup must be stored in
the plain or tar format, because those are the only backup formats
pg_basebackup knows about. Similarly, it doesn't seem help to me to
enumerate all the compression algorithms that pg_basebackup supports
and say we only support those; what else would a user expect?

What I would do is replace the original sentence ("The backup must be
stored...") with: The backup may be stored either in the "plain" or
the "tar" format; this includes "tar" backups compressed with any
algorithm supported by pg_basebackup. However, at present, WAL
verification is supported only for plain-format backups. Therefore, if
the backup is stored in "tar" format, the <literal>-n,
--no-parse-wal<literal> option should be used.

+               # Add tar backup format option
+               push @backup, ('-F', 't');
+               # Add switch to skip WAL verification.
+               push @verify, ('-n');

Say why, not what. The second comment should say something like "WAL
verification not yet supported for tar-format backups".

+                       "$format backup fails with algorithm \"$algorithm\"");
+       $primary->command_ok(\@backup, "$format backup ok with
algorithm \"$algorithm\"");
+               ok(-f "$backup_path/backup_manifest", "$format backup
manifest exists");
+               "verify $format backup with algorithm \"$algorithm\"");

Personally I would change "$format" to "$format format" in all of
these places, so that we talk about a "tar format backup" or a "plain
format backup" instead of a "tar backup" or a "plain backup".

+               'skip_on_windows' => 1

I don't understand why 4 of the 7 new tests are skipped on Windows.
The existing "skip" message for this parameter says "unix-style
permissions not supported on Windows" but that doesn't seem applicable
for any of the new cases, and I couldn't find a comment about it,
either.

+       my @files = glob("*");
+       system_or_bail($tar, '-cf', "$backup_path/$archive", @files);

Why not just system_or_bail($tar, '-cf', "$backup_path/$archive", '.')?

Also, instead of having separate entries in the test array to do
basically the same thing on Windows, could we just iterate through the
test array twice and do everything once for plain format and then a
second time for tar format, and do the tests once for each? I don't
think that idea QUITE works, because the open_file_fails,
open_directory_fails, and search_directory_fails tests are really not
applicable to tar format. But we could rename skip_on_windows to
tests_file_permissions and skip those both on Windows and for tar
format. But aside from that, I don't quite see why it makes sense to,
for example, test extra_file for both formats but not
extra_tablespace_file, and indeed it seems like an important bit of
test coverage.

I also feel like we should have tests someplace that add extra files
to a tar-format backup in the backup directory (e.g. 1234567.tar, or
wug.tar, or 123456.wug) or remove entire files.

...Robert



pgsql-hackers by date:

Previous
From: Mats Kindahl
Date:
Subject: Re: Use streaming read API in ANALYZE
Next
From: Noah Misch
Date:
Subject: Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible