Re: Add system identifier to backup manifest - Mailing list pgsql-hackers

From Amul Sul
Subject Re: Add system identifier to backup manifest
Date
Msg-id CAAJ_b9417seAdhwhMVr=PnQ+MPx7uAeS5FAw2qMCXF5MrE-7wQ@mail.gmail.com
Whole thread Raw
In response to Re: Add system identifier to backup manifest  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Add system identifier to backup manifest
List pgsql-hackers
On Wed, Jan 24, 2024 at 10:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jan 22, 2024 at 2:22 AM Amul Sul <sulamul@gmail.com> wrote:
> Thinking a bit more on this, I realized parse_manifest_file() has many out
> parameters. Instead parse_manifest_file() should simply return manifest data
> like load_backup_manifest().  Attached 0001 patch doing the same, and removed
> parser_context structure, and added manifest_data, and did the required
> adjustments to pg_verifybackup code.

        InitializeBackupManifest(&manifest, opt->manifest,
-
opt->manifest_checksum_type);
+
opt->manifest_checksum_type,
+                                                        GetSystemIdentifier());

InitializeBackupManifest() can just call GetSystemIdentifier() itself,
instead of passing another parameter, I think.

Ok.
 

+       if (manifest_version == 1)
+               context->error_cb(context,
+                                                 "%s: backup manifest
doesn't support incremental changes",
+
private_context->backup_directory);

I think this is weird. First, there doesn't seem to be any reason to
bounce through error_cb() here. You could just call pg_fatal(), as we
do elsewhere in this file. Second, there doesn't seem to be any need
to include the backup directory in this error message. We include the
file name (not the directory name) in errors that pertain to the file
itself, like if we can't open or read it. But we don't do that for
semantic errors about the manifest contents (cf.
combinebackup_per_file_cb). This file would need a lot fewer charges
if you didn't feed the backup directory name through here. Third, the
error message is not well-chosen, because a user who looks at it won't
understand WHY the manifest doesn't support incremental changes. I
suggest "backup manifest version 1 does not support incremental
backup".

+       /* Incremental backups supported on manifest version 2 or later */
+       if (manifest_version == 1)
+               context->error_cb(context,
+                                                 "incremental backups
cannot be taken for this backup");

Let's use the same error message as in the previous case here also.

Ok.


+       for (i = 0; i < n_backups; i++)
+       {
+               if (manifests[i]->system_identifier != system_identifier)
+               {
+                       char       *controlpath;
+
+                       controlpath = psprintf("%s/%s",
prior_backup_dirs[i], "global/pg_control");
+
+                       pg_fatal("manifest is from different database
system: manifest database system identifier is %llu, %s system
identifier is %llu",
+                                        (unsigned long long)
manifests[i]->system_identifier,
+                                        controlpath,
+                                        (unsigned long long)
system_identifier);
+               }
+       }

check_control_files() already verifies that all of the control files
contain the same system identifier as each other, so what we're really
checking here is that the backup manifest in each directory has the
same system identifier as the control file in that same directory. One
problem is that backup manifests are optional here, as per the comment
in load_backup_manifests(), so you need to skip over NULL entries
cleanly to avoid seg faulting if one is missing. I also think the
error message should be changed. How about "%s: manifest system
identifier is %llu, but control file has %llu"?

Ok.


+               context->error_cb(context,
+                                                 "manifest is from
different database system: manifest database system identifier is
%llu, pg_control database system identifier is %llu",
+                                                 (unsigned long long)
manifest_system_identifier,
+                                                 (unsigned long long)
system_identifier);

And here, while I'm kibitzing, how about "manifest system identifier
is %llu, but this system's identifier is %llu"?

I used "database system identifier" instead of "this system's identifier " like
we are using in WalReceiverMain() and libpqrcv_identify_system().


-       qr/could not open directory/,
+       qr/could not open file/,

I don't think that the expected error message here should be changing.
Does it really, with the latest patch version? Why? Can we fix that?
 
Because, we were trying to access pg_control to check the system identifier
before any other backup directory/file validation.


+                       else if (!parse->saw_system_identifier_field &&
+
strcmp(parse->manifest_version, "1") != 0)

I don't think this has any business testing the manifest version.
That's something to sort out at some later stage.

That is for backward compatibility, otherwise, we would have an "expected
system identifier" error for manifest version 1.

Thank you for the review-comments, updated version attached.

Regards,
Amul
Attachment

pgsql-hackers by date:

Previous
From: Richard Guo
Date:
Subject: Re: A compiling warning in jsonb_populate_record_valid
Next
From: Bertrand Drouvot
Date:
Subject: Re: Synchronizing slots from primary to standby