Re: Return pg_control from pg_backup_stop(). - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Return pg_control from pg_backup_stop().
Date
Msg-id aboDQSoywyKGGNCr@paquier.xyz
Whole thread Raw
In response to Re: Return pg_control from pg_backup_stop().  (Haibo Yan <tristan.yim@gmail.com>)
List pgsql-hackers
On Tue, Mar 17, 2026 at 11:50:29AM -0700, Haibo Yan wrote:
> Thank you for the clarification. I have now read the code, and
> overall it looks good to me. I only had one very small comment.

(Bottom-posting note from above, please be careful.)

> I was just wondering whether the following might be slightly clearer:
> ```
> memset(controlFile, 0, PG_CONTROL_FILE_SIZE);
> memcpy(controlFile, ControlFile, sizeof(ControlFileData));
> ```
>
> I do not think this is a real issue, though.

         {
             ControlFile->backupStartPoint = checkPoint.redo;
             ControlFile->backupEndRequired = backupEndRequired;
+            ControlFile->backupLabelRequired = false;

It sounds like it is going to be important to document the reason why
the flag is reset here (aka we don't need the backup_label file
anymore).

+backup_control_file(uint8_t *controlFile)
+{
+    ControlFileData *controlData = ((ControlFileData *)controlFile);
+
+    memset(controlFile + sizeof(ControlFileData), 0,
+           PG_CONTROL_FILE_SIZE - sizeof(ControlFileData));
+
+    LWLockAcquire(ControlFileLock, LW_SHARED);
+    memcpy(controlFile, ControlFile, sizeof(ControlFileData));
+    LWLockRelease(ControlFileLock);
+
+    controlData->backupLabelRequired = true;
+
+    INIT_CRC32C(controlData->crc);
+    COMP_CRC32C(controlData->crc, controlFile, offsetof(ControlFileData, crc));
+    FIN_CRC32C(controlData->crc);

I was wondering if we should have an assertion at least to cross-check
that the contents we store in shared memory never go out-of-sync with
the on-disk contents, in the shape of a USE_ASSERT_CHECKING block that
calls get_controlfile() and memcmp()'s the contents between shmem and
the on-disk file, while the LWLock is taken.  We ship the control file
last on purpose, one reason being backups taken from standbys, so that
may be sensible to do.

Another property of the new control file flag that is implied in the
implementation but not documented is that we should never check for
backupLabelRequired when a backup_label is gone.  Actually, the flag
is reset in InitWalRecovery() in the initial steps of recovery, and
the backup_label file is removed much later in StartupXLOG() just
*after* UpdateControlFile() to minimize the window where the contents
of the control file and the backup_label file is removed are
out-of-sync.  This window means that if we crash between the
completion of UpdateControlFile() and the durable_rename() we could
have a flag reset with a backup_label still around.  On restart,
recovery would fail, requiring a manual modification of the control
file, at least.  It sounds to me that this implementation detail
should be documented clearly.

Finally, here is a general opinion.  I like this patch, and it is
basically risk-free for base backups taken with the replication
protocol as we update the control file with the new flag set
on-the-fly.

Now, I am worried about backups that use pg_stop_backup().  Changing
backup APIs has always been a very sensitive area, and this is going
to require operators to update backup tools so as the control file
received as a result of pg_stop_backup() is copied, at the cost of
getting a failure if they don't do so.  I will *not* proceed with this
change without a clear approval from some more committers or senior
hackers that they like this change (approach previously suggested by
Andres, actually, for what I can see).  I am adding in CC a few
committers who have commented on this set of proposals and who have
touched the recovery code in the last few years, for awareness.
The timing is what it is, and we are at the end of a release cycle.
Let's see if we can reach a consensus of some kind.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: DOCS: typo on CLUSTER page
Next
From: Xuneng Zhou
Date:
Subject: Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery