Thread: Add recovery to pg_control and remove backup_label
Hackers,

This was originally proposed in [1] but that thread went through a number of different proposals so it seems better to start anew.

The basic idea here is to simplify and harden recovery by getting rid of backup_label and storing recovery information directly in pg_control. Instead of backup software copying pg_control from PGDATA, it stores an updated version that is returned from pg_backup_stop(). I believe this is better for the following reasons:

* The user can no longer remove backup_label and get what looks like a successful restore (while almost certainly causing corruption). If pg_control is removed the cluster will not start. The user may try pg_resetwal, but I think that tool makes it pretty clear that corruption will result from its use. We could also modify pg_resetwal to complain if recovery info is present in pg_control.

* We don't need to worry about backup software seeing a torn copy of pg_control, since Postgres can safely read it out of memory and provide a valid copy via pg_backup_stop(). This solves [2] without needing to write pg_control via a temp file, which may affect performance on a standby. Unfortunately, this solution cannot be back-patched.

* For backup from standby, we no longer need to instruct the backup software to copy pg_control last. In fact the backup software should not copy pg_control from PGDATA at all.

Since backup_label is now gone, the fields that used to be in backup_label are now provided as columns returned from pg_backup_start() and pg_backup_stop(), and the backup history file is still written to the archive. For pg_basebackup we would have the option of writing the fields into the JSON manifest, storing them in a file (e.g. backup.info), or just ignoring them. None of the fields are required for recovery but backup software may be very interested in them.

I updated pg_rewind but I'm not very confident in the tests. When I removed backup_label processing, but before I updated pg_rewind to write recovery info into pg_control, all the rewind tests passed.

This patch highlights the fact that we still have no tests for the low-level backup method. I modified pgBackRest to work with this patch and the entire test suite ran without any issues, but in-core tests would be good to have. I'm planning to work on those myself as a separate patch.

This patch would also make the proposal in [3] obsolete since there is no need to rename backup_label if it is gone.

I know that outputting pg_control as bytea is going to be a bit controversial. Software that is using psql to run pg_backup_stop() could use encode() to get pg_control as text and then decode it later. Alternately, we could update ReadControlFile() to recognize a base64-encoded pg_control file. I'm not sure dealing with binary data is that much of a problem, though, and if the backup software gets it wrong then recovery will fail on an invalid pg_control file.

Lastly, I think there are improvements to be made in recovery that go beyond this patch. I originally set out to load the recovery info into *just* the existing fields in pg_control but it required so many changes to recovery that I decided it was too dangerous to do all in one patch. This patch very much takes the "backup_label in pg_control" approach, though I reused fields where possible. The added fields, e.g. backupRecoveryRequested, also allow us to keep the user experience pretty much the same in terms of messages and errors.

Thoughts?

Regards,
-David

[1] https://postgresql.org/message-id/1330cb48-4e47-03ca-f2fb-b144b49514d8%40pgmasters.net
[2] https://postgresql.org/message-id/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
[3] https://postgresql.org/message-id/eb3d1aae-1a75-bcd3-692a-38729423168f%40pgmasters.net
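[Editor's note: a minimal sketch of the client-side handling described above, for backup tools driving pg_backup_stop() through psql. The server-side call `encode(pg_control, 'base64')` is the approach the mail suggests; the helper name and the stand-in data below are invented for illustration. PostgreSQL's encode(..., 'base64') wraps its output at 76 characters, which the whitespace stripping here tolerates.]

```python
import base64

def decode_pg_control(encoded_text: str) -> bytes:
    """Turn base64 text captured from psql back into binary pg_control.

    psql output of encode(..., 'base64') contains newlines every 76
    characters, so strip all whitespace before decoding.
    """
    return base64.b64decode("".join(encoded_text.split()))

# Simulate the server side: stand-in binary content the size of a real
# pg_control file (8 KiB), encoded the way encode(..., 'base64') would be.
fake_pg_control = bytes(range(256)) * 32
wire_text = base64.encodebytes(fake_pg_control).decode("ascii")

# The backup tool decodes the captured text and writes the binary file
# into the backup instead of copying pg_control from PGDATA.
restored = decode_pg_control(wire_text)
assert restored == fake_pg_control
```

If the tool gets the decoding wrong, the resulting pg_control will fail its CRC check at recovery, which is the failure mode the mail considers acceptable.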
Attachment
On Thu, Oct 26, 2023 at 2:02 PM David Steele <david@pgmasters.net> wrote:
Hackers,
This was originally proposed in [1] but that thread went through a
number of different proposals so it seems better to start anew.
The basic idea here is to simplify and harden recovery by getting rid of
backup_label and storing recovery information directly in pg_control.
Instead of backup software copying pg_control from PGDATA, it stores an
updated version that is returned from pg_backup_stop(). I believe this
is better for the following reasons:
* The user can no longer remove backup_label and get what looks like a
successful restore (while almost certainly causing corruption). If
pg_control is removed the cluster will not start. The user may try
pg_resetwal, but I think that tool makes it pretty clear that corruption
will result from its use. We could also modify pg_resetwal to complain
if recovery info is present in pg_control.
* We don't need to worry about backup software seeing a torn copy of
pg_control, since Postgres can safely read it out of memory and provide
a valid copy via pg_backup_stop(). This solves [2] without needing to
write pg_control via a temp file, which may affect performance on a
standby. Unfortunately, this solution cannot be back patched.
Are we planning on dealing with torn writes in the back branches in some way or are we just throwing in the towel and saying the old method is too error-prone to exist/retain and therefore the goal of the v17 changes is to not only provide a better way but also to ensure the old way no longer works? It seems sufficient to change the output signature of pg_backup_stop to accomplish that goal though I am pondering whether an explicit check and error for seeing the backup_label file would be warranted.
If we are going to solve the torn writes problem completely then while I agree the new way is superior, implementing it doesn't have to mean existing tools built to produce backup_label and rely upon the pg_control in the data directory need to be forcibly broken.
I know that outputting pg_control as bytea is going to be a bit
controversial. Software that is using psql to run pg_backup_stop()
could use encode() to get pg_control as text and then decode it later.
Alternately, we could update ReadControlFile() to recognize a
base64-encoded pg_control file. I'm not sure dealing with binary data is
that much of a problem, though, and if the backup software gets it wrong
then recovery will fail on an invalid pg_control file.
Can we not figure out some way to place the relevant files onto the server somewhere so that a simple "cp" command would work? Have pg_backup_stop return paths instead of contents, those paths being "$TEMP_DIR"/<random unique new directory>/pg_control.conf (and tablespace_map)
David J.
On 10/26/23 17:27, David G. Johnston wrote:
> On Thu, Oct 26, 2023 at 2:02 PM David Steele <david@pgmasters.net
> <mailto:david@pgmasters.net>> wrote:
>
> Are we planning on dealing with torn writes in the back branches in some
> way or are we just throwing in the towel and saying the old method is
> too error-prone to exist/retain

We are still planning to address this issue in the back branches.

> and therefore the goal of the v17
> changes is to not only provide a better way but also to ensure the old
> way no longer works? It seems sufficient to change the output signature
> of pg_backup_stop to accomplish that goal though I am pondering whether
> an explicit check and error for seeing the backup_label file would be
> warranted.

Well, if the backup tool is just copying the second column of output to the backup_label, then it won't break. Of course in that case, restores won't work correctly but you would not get an error. Testing would show that it is not working properly and backup tools should certainly be tested.

Even so, I'm OK with an explicit check for backup_label. Let's see what others think.

> If we are going to solve the torn writes problem completely then while I
> agree the new way is superior, implementing it doesn't have to mean
> existing tools built to produce backup_label and rely upon the
> pg_control in the data directory need to be forcibly broken.

It is a pretty easy update to any backup software that supports non-exclusive backup. I was able to make the changes to pgBackRest in less than an hour.

We've made major changes to backup and restore in almost every major version of PostgreSQL for a while: non-exclusive backup in 9.6, dir renames in 10, variable WAL size in 11, new recovery location in 12, hard recovery target errors in 13, and changes to non-exclusive backup and removal of exclusive backup in 15. In 17 we are already looking at new page and segment sizes.
> I know that outputting pg_control as bytea is going to be a bit
> controversial. Software that is using psql to run pg_backup_stop()
> could use encode() to get pg_control as text and then decode it later.
> Alternately, we could update ReadControlFile() to recognize a
> base64-encoded pg_control file. I'm not sure dealing with binary data is
> that much of a problem, though, and if the backup software gets it wrong
> then recovery will fail on an invalid pg_control file.
>
> Can we not figure out some way to place the relevant files onto the
> server somewhere so that a simple "cp" command would work? Have
> pg_backup_stop return paths instead of contents, those paths being
> "$TEMP_DIR"/<random unique new directory>/pg_control.conf (and
> tablespace_map)

Nobody has been able to figure this out, and some of us have been thinking about it for years. It just doesn't seem possible to reliably tell the difference between a cluster that was copied and one that simply crashed.

If cp is really the backup tool being employed, I would recommend using pg_basebackup. cp has flaws that could lead to corruption, and of course does not at all take into account the archive required to make a backup consistent, directories to be excluded, the order of copying pg_control on backup from standby, etc., etc. Backup/restore is not a simple endeavor and we don't do anyone favors pretending that it is.

Regards,
-David
On Fri, Oct 27, 2023 at 7:10 AM David Steele <david@pgmasters.net> wrote:
On 10/26/23 17:27, David G. Johnston wrote:
> Can we not figure out some way to place the relevant files onto the
> server somewhere so that a simple "cp" command would work? Have
> pg_backup_stop return paths instead of contents, those paths being
> "$TEMP_DIR"/<random unique new directory>/pg_control.conf (and
> tablespace_map)
Nobody has been able to figure this out, and some of us have been
thinking about it for years. It just doesn't seem possible to reliably
tell the difference between a cluster that was copied and one that
simply crashed.
If cp is really the backup tool being employed, I would recommend using
pg_basebackup. cp has flaws that could lead to corruption, and of course
does not at all take into account the archive required to make a backup
consistent, directories to be excluded, the order of copying pg_control
on backup from standby, etc., etc.
Let me modify that to make it a bit more clear, I actually wouldn't care if pg_backup_end outputs an entire binary pg_control file as part of the SQL resultset.
My proposal would be to, in addition, place in the temporary directory on the server, Postgres-written versions of pg_control and tablespace_map as part of the pg_backup_end processing. The client software would then have a choice. Write the contents of the SQL resultset to newly created binary mode files in the destination, or, copy the server-written files from the temporary directory to the destination.
That said, I'm starting to dislike that idea myself. It only really makes sense if the files could be placed in the data directory but that isn't doable given concurrent backups and not wanting to place the source server into an inconsistent state.
David J.
On 10/27/23 13:45, David G. Johnston wrote:
>
> Let me modify that to make it a bit more clear, I actually wouldn't care
> if pg_backup_end outputs an entire binary pg_control file as part of the
> SQL resultset.
>
> My proposal would be to, in addition, place in the temporary directory
> on the server, Postgres-written versions of pg_control and
> tablespace_map as part of the pg_backup_end processing. The client
> software would then have a choice. Write the contents of the SQL
> resultset to newly created binary mode files in the destination, or,
> copy the server-written files from the temporary directory to the
> destination.
>
> That said, I'm starting to dislike that idea myself. It only really
> makes sense if the files could be placed in the data directory but that
> isn't doable given concurrent backups and not wanting to place the
> source server into an inconsistent state.

Pretty much the conclusion I have come to myself over the years.

Regards,
-David
Rebased on 151ffcf6.
Attachment
On Fri, Oct 27, 2023 at 10:10:42AM -0400, David Steele wrote:
> We are still planning to address this issue in the back branches.

FWIW, redesigning the backend code in charge of doing base backups in the back branches is out of scope. Based on a read of the proposed patch, it includes catalog changes which would require a catversion bump, so that's not going to work anyway.
--
Michael
Attachment
On Sun, Nov 05, 2023 at 01:45:39PM -0400, David Steele wrote:
> Rebased on 151ffcf6.

I like this patch a lot. Even if the backup_label file is removed, we still have all the debug information from the backup history file, thanks to its LABEL, BACKUP METHOD and BACKUP FROM, so no information is lost. It does a 1:1 replacement of the contents parsed from the backup_label needed by recovery by fetching them from the control file. Sounds like a straight-forward change to me.

The patch is failing the recovery test 039_end_of_wal.pl. Could you look at the failure?

 /* Build and save the contents of the backup history file */
-history_file = build_backup_content(state, true);
+history_file = build_backup_content(state);

build_backup_content() sounds like an incorrect name if it is a routine only used to build the contents of backup history files.

Why is there nothing updated in src/bin/pg_controldata/?

+ /* Clear fields used to initialize recovery */
+ ControlFile->backupCheckPoint = InvalidXLogRecPtr;
+ ControlFile->backupStartPointTLI = 0;
+ ControlFile->backupRecoveryRequired = false;
+ ControlFile->backupFromStandby = false;

These variables in the control file are cleaned up when the backup_label file was read previously, but backup_label is renamed to backup_label.old a bit later than that. Your logic looks correct seen from here, but shouldn't these variables be set much later, aka just *after* UpdateControlFile()? This gap between the initialization of the control file and the in-memory reset makes the code quite brittle, IMO.

- basebackup_progress_wait_wal_archive(&state);
- do_pg_backup_stop(backup_state, !opt->nowait);

Why is that moved?

- The backup label
- file includes the label string you gave to <function>pg_backup_start</function>,
- as well as the time at which <function>pg_backup_start</function> was run, and
- the name of the starting WAL file. In case of confusion it is therefore
- possible to look inside a backup file and determine exactly which
- backup session the dump file came from. The tablespace map file includes
+ The tablespace map file includes

It may be worth mentioning that the backup history file holds this information on the primary's pg_wal, as well.

The changes in sendFileWithContent() may be worth a patch of its own.

--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -146,6 +146,9 @@ typedef struct ControlFileData
@@ -160,14 +163,25 @@ typedef struct ControlFileData
 XLogRecPtr minRecoveryPoint;
 TimeLineID minRecoveryPointTLI;
+ XLogRecPtr backupCheckPoint;
 XLogRecPtr backupStartPoint;
+ TimeLineID backupStartPointTLI;
 XLogRecPtr backupEndPoint;
+ bool backupRecoveryRequired;
+ bool backupFromStandby;

This increases the size of the control file from 296B to 312B with an 8-byte alignment, as far as I can see. The size of the control file has always been a sensitive subject, especially with the hard limit of PG_CONTROL_MAX_SAFE_SIZE. Well, the point of this patch is that this is the price to pay to prevent users from doing something stupid with a removal of a backup_label when they should not. Do others have an opinion about this increase in size?

Actually, grouping backupStartPointTLI and minRecoveryPointTLI should reduce the size further with some alignment magic, no?

- /*
-  * BACKUP METHOD lets us know if this was a typical backup ("streamed",
-  * which could mean either pg_basebackup or the pg_backup_start/stop
-  * method was used) or if this label came from somewhere else (the only
-  * other option today being from pg_rewind). If this was a streamed
-  * backup then we know that we need to play through until we get to the
-  * end of the WAL which was generated during the backup (at which point we
-  * will have reached consistency and backupEndRequired will be reset to be
-  * false).
-  */
- if (fscanf(lfp, "BACKUP METHOD: %19s\n", backuptype) == 1)
- {
-     if (strcmp(backuptype, "streamed") == 0)
-         *backupEndRequired = true;
- }

backupRecoveryRequired in the control file is switched to false for pg_rewind and true for streamed backups. My gut feeling is telling me that this should be OK, as out-of-core tools would need an upgrade if they relied on the backup_label file anyway. I can see that this change makes us lose some documentation, unfortunately. Shouldn't these removed lines be moved to pg_control.h instead for the description of backupEndRequired?

doc/src/sgml/ref/pg_rewind.sgml and src/backend/access/transam/xlogrecovery.c still include references to the backup_label file.
--
Michael
Attachment
On 11/6/23 01:05, Michael Paquier wrote:
> On Fri, Oct 27, 2023 at 10:10:42AM -0400, David Steele wrote:
>> We are still planning to address this issue in the back branches.
>
> FWIW, redesigning the backend code in charge of doing base backups in
> the back branches is out of scope. Based on a read of the proposed
> patch, it includes catalog changes which would require a catversion
> bump, so that's not going to work anyway.

I did not mean this patch -- rather some variation of what Thomas has been working on, more than likely.

Regards,
-David
On 11/6/23 02:35, Michael Paquier wrote:
> On Sun, Nov 05, 2023 at 01:45:39PM -0400, David Steele wrote:
>> Rebased on 151ffcf6.
>
> I like this patch a lot. Even if the backup_label file is removed, we
> still have all the debug information from the backup history file,
> thanks to its LABEL, BACKUP METHOD and BACKUP FROM, so no information
> is lost. It does a 1:1 replacement of the contents parsed from the
> backup_label needed by recovery by fetching them from the control
> file. Sounds like a straight-forward change to me.

That's the plan, at least!

> The patch is failing the recovery test 039_end_of_wal.pl. Could you
> look at the failure?

I'm not seeing this failure, and CI seems happy [1]. Can you give details of the error message?

> /* Build and save the contents of the backup history file */
> - history_file = build_backup_content(state, true);
> + history_file = build_backup_content(state);
>
> build_backup_content() sounds like an incorrect name if it is a
> routine only used to build the contents of backup history files.

Good point, I have renamed this to build_backup_history_content().

> Why is there nothing updated in src/bin/pg_controldata/?

Oops, added.

> + /* Clear fields used to initialize recovery */
> + ControlFile->backupCheckPoint = InvalidXLogRecPtr;
> + ControlFile->backupStartPointTLI = 0;
> + ControlFile->backupRecoveryRequired = false;
> + ControlFile->backupFromStandby = false;
>
> These variables in the control file are cleaned up when the
> backup_label file was read previously, but backup_label is renamed to
> backup_label.old a bit later than that. Your logic looks correct seen
> from here, but shouldn't these variables be set much later, aka just
> *after* UpdateControlFile(). This gap between the initialization of
> the control file and the in-memory reset makes the code quite brittle,
> IMO.

If we set these fields where backup_label was renamed, the logic would not be exactly the same since pg_control won't be updated until the next time through the loop. Since the fields should be updated before UpdateControlFile() I thought it made sense to keep all the updates together.

Overall I think it is simpler, and we don't need to acquire a lock on ControlFile.

> - basebackup_progress_wait_wal_archive(&state);
> - do_pg_backup_stop(backup_state, !opt->nowait);
>
> Why is that moved?

do_pg_backup_stop() generates the updated pg_control so it needs to run before we transmit pg_control.

> - The backup label
> - file includes the label string you gave to <function>pg_backup_start</function>,
> - as well as the time at which <function>pg_backup_start</function> was run, and
> - the name of the starting WAL file. In case of confusion it is therefore
> - possible to look inside a backup file and determine exactly which
> - backup session the dump file came from. The tablespace map file includes
> + The tablespace map file includes
>
> It may be worth mentioning that the backup history file holds this
> information on the primary's pg_wal, as well.

OK, reworded.

> The changes in sendFileWithContent() may be worth a patch of its own.

Thomas included this change in his pg_basebackup changes so I did the same. Maybe wait a bit before we split this out? Seems like a pretty small change...

> --- a/src/include/catalog/pg_control.h
> +++ b/src/include/catalog/pg_control.h
> @@ -146,6 +146,9 @@ typedef struct ControlFileData
> @@ -160,14 +163,25 @@ typedef struct ControlFileData
>  XLogRecPtr minRecoveryPoint;
>  TimeLineID minRecoveryPointTLI;
> + XLogRecPtr backupCheckPoint;
>  XLogRecPtr backupStartPoint;
> + TimeLineID backupStartPointTLI;
>  XLogRecPtr backupEndPoint;
> + bool backupRecoveryRequired;
> + bool backupFromStandby;
>
> This increases the size of the control file from 296B to 312B with an
> 8-byte alignment, as far as I can see. The size of the control file
> has always been a sensitive subject, especially with the hard limit of
> PG_CONTROL_MAX_SAFE_SIZE. Well, the point of this patch is that this
> is the price to pay to prevent users from doing something stupid with
> a removal of a backup_label when they should not. Do others have an
> opinion about this increase in size?
>
> Actually, grouping backupStartPointTLI and minRecoveryPointTLI should
> reduce the size further with some alignment magic, no?

I thought about this, but it seemed to me that existing fields had been positioned to make the grouping logical rather than to optimize alignment, e.g. minRecoveryPointTLI. Ideally that would have been placed near backupEndRequired (or vice versa). But if the general opinion is to rearrange for alignment, I'm OK with that.

> backupRecoveryRequired in the control file is switched to false for
> pg_rewind and true for streamed backups. My gut feeling is telling me
> that this should be OK, as out-of-core tools would need an upgrade if
> they relied on the backup_label file anyway. I can see that this
> change makes us lose some documentation, unfortunately. Shouldn't
> these removed lines be moved to pg_control.h instead for the
> description of backupEndRequired?

Updated description in pg_control.h -- it's a bit vague but I'm not sure it is a good idea to get into the inner workings of pg_rewind here?

> doc/src/sgml/ref/pg_rewind.sgml and
> src/backend/access/transam/xlogrecovery.c still include references to
> the backup_label file.

Fixed.

Attached is a new patch based on 18b585155.

Regards,
-David

[1] https://cirrus-ci.com/build/4939808120766464
Attachment
On Mon, Nov 06, 2023 at 05:39:02PM -0400, David Steele wrote:
> On 11/6/23 02:35, Michael Paquier wrote:
>> The patch is failing the recovery test 039_end_of_wal.pl. Could you
>> look at the failure?
>
> I'm not seeing this failure, and CI seems happy [1]. Can you give details of
> the error message?

I've retested today, and miss the failure. I'll let you know if I see this again.

>> + /* Clear fields used to initialize recovery */
>> + ControlFile->backupCheckPoint = InvalidXLogRecPtr;
>> + ControlFile->backupStartPointTLI = 0;
>> + ControlFile->backupRecoveryRequired = false;
>> + ControlFile->backupFromStandby = false;
>>
>> These variables in the control file are cleaned up when the
>> backup_label file was read previously, but backup_label is renamed to
>> backup_label.old a bit later than that. Your logic looks correct seen
>> from here, but shouldn't these variables be set much later, aka just
>> *after* UpdateControlFile(). This gap between the initialization of
>> the control file and the in-memory reset makes the code quite brittle,
>> IMO.

Yeah, sorry, that's a thinko from me here. I meant to reset these variables just before the UpdateControlFile() done after InitWalRecovery(), much closer to it.

> If we set these fields where backup_label was renamed, the logic would not
> be exactly the same since pg_control won't be updated until the next time
> through the loop. Since the fields should be updated before
> UpdateControlFile() I thought it made sense to keep all the updates
> together.
>
> Overall I think it is simpler, and we don't need to acquire a lock on
> ControlFile.

What you are proposing is the same as what we already do for backupEndRequired or backupStartPoint in the control file when initializing recovery, so objection withdrawn.

> Thomas included this change in his pg_basebackup changes so I did the same.
> Maybe wait a bit before we split this out? Seems like a pretty small
> change...

Seems like a pretty good argument for refactoring that now, and letting any other patches rely on it. Would you like to send a separate patch?

>> Actually, grouping backupStartPointTLI and minRecoveryPointTLI should
>> reduce the size further with some alignment magic, no?
>
> I thought about this, but it seemed to me that existing fields had been
> positioned to make the grouping logical rather than to optimize alignment,
> e.g. minRecoveryPointTLI. Ideally that would have been placed near
> backupEndRequired (or vice versa). But if the general opinion is to
> rearrange for alignment, I'm OK with that.

I've not tested, but it looks like moving backupStartPointTLI after backupEndPoint should shave 8 bytes, if you want to maintain a more coherent group for the LSNs.
--
Michael
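[Editor's note: the padding arithmetic behind Michael's "shave 8 bytes" estimate can be checked with a quick sketch. This uses Python's ctypes as a stand-in for the C ABI on a typical 64-bit platform where 8-byte fields are 8-byte aligned; the struct names and the field subset are invented for the demo, with XLogRecPtr modeled as uint64 and TimeLineID as uint32.]

```python
import ctypes

class AsPatched(ctypes.Structure):
    # backupStartPointTLI (4 bytes) sits between two 8-byte LSNs, so it is
    # padded out to a full 8-byte slot, as is minRecoveryPointTLI above it.
    _fields_ = [
        ("minRecoveryPoint", ctypes.c_uint64),
        ("minRecoveryPointTLI", ctypes.c_uint32),
        ("backupCheckPoint", ctypes.c_uint64),
        ("backupStartPoint", ctypes.c_uint64),
        ("backupStartPointTLI", ctypes.c_uint32),
        ("backupEndPoint", ctypes.c_uint64),
    ]

class Regrouped(ctypes.Structure):
    # Pairing the two 4-byte TimeLineIDs lets them share one 8-byte slot,
    # eliminating one padding hole.
    _fields_ = [
        ("minRecoveryPoint", ctypes.c_uint64),
        ("minRecoveryPointTLI", ctypes.c_uint32),
        ("backupStartPointTLI", ctypes.c_uint32),
        ("backupCheckPoint", ctypes.c_uint64),
        ("backupStartPoint", ctypes.c_uint64),
        ("backupEndPoint", ctypes.c_uint64),
    ]

print(ctypes.sizeof(AsPatched), ctypes.sizeof(Regrouped))
```

On an LP64 ABI the regrouped layout is 8 bytes smaller, matching the estimate; the same effect can be had by placing backupStartPointTLI after backupEndPoint, next to the bool fields.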
Attachment
On Tue, Nov 07, 2023 at 05:20:27PM +0900, Michael Paquier wrote:
> On Mon, Nov 06, 2023 at 05:39:02PM -0400, David Steele wrote:
> I've retested today, and miss the failure. I'll let you know if I see
> this again.

I've done a few more dozen runs, and still nothing. I am wondering what this disturbance was.

>> If we set these fields where backup_label was renamed, the logic would not
>> be exactly the same since pg_control won't be updated until the next time
>> through the loop. Since the fields should be updated before
>> UpdateControlFile() I thought it made sense to keep all the updates
>> together.
>>
>> Overall I think it is simpler, and we don't need to acquire a lock on
>> ControlFile.
>
> What you are proposing is the same as what we already do for
> backupEndRequired or backupStartPoint in the control file when
> initializing recovery, so objection withdrawn.
>
>> Thomas included this change in his pg_basebackup changes so I did the same.
>> Maybe wait a bit before we split this out? Seems like a pretty small
>> change...
>
> Seems like a pretty good argument for refactoring that now, and letting
> any other patches rely on it. Would you like to send a separate
> patch?

The split still looks worth doing seen from here, so I am switching the patch as WoA for now.

>>> Actually, grouping backupStartPointTLI and minRecoveryPointTLI should
>>> reduce the size further with some alignment magic, no?
>>
>> I thought about this, but it seemed to me that existing fields had been
>> positioned to make the grouping logical rather than to optimize alignment,
>> e.g. minRecoveryPointTLI. Ideally that would have been placed near
>> backupEndRequired (or vice versa). But if the general opinion is to
>> rearrange for alignment, I'm OK with that.
>
> I've not tested, but it looks like moving backupStartPointTLI after
> backupEndPoint should shave 8 bytes, if you want to maintain a more
> coherent group for the LSNs.

+ * backupFromStandby indicates that the backup was taken on a standby. It is
+ * require to initialize recovery and set to false afterwards.

s/require/required/.

The term "backup recovery" is one we've never used in the tree until now, as far as I know. Perhaps this recovery method should just be referred to as "recovery from backup"?

By the way, there is another thing that this patch has forgotten: the SQL functions that display data from the control file. Shouldn't pg_control_recovery() be extended with the new fields? These fields may be less critical than the other ones related to recovery, but I suspect that showing them can become handy at least for debugging and monitoring purposes.

Something in this area is that backupRecoveryRequired is the switch controlling whether the other fields are used by the recovery initialization. Could it be actually useful to leave the other fields as they are and only reset backupRecoveryRequired before the first control file update? This would leave a trace of the backup history directly in the control file.

What about pg_resetwal and RewriteControlFile()? Shouldn't these recovery fields be reset as well?

git diff --check is complaining a bit.
--
Michael
Attachment
On 11/10/23 00:37, Michael Paquier wrote:
> On Tue, Nov 07, 2023 at 05:20:27PM +0900, Michael Paquier wrote:
>> On Mon, Nov 06, 2023 at 05:39:02PM -0400, David Steele wrote:
>> I've retested today, and miss the failure. I'll let you know if I see
>> this again.
>
> I've done a few more dozen runs, and still nothing. I am wondering
> what this disturbance was.

OK, hopefully it was just a blip.

>>> If we set these fields where backup_label was renamed, the logic would not
>>> be exactly the same since pg_control won't be updated until the next time
>>> through the loop. Since the fields should be updated before
>>> UpdateControlFile() I thought it made sense to keep all the updates
>>> together.
>>>
>>> Overall I think it is simpler, and we don't need to acquire a lock on
>>> ControlFile.
>>
>> What you are proposing is the same as what we already do for
>> backupEndRequired or backupStartPoint in the control file when
>> initializing recovery, so objection withdrawn.
>>
>>> Thomas included this change in his pg_basebackup changes so I did the same.
>>> Maybe wait a bit before we split this out? Seems like a pretty small
>>> change...
>>
>> Seems like a pretty good argument for refactoring that now, and letting
>> any other patches rely on it. Would you like to send a separate
>> patch?
>
> The split still looks worth doing seen from here, so I am switching
> the patch as WoA for now.

This has been split out.

>>>> Actually, grouping backupStartPointTLI and minRecoveryPointTLI should
>>>> reduce the size further with some alignment magic, no?
>>>
>>> I thought about this, but it seemed to me that existing fields had been
>>> positioned to make the grouping logical rather than to optimize alignment,
>>> e.g. minRecoveryPointTLI. Ideally that would have been placed near
>>> backupEndRequired (or vice versa). But if the general opinion is to
>>> rearrange for alignment, I'm OK with that.
>>
>> I've not tested, but it looks like moving backupStartPointTLI after
>> backupEndPoint should shave 8 bytes, if you want to maintain a more
>> coherent group for the LSNs.

OK, I have moved backupStartPointTLI.

> + * backupFromStandby indicates that the backup was taken on a standby. It is
> + * require to initialize recovery and set to false afterwards.
> s/require/required/.

Fixed.

> The term "backup recovery" is one we've never used in the tree until
> now, as far as I know. Perhaps this recovery method should just be
> referred to as "recovery from backup"?

Well, "backup recovery" is less awkward, I think. For instance "backup recovery field" vs "recovery from backup field".

> By the way, there is another thing that this patch has forgotten: the
> SQL functions that display data from the control file. Shouldn't
> pg_control_recovery() be extended with the new fields? These fields
> may be less critical than the other ones related to recovery, but I
> suspect that showing them can become handy at least for debugging and
> monitoring purposes.

I guess that depends on whether we reset them or not (discussion below). Right now they would not be visible since by the time the user could log on they would be reset.

> Something in this area is that backupRecoveryRequired is the switch
> controlling whether the other fields are used by the recovery
> initialization. Could it be actually useful to leave the other fields
> as they are and only reset backupRecoveryRequired before the first
> control file update? This would leave a trace of the backup history
> directly in the control file.

Since the other recovery fields are cleared in ReachedEndOfBackup() this would be a change from what we do now.

None of these fields are ever visible (with the exception of minRecoveryPoint/TLI) since they are reset when the database becomes consistent and before logons are allowed. Viewing them with pg_controldata makes sense, but I'm not sure pg_control_recovery() does.

In fact, are backup_start_lsn, backup_end_lsn, and end_of_backup_record_required ever non-zero when logged onto Postgres? Maybe I'm missing something?

> What about pg_resetwal and RewriteControlFile()? Shouldn't these
> recovery fields be reset as well?

Done.

> git diff --check is complaining a bit.

Fixed.

New patches attached based on eb81e8e790.

Regards,
-David
On Fri, Nov 10, 2023 at 02:55:19PM -0400, David Steele wrote: > On 11/10/23 00:37, Michael Paquier wrote: >> I've done a few more dozen runs, and still nothing. I am wondering >> what this disturbance was. > > OK, hopefully it was just a blip. Still nothing on this side. So that seems really like a random blip in the matrix. > This has been split out. Thanks, applied 0001. >> The term "backup recovery", that we've never used in the tree until >> now as far as I know. Perhaps this recovery method should just be >> referred as "recovery from backup"? > > Well, "backup recovery" is less awkward, I think. For instance "backup > recovery field" vs "recovery from backup field". Not sure. I've never used this term when referring to recovery from a backup. Perhaps I'm just not used to it, still that sounds a bit confusing here. >> Something in this area is that backupRecoveryRequired is the switch >> controlling if the fields set by the recovery initialization. Could >> it be actually useful to leave the other fields as they are and only >> reset backupRecoveryRequired before the first control file update? >> This would leave a trace of the backup history directly in the control >> file. > > Since the other recovery fields are cleared in ReachedEndOfBackup() this > would be a change from what we do now. > > None of these fields are ever visible (with the exception of > minRecoveryPoint/TLI) since they are reset when the database becomes > consistent and before logons are allowed. Viewing them with pg_controldata > makes sense, but I'm not sure pg_control_recovery() does. > > In fact, are backup_start_lsn, backup_end_lsn, and > end_of_backup_record_required ever non-zero when logged onto Postgres? Maybe > I'm missing something? Yeah, but custom backup/restore tools may want to manipulate the contents of the control file for their own work, so at least for the sake of visibility it sounds important to me to show all the information at hand, and that there is no need to.
- The backup label - file includes the label string you gave to <function>pg_backup_start</function>, + The backup history file (which is archived like WAL) includes the label + string you gave to <function>pg_backup_start</function>, as well as the time at which <function>pg_backup_start</function> was run, and the name of the starting WAL file. In case of confusion it is therefore - possible to look inside a backup file and determine exactly which + possible to look inside a backup history file and determine exactly which As a side note, it is a bit disappointing that we lose the backup label from the backup itself, even if the patch changes correctly the documentation to reflect the new behavior. It is in the backup history file on the node from where the base backup has been taken or in the archives, hopefully. However there is nothing that remains in the base backup itself, and backups can be self-contained (easy with pg_basebackup --wal-method=stream). I think that we should retain a minimum amount of information as a replacement for the backup_label, at least. With this state, the current patch slightly reduces the debuggability of deployments. That could be annoying for some users. > New patches attached based on eb81e8e790. Diving into the code for references about the backup label file, I have spotted this log in pg_rewind that is now incorrect: if (showprogress) pg_log_info("creating backup label and updating control file"); + printf(_("Backup start location's timeline: %u\n"), + ControlFile->backupStartPointTLI); printf(_("Backup end location: %X/%X\n"), LSN_FORMAT_ARGS(ControlFile->backupEndPoint)); Perhaps these two should be reversed to match with the header file. + /* + * After pg_backup_stop() returns this field will contain a copy of + * pg_control that should be stored with the backup. Fields have been + * updated for recovery and the CRC has been recalculated. 
The buffer + * is padded to PG_CONTROL_MAX_SAFE_SIZE so that pg_control is always + * a consistent size but smaller (and hopefully easier to handle) than + * PG_CONTROL_FILE_SIZE. Bytes after sizeof(ControlFileData) are zeroed. + */ + uint8_t controlFile[PG_CONTROL_MAX_SAFE_SIZE]; I don't mind the addition of a control file with the max safe size, because it will never be higher than that. However: + /* End the backup before sending pg_control */ + basebackup_progress_wait_wal_archive(&state); + do_pg_backup_stop(backup_state, !opt->nowait); + + /* Send copy of pg_control containing recovery info */ + sendFileWithContent(sink, XLOG_CONTROL_FILE, + (char *)backup_state->controlFile, + PG_CONTROL_MAX_SAFE_SIZE, &manifest); It seems to me that the base backup protocol should always send an 8k file for the control file so that we maintain consistency with the on-disk format. Currently, a base backup taken with this patch results in a control file of size 512B. + /* Build the contents of pg_control */ + pg_control_bytea = (bytea *) palloc(PG_CONTROL_MAX_SAFE_SIZE + VARHDRSZ); + SET_VARSIZE(pg_control_bytea, PG_CONTROL_MAX_SAFE_SIZE + VARHDRSZ); + memcpy(VARDATA(pg_control_bytea), backup_state->controlFile, PG_CONTROL_MAX_SAFE_SIZE); Similar comment for the control file returned by pg_backup_stop(), which could just be made an 8k field? And perhaps emphasize that this file should be an 8kB file in the paragraph mentioning the data returned by pg_backup_stop()?
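To make the size handling concrete, here is a minimal standalone sketch of the padding being asked for here. The constants are local stand-ins for PG_CONTROL_MAX_SAFE_SIZE and PG_CONTROL_FILE_SIZE, and the function name is hypothetical, not from the patch:

```c
#include <stdint.h>
#include <string.h>

/* Local stand-ins for PostgreSQL's pg_control size constants. */
#define CONTROL_MAX_SAFE_SIZE 512   /* mirrors PG_CONTROL_MAX_SAFE_SIZE */
#define CONTROL_FILE_SIZE     8192  /* mirrors PG_CONTROL_FILE_SIZE */

/*
 * Copy a (possibly shorter) control-file image into a buffer of the full
 * on-disk size, zeroing the trailing bytes, so that what the backup
 * protocol streams keeps the same 8kB size as the file in PGDATA.
 */
static void
pad_control_file(const uint8_t *src, size_t src_len,
                 uint8_t dst[CONTROL_FILE_SIZE])
{
    memset(dst, 0, CONTROL_FILE_SIZE);  /* bytes past the data are zeroed */
    memcpy(dst, src, src_len);          /* real control data at the front */
}
```

Streaming the padded buffer would keep the file inside the backup byte-compatible in size with the 8kB pg_control that initdb writes.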
- Create a <filename>backup_label</filename> file to begin WAL replay at + Update <filename>pg_control</filename> file to begin WAL replay at the checkpoint created at failover and configure the <filename>pg_control</filename> file with a minimum consistency LSN pg_control is mentioned twice, so perhaps this could be worded better? PG_CONTROL_VERSION is important to not forget about.. Perhaps this should be noted somewhere, or just changed in the patch itself. Contrary to catalog changes, we do few of these in the control file so there is close to zero risk of conflicts with other patches in the CF app. -- Michael
(I am not exactly sure how, but we've lost pgsql-hackers on the way when you sent v5. Now added back in CC with the two latest patches you've proposed attached.) Here is a short summary of what has been missed by the lists: - I've commented that the patch should not create 512B control files, show them in fields returned by the SQL functions, or stream them in base backups; it should just stick to 8kB. If this is worth changing, it should be applied consistently across the board including initdb, discussed on its own thread. - The backup-related fields in the control file are reset at the end of recovery. I've suggested to not do that to keep a trace of what was happening during recovery. The latest version of the patch resets the fields. - With the backup_label file gone, we lose some information in the backups themselves, which is not good. Instead, you have suggested an approach where this data is added to the backup manifest, meaning that no information would be lost, particularly useful for self-contained backups. The fields planned to be added to the backup manifest are: -- The start and end time of the backup, the end timestamp being useful to know when stop time can be used for PITR. -- The backup label. I've agreed that it may be the best thing to do at this end to not lose any data related to the removal of the backup_label file. On Sun, Nov 19, 2023 at 02:14:32PM -0400, David Steele wrote: > On 11/15/23 20:03, Michael Paquier wrote: >> As the label is only an informational field, the parsing added to >> pg_verifybackup is not really needed because it is used nowhere in the >> validation process, so keeping the logic simpler would be the way to >> go IMO. This is contrary to the WAL range for example, where start >> and end LSNs are used for validation with a pg_waldump command. >> Robert, any comments about the addition of the label in the manifest?
> > I'm sure Robert will comment on this when he gets the time, but for now I > have backed off on passing the new info to pg_verifybackup and added > start/stop time. FWIW, I'm OK with the bits for the backup manifest as presented. So if there are no remarks and/or no objections, I'd like to apply it, but let's give some room to others to comment on that as there's been a gap in the emails exchanged on pgsql-hackers. I hope that the summary I've posted above covers everything. So let's see about doing something around the middle of next week. With Thanksgiving in the US, a lot of folks will not have the time to monitor what's happening on this thread. + The end time for the backup. This is when the backup was stopped in + <productname>PostgreSQL</productname> and represents the earliest time + that can be used for time-based Point-In-Time Recovery. This one is actually a very good point. We'd have lost this capability with the backup_label file gone, since the end timestamp is not stored in the control file. > New patches attached based on b218fbb7. I've noticed on the other thread the remark about being less aggressive with the fields related to recovery in the control file, so I assume that this patch should leave the fields be after the end of recovery from the start and only rely on backupRecoveryRequired to decide if the recovery should use the fields or not: https://www.postgresql.org/message-id/241ccde1-1928-4ba2-a0bb-5350f7b191a8@pgmasters.net + ControlFile->backupCheckPoint = InvalidXLogRecPtr; ControlFile->backupStartPoint = InvalidXLogRecPtr; + ControlFile->backupStartPointTLI = 0; ControlFile->backupEndPoint = InvalidXLogRecPtr; + ControlFile->backupFromStandby = false; ControlFile->backupEndRequired = false; Still, I get the temptation of being consistent with the current style on HEAD to reset everything, as well.. -- Michael
On 11/19/23 21:15, Michael Paquier wrote: > (I am not exactly sure how, but we've lost pgsql-hackers on the way > when you sent v5. Now added back in CC with the two latest patches > you've proposed attached.) Ugh, I must have hit reply instead of reply all. It's a rookie error and you hate to see it. > Here is a short summary of what has been missed by the lists: > - I've commented that the patch should not create, not show up in > fields returned the SQL functions or stream control files with a size > of 512B, just stick to 8kB. If this is worth changing this should be > applied consistently across the board including initdb, discussed on > its own thread. > - The backup-related fields in the control file are reset at the end > of recovery. I've suggested to not do that to keep a trace of what > was happening during recovery. The latest version of the patch resets > the fields. > - With the backup_label file gone, we lose some information in the > backups themselves, which is not good. Instead, you have suggested an > approach where this data is added to the backup manifest, meaning that > no information would be lost, particularly useful for self-contained > backups. The fields planned to be added to the backup manifest are: > -- The start and end time of the backup, the end timestamp being > useful to know when stop time can be used for PITR. > -- The backup label. > I've agreed that it may be the best thing to do at this end to not > lose any data related to the removal of the backup_label file. This looks right to me. > On Sun, Nov 19, 2023 at 02:14:32PM -0400, David Steele wrote: >> On 11/15/23 20:03, Michael Paquier wrote: >>> As the label is only an informational field, the parsing added to >>> pg_verifybackup is not really needed because it is used nowhere in the >>> validation process, so keeping the logic simpler would be the way to >>> go IMO. 
This is contrary to the WAL range for example, where start >>> and end LSNs are used for validation with a pg_waldump command. >>> Robert, any comments about the addition of the label in the manifest? >> >> I'm sure Robert will comment on this when he gets the time, but for now I >> have backed off on passing the new info to pg_verifybackup and added >> start/stop time. > > FWIW, I'm OK with the bits for the backup manifest as presented. So > if there are no remarks and/or no objections, I'd like to apply it but > let give some room to others to comment on that as there's been a gap > in the emails exchanged on pgsql-hackers. I hope that the summary > I've posted above covers everything. So let's see about doing > something around the middle of next week. With Thanksgiving in the > US, a lot of folks will not have the time to monitor what's happening > on this thread. Timing sounds good to me. > > + The end time for the backup. This is when the backup was stopped in > + <productname>PostgreSQL</productname> and represents the earliest time > + that can be used for time-based Point-In-Time Recovery. > > This one is actually a very good point. We'd lost this capacity with > the backup_label file gone without the end timestamps in the control > file. Yeah, the end time is very important for recovery scenarios. We definitely need that recorded somewhere. 
> I've noticed on the other thread the remark about being less > aggressive with the fields related to recovery in the control file, so > I assume that this patch should leave the fields be after the end of > recovery from the start and only rely on backupRecoveryRequired to > decide if the recovery should use the fields or not: > https://www.postgresql.org/message-id/241ccde1-1928-4ba2-a0bb-5350f7b191a8@=pgmasters.net > > + ControlFile->backupCheckPoint = InvalidXLogRecPtr; > ControlFile->backupStartPoint = InvalidXLogRecPtr; > + ControlFile->backupStartPointTLI = 0; > ControlFile->backupEndPoint = InvalidXLogRecPtr; > + ControlFile->backupFromStandby = false; > ControlFile->backupEndRequired = false; > > Still, I get the temptation of being consistent with the current style > on HEAD to reset everything, as well.. I'd rather reset everything for now (as we do now) and think about keeping these values as a separate patch. It may be that we don't want to keep all of them, or we need a separate flag to say recovery was completed. We are accumulating a lot of booleans here, maybe we need a state var (recoveryRequired, recoveryInProgress, recoveryComplete) and then define which other vars are valid in each state. Regards, -David
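As a strawman, the state variable suggested above could look something like this. The names are hypothetical and not from any posted patch; it only sketches how one enum could replace several booleans and define which other fields are valid in each state:

```c
#include <stdbool.h>

/*
 * Hypothetical replacement for the accumulating booleans
 * (backupRecoveryRequired and friends) in pg_control.
 */
typedef enum BackupRecoveryState
{
    BACKUP_RECOVERY_NONE,           /* not restoring from a backup */
    BACKUP_RECOVERY_REQUIRED,       /* recovery fields set, redo not started */
    BACKUP_RECOVERY_IN_PROGRESS,    /* redo running, end of backup not reached */
    BACKUP_RECOVERY_COMPLETE        /* consistency reached */
} BackupRecoveryState;

/* Define in one place which states make the other backup fields valid. */
static bool
backup_fields_valid(BackupRecoveryState state)
{
    return state == BACKUP_RECOVERY_REQUIRED ||
           state == BACKUP_RECOVERY_IN_PROGRESS;
}
```

A COMPLETE state would also give a natural place to decide whether to keep the fields for forensics instead of zeroing them.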
On Sun, Nov 19, 2023 at 8:16 PM Michael Paquier <michael@paquier.xyz> wrote: > (I am not exactly sure how, but we've lost pgsql-hackers on the way > when you sent v5. Now added back in CC with the two latest patches > you've proposed attached.) > > Here is a short summary of what has been missed by the lists: > - I've commented that the patch should not create, not show up in > fields returned the SQL functions or stream control files with a size > of 512B, just stick to 8kB. If this is worth changing this should be > applied consistently across the board including initdb, discussed on > its own thread. > - The backup-related fields in the control file are reset at the end > of recovery. I've suggested to not do that to keep a trace of what > was happening during recovery. The latest version of the patch resets > the fields. > - With the backup_label file gone, we lose some information in the > backups themselves, which is not good. Instead, you have suggested an > approach where this data is added to the backup manifest, meaning that > no information would be lost, particularly useful for self-contained > backups. The fields planned to be added to the backup manifest are: > -- The start and end time of the backup, the end timestamp being > useful to know when stop time can be used for PITR. > -- The backup label. > I've agreed that it may be the best thing to do at this end to not > lose any data related to the removal of the backup_label file. I think we need more votes to make a change this big. I have a concern, which I think I've expressed before, that we keep whacking around the backup APIs, and that has a cost which is potentially larger than the benefits. The last time we changed the API, we changed pg_stop_backup to pg_backup_stop, but this doesn't do that, and I wonder if that's OK. Even if it is, do we really want to change this API around again after such a short time? 
That said, I don't have an intrinsic problem with moving this information from the backup_label to the backup_manifest file since it is purely informational. I do think there should perhaps be some additions to the test cases, though. I am concerned about the interaction of this proposal with incremental backup. When you take an incremental backup, you get something that looks a lot like a usable data directory but isn't. To prevent that from causing avoidable disasters, the present version of the patch adds INCREMENTAL FROM LSN and INCREMENTAL FROM TLI fields to the backup_label. pg_combinebackup knows to look for those fields, and the server knows that if they are present it should refuse to start. With this change, though, I suppose those fields would end up in pg_control. But that does not feel entirely great, because we have a goal of keeping the amount of real data in pg_control below 512 bytes, the traditional sector size, and this adds another 12 bytes of stuff to that file that currently doesn't need to be there. I feel like that's kind of a problem. But my main point here is ... if we have a few more senior hackers weigh in and vote in favor of this change, well then that's one thing. But IMHO a discussion that's mostly between 2 people is not nearly a strong enough consensus to justify this amount of disruption. -- Robert Haas EDB: http://www.enterprisedb.com
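For the record, the sector-size budget mentioned above can be enforced mechanically at compile time; PostgreSQL already carries a similar check against PG_CONTROL_MAX_SAFE_SIZE. This toy sketch uses a made-up struct (the real ControlFileData lives in src/include/catalog/pg_control.h and has many more fields) just to show how the hypothetical 12 bytes of incremental-backup state would count against the budget:

```c
#include <stdint.h>

/* Illustrative layout only, not the real ControlFileData. */
typedef struct DemoControlFileData
{
    uint64_t checkPoint;            /* existing-style fields ... */
    uint64_t backupStartPoint;
    uint64_t backupEndPoint;
    uint32_t backupStartPointTLI;
    uint32_t _pad;                  /* keep 8-byte alignment explicit */
    uint64_t incrementalFromLSN;    /* hypothetical addition: +8 bytes */
    uint32_t incrementalFromTLI;    /* hypothetical addition: +4 bytes */
} DemoControlFileData;

/* The same kind of compile-time guard applied to the real struct. */
_Static_assert(sizeof(DemoControlFileData) <= 512,
               "control data must fit in one 512-byte sector");
```

With a guard like this, any patch that pushes the meaningful data past the traditional sector size fails to compile rather than silently risking torn writes.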
On 11/20/23 12:11, Robert Haas wrote: > On Sun, Nov 19, 2023 at 8:16 PM Michael Paquier <michael@paquier.xyz> wrote: >> (I am not exactly sure how, but we've lost pgsql-hackers on the way >> when you sent v5. Now added back in CC with the two latest patches >> you've proposed attached.) >> >> Here is a short summary of what has been missed by the lists: >> - I've commented that the patch should not create, not show up in >> fields returned the SQL functions or stream control files with a size >> of 512B, just stick to 8kB. If this is worth changing this should be >> applied consistently across the board including initdb, discussed on >> its own thread. >> - The backup-related fields in the control file are reset at the end >> of recovery. I've suggested to not do that to keep a trace of what >> was happening during recovery. The latest version of the patch resets >> the fields. >> - With the backup_label file gone, we lose some information in the >> backups themselves, which is not good. Instead, you have suggested an >> approach where this data is added to the backup manifest, meaning that >> no information would be lost, particularly useful for self-contained >> backups. The fields planned to be added to the backup manifest are: >> -- The start and end time of the backup, the end timestamp being >> useful to know when stop time can be used for PITR. >> -- The backup label. >> I've agreed that it may be the best thing to do at this end to not >> lose any data related to the removal of the backup_label file. > > I think we need more votes to make a change this big. I have a > concern, which I think I've expressed before, that we keep whacking > around the backup APIs, and that has a cost which is potentially > larger than the benefits. From my perspective it's not that big a change for backup software but it does bring a lot of benefits, including fixing an outstanding bug in Postgres, i.e. reading pg_control without getting a torn copy. 
> The last time we changed the API, we changed > pg_stop_backup to pg_backup_stop, but this doesn't do that, and I > wonder if that's OK. Even if it is, do we really want to change this > API around again after such a short time? This is a good point. We could just rename again, but not sure what names to go for this time. OTOH if the backup software is selecting fields then they will get an error because the names have changed. If the software is grabbing fields by position then they'll get a valid-looking result (even if querying by position is a terrible idea). Another thing we could do is explicitly error if we see backup_label in PGDATA during recovery. That's just a few lines of code so would not be a big deal to maintain. This error would only be visible on restore, so it presumes that backup software is being tested. Maybe just a rename to something like pg_backup_begin/end would be the way to go. > That said, I don't have an intrinsic problem with moving this > information from the backup_label to the backup_manifest file since it > is purely informational. I do think there should perhaps be some > additions to the test cases, though. A little hard to add to the tests, I think, since they are purely informational, i.e. not pushed up by the parser. Maybe we could just grep for the fields? > I am concerned about the interaction of this proposal with incremental > backup. When you take an incremental backup, you get something that > looks a lot like a usable data directory but isn't. To prevent that > from causing avoidable disasters, the present version of the patch > adds INCREMENTAL FROM LSN and INCREMENTAL FROM TLI fields to the > backup_label. pg_combinebackup knows to look for those fields, and the > server knows that if they are present it should refuse to start. With > this change, though, I suppose those fields would end up in > pg_control. 
But that does not feel entirely great, because we have a > goal of keeping the amount of real data in pg_control below 512 bytes, > the traditional sector size, and this adds another 12 bytes of stuff > to that file that currently doesn't need to be there. I feel like > that's kind of a problem. I think these fields would be handled the same as the rest of the fields in backup_label: returned from pg_backup_stop() and also stored in backup_manifest. Third-party software can do as they like with them and pg_combinebackup can just read from backup_manifest. As for the pg_control file -- it might be best to give it a different name for backups that are not essentially copies of PGDATA. On the other hand, pgBackRest has included pg_control in incremental backups since day one and we've never had a user mistakenly do a manual restore of one and cause a problem (though manual restores are not the norm). Still, probably can't hurt to be a bit careful. > But my main point here is ... if we have a few more senior hackers > weigh in and vote in favor of this change, well then that's one thing. > But IMHO a discussion that's mostly between 2 people is not nearly a > strong enough consensus to justify this amount of disruption. We absolutely need more people to look at this and sign off. I'm glad they have not so far because it has allowed time to whack the patch around and get it into better shape. Regards, -David
On Mon, Nov 20, 2023 at 12:54 PM David Steele <david@pgmasters.net> wrote: > Another thing we could do is explicitly error if we see backup_label in > PGDATA during recovery. That's just a few lines of code so would not be > a big deal to maintain. This error would only be visible on restore, so > it presumes that backup software is being tested. I think that if we do decide to adopt this proposal, that would be a smart precaution. > A little hard to add to the tests, I think, since they are purely > informational, i.e. not pushed up by the parser. Maybe we could just > grep for the fields? Hmm. Or should they be pushed up by the parser? > I think these fields would be handled the same as the rest of the fields > in backup_label: returned from pg_backup_stop() and also stored in > backup_manifest. Third-party software can do as they like with them and > pg_combinebackup can just read from backup_manifest. I think that would be a bad plan, because this is critical information, and a backup manifest is not a thing that you're required to have. It's not a natural fit at all. We don't want to create a situation where if you nuke the backup_manifest then the server forgets that what it has is an incremental backup rather than a usable data directory. > We absolutely need more people to look at this and sign off. I'm glad > they have not so far because it has allowed time to whack the patch > around and get it into better shape. Cool. -- Robert Haas EDB: http://www.enterprisedb.com
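The "few lines of code" precaution discussed above could be sketched roughly as follows. This is a standalone illustration: in the backend it would be an ereport(FATAL) during recovery startup, and the function name and message wording here are made up:

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Refuse to start recovery if a stray backup_label is present in the
 * data directory, since recovery information would now live only in
 * pg_control.  Returns false if startup should be aborted.
 */
static bool
reject_stray_backup_label(const char *datadir)
{
    char        path[4096];
    struct stat st;

    snprintf(path, sizeof(path), "%s/backup_label", datadir);
    if (stat(path, &st) == 0)
    {
        fprintf(stderr,
                "FATAL: backup_label is no longer used to start recovery\n"
                "HINT: restore the pg_control file returned by pg_backup_stop()\n");
        return false;           /* caller aborts startup */
    }
    return true;                /* no stray file, proceed */
}
```

The check only fires on restore, so as noted it presumes the backup software is actually tested against the new server version.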
On 11/20/23 14:44, Robert Haas wrote: > On Mon, Nov 20, 2023 at 12:54 PM David Steele <david@pgmasters.net> wrote: >> Another thing we could do is explicitly error if we see backup_label in >> PGDATA during recovery. That's just a few lines of code so would not be >> a big deal to maintain. This error would only be visible on restore, so >> it presumes that backup software is being tested. > > I think that if we do decide to adopt this proposal, that would be a > smart precaution. I'd be OK with it -- what do you think, Michael? Would this be enough that we would not need to rename the functions, or should we just go with the rename? >> A little hard to add to the tests, I think, since they are purely >> informational, i.e. not pushed up by the parser. Maybe we could just >> grep for the fields? > > Hmm. Or should they be pushed up by the parser? We could do that. I started on that road, but it's a lot of code for fields that aren't used. I think it would be better if the parser also loaded a data structure that represented the manifest. Seems to me there's a lot of duplicated code between pg_verifybackup and pg_combinebackup the way it is now. >> I think these fields would be handled the same as the rest of the fields >> in backup_label: returned from pg_backup_stop() and also stored in >> backup_manifest. Third-party software can do as they like with them and >> pg_combinebackup can just read from backup_manifest. > > I think that would be a bad plan, because this is critical > information, and a backup manifest is not a thing that you're required > to have. It's not a natural fit at all. We don't want to create a > situation where if you nuke the backup_manifest then the server > forgets that what it has is an incremental backup rather than a usable > data directory. I can't see why a backup would continue to be valid without a manifest -- that's not very standard for backup software. 
If you have the critical info in backup_label, you can't afford to lose that, so why should backup_manifest be any different? Regards, -David
On Mon, Nov 20, 2023 at 2:41 PM David Steele <david@pgmasters.net> wrote: > I can't see why a backup would continue to be valid without a manifest > -- that's not very standard for backup software. If you have the > critical info in backup_label, you can't afford to lose that, so why > should backup_manifest be any different? I mean, you can run pg_basebackup --no-manifest. -- Robert Haas EDB: http://www.enterprisedb.com
On 11/20/23 15:47, Robert Haas wrote: > On Mon, Nov 20, 2023 at 2:41 PM David Steele <david@pgmasters.net> wrote: >> I can't see why a backup would continue to be valid without a manifest >> -- that's not very standard for backup software. If you have the >> critical info in backup_label, you can't afford to lose that, so why >> should backup_manifest be any different? > > I mean, you can run pg_basebackup --no-manifest. Maybe this would be a good thing to disable for page incremental. With all the work being done by pg_combinebackup, it seems like it would be a good idea to be able to verify the final result? I understand this is an option -- but does it need to be? What is the benefit of excluding the manifest? Regards, -David
Hi, On 2023-11-20 11:11:13 -0500, Robert Haas wrote: > I think we need more votes to make a change this big. I have a > concern, which I think I've expressed before, that we keep whacking > around the backup APIs, and that has a cost which is potentially > larger than the benefits. +1. The amount of whacking around in this area has been substantial, and it's hard for operators to keep up. And realistically, with data sizes today, the pressure to do basebackups with disk snapshots etc is not going to shrink. Leaving that concern aside, I am still on the fence about this proposal. I think it does decrease the chance of getting things wrong in the streaming-basebackup case. But for external backups, it seems almost universally worse (with the exception of the torn pg_control issue, that we also can address otherwise): It doesn't reduce the risk of getting things wrong, you can still omit placing a file into the data directory and get silent corruption as a consequence. In addition, it's harder to see when looking at a base backup whether the process was right or not, because now the good and bad state look the same if you just look on the filesystem level! Then there's the issue of making ad-hoc debugging harder by not having a human readable file with information anymore, including when looking at the history, via backup_label.old. Given that, I wonder if what we should do is to just add a new field to pg_control that says "error out if backup_label does not exist", that we set when creating a streaming base backup Greetings, Andres Freund
Hi, On 2023-11-20 15:56:19 -0400, David Steele wrote: > I understand this is an option -- but does it need to be? What is the > benefit of excluding the manifest? It's not free to create the manifest, particularly if checksums are enabled. Also, for external backups, there's no manifest... - Andres
On Mon, Nov 20, 2023 at 1:37 PM Andres Freund <andres@anarazel.de> wrote:
> Given that, I wonder if what we should do is to just add a new field to
> pg_control that says "error out if backup_label does not exist", that we set
> when creating a streaming base backup
I thought this was DOA since we don't want to ever leave the cluster in a state where a crash requires intervention to restart. But I agree that it is not possible to fool-proof against a naive backup that copies over the pg_control file as-is if breaking the crashed cluster option is not in play.
I agree that this works if the pg_control generated by stop backup produces the line and we retain the label file as a separate and now mandatory component to using the backup.
Or is the idea to make v17 error if it sees a backup label unless pg_control has the feature flag field? Which doesn't exist normally, does in the basebackup version, and is removed once the backup is restored?
David J.
On Mon, Nov 20, 2023 at 11:11:13AM -0500, Robert Haas wrote: > I think we need more votes to make a change this big. I have a > concern, which I think I've expressed before, that we keep whacking > around the backup APIs, and that has a cost which is potentially > larger than the benefits. The last time we changed the API, we changed > pg_stop_backup to pg_backup_stop, but this doesn't do that, and I > wonder if that's OK. Even if it is, do we really want to change this > API around again after such a short time? Agreed. > That said, I don't have an intrinsic problem with moving this > information from the backup_label to the backup_manifest file since it > is purely informational. I do think there should perhaps be some > additions to the test cases, though. Yep, cool. Even if we decide to not go with what's discussed in this patch, I think that's useful for some users at the end to get more redundancy, as well. And that's in a format easier to parse. > I am concerned about the interaction of this proposal with incremental > backup. When you take an incremental backup, you get something that > looks a lot like a usable data directory but isn't. To prevent that > from causing avoidable disasters, the present version of the patch > adds INCREMENTAL FROM LSN and INCREMENTAL FROM TLI fields to the > backup_label. pg_combinebackup knows to look for those fields, and the > server knows that if they are present it should refuse to start. With > this change, though, I suppose those fields would end up in > pg_control. But that does not feel entirely great, because we have a > goal of keeping the amount of real data in pg_control below 512 bytes, > the traditional sector size, and this adds another 12 bytes of stuff > to that file that currently doesn't need to be there. I feel like > that's kind of a problem. I don't recall one time where the addition of new fields to the control file was easy to discuss because of its 512B hard limit. 
Anyway, putting the addition aside for a second, and I've not looked at the incremental backup patch, does the removal of the backup_label make the combine logic more complicated, or is it just moving a chunk of code to do a control file lookup instead of parsing a backup_label file? Making the information less readable is definitely an issue for me. A different alternative that I've mentioned upthread is to keep an equivalent of the backup_label and rename it to something like backup.debug or similar, with a name good enough to tell people that we don't care about it being removed. -- Michael
Hi, On 2023-11-20 14:18:15 -0700, David G. Johnston wrote: > On Mon, Nov 20, 2023 at 1:37 PM Andres Freund <andres@anarazel.de> wrote: > > > > > Given that, I wonder if what we should do is to just add a new field to > > pg_control that says "error out if backup_label does not exist", that we > > set > > when creating a streaming base backup > > > > > I thought this was DOA since we don't want to ever leave the cluster in a > state where a crash requires intervention to restart. I was trying to suggest that we'd set the field in-memory, when streaming out a pg_basebackup style backup (by just replacing pg_control with an otherwise identical file that has the flag set). So it'd not have any effect on the primary. Greetings, Andres Freund
On Mon, Nov 20, 2023 at 12:37:46PM -0800, Andres Freund wrote: > Given that, I wonder if what we should do is to just add a new field to > pg_control that says "error out if backup_label does not exist", that we set > when creating a streaming base backup That would mean that one still needs to take an extra step to update a control file with this byte set, which is something you had a concern with in terms of compatibility when it comes to external backup solutions because more steps are necessary to take a backup, no? I don't quite see why it is different than what's proposed on this thread, except that you don't need to write one file to the data folder to store the backup label fields, but two, meaning that there's a risk for more mistakes because a clean backup process would require more steps. With the current position of the fields in ControlFileData, there are three free bytes after backupEndRequired, so it is possible to add that for free. Now, would you actually need an extra field knowing that backupStartPoint is around? -- Michael
Hi, On 2023-11-21 08:52:08 +0900, Michael Paquier wrote: > On Mon, Nov 20, 2023 at 12:37:46PM -0800, Andres Freund wrote: > > Given that, I wonder if what we should do is to just add a new field to > > pg_control that says "error out if backup_label does not exist", that we set > > when creating a streaming base backup > > That would mean that one still needs to take an extra step to update a > control file with this byte set, which is something you had a concern > with in terms of compatibility when it comes to external backup > solutions because more steps are necessary to take a backup, no? I was thinking we'd just set it in the pg_basebackup style path, and we'd error out if it's set and backup_label is present. But we'd still use backup_label without the pg_control flag set. So it'd just provide a cross-check that backup_label was not removed for pg_basebackup style backup, but wouldn't do anything for external backups. But imo the proposal to just use pg_control doesn't actually do anything for external backups either - which is why I think my proposal would achieve as much, for a much lower price. Greetings, Andres Freund
On Mon, Nov 20, 2023 at 03:58:55PM -0800, Andres Freund wrote: > I was thinking we'd just set it in the pg_basebackup style path, and we'd > error out if it's set and backup_label is present. But we'd still use > backup_label without the pg_control flag set. > > So it'd just provide a cross-check that backup_label was not removed for > pg_basebackup style backup, but wouldn't do anything for external backups. But > imo the proposal to just us pg_control doesn't actually do anything for > external backups either - which is why I think my proposal would achieve as > much, for a much lower price. I don't see why not. It does not increase the number of steps when doing a backup, and backupStartPoint alone would not be able to offer this much protection. -- Michael
On 11/20/23 19:58, Andres Freund wrote: > > On 2023-11-21 08:52:08 +0900, Michael Paquier wrote: >> On Mon, Nov 20, 2023 at 12:37:46PM -0800, Andres Freund wrote: >>> Given that, I wonder if what we should do is to just add a new field to >>> pg_control that says "error out if backup_label does not exist", that we set >>> when creating a streaming base backup >> >> That would mean that one still needs to take an extra step to update a >> control file with this byte set, which is something you had a concern >> with in terms of compatibility when it comes to external backup >> solutions because more steps are necessary to take a backup, no? > > I was thinking we'd just set it in the pg_basebackup style path, and we'd > error out if it's set and backup_label is present. But we'd still use > backup_label without the pg_control flag set. > > So it'd just provide a cross-check that backup_label was not removed for > pg_basebackup style backup, but wouldn't do anything for external backups. But > imo the proposal to just us pg_control doesn't actually do anything for > external backups either - which is why I think my proposal would achieve as > much, for a much lower price. I'm not sure why you think the patch under discussion doesn't do anything for external backups. It provides the same benefits to both pg_basebackup and external backups, i.e. they both receive the updated version of pg_control. I really dislike the idea of pg_basebackup having a special mechanism for making recovery safer that is not generally available to external backup software. It might be easy enough for some (e.g. pgBackRest) to manipulate pg_control but would be out of reach for most. Regards, -David
Hi, On 2023-11-21 07:42:42 -0400, David Steele wrote: > On 11/20/23 19:58, Andres Freund wrote: > > On 2023-11-21 08:52:08 +0900, Michael Paquier wrote: > > > On Mon, Nov 20, 2023 at 12:37:46PM -0800, Andres Freund wrote: > > > > Given that, I wonder if what we should do is to just add a new field to > > > > pg_control that says "error out if backup_label does not exist", that we set > > > > when creating a streaming base backup > > > > > > That would mean that one still needs to take an extra step to update a > > > control file with this byte set, which is something you had a concern > > > with in terms of compatibility when it comes to external backup > > > solutions because more steps are necessary to take a backup, no? > > > > I was thinking we'd just set it in the pg_basebackup style path, and we'd > > error out if it's set and backup_label is present. But we'd still use > > backup_label without the pg_control flag set. > > > > So it'd just provide a cross-check that backup_label was not removed for > > pg_basebackup style backup, but wouldn't do anything for external backups. But > > imo the proposal to just us pg_control doesn't actually do anything for > > external backups either - which is why I think my proposal would achieve as > > much, for a much lower price. > > I'm not sure why you think the patch under discussion doesn't do anything > for external backups. It provides the same benefits to both pg_basebackup > and external backups, i.e. they both receive the updated version of > pg_control. Sure. They also receive a backup_label today. If an external solution forgets to replace pg_control copied as part of the filesystem copy, they won't get an error after the remove of backup_label, just like they don't get one today if they don't put backup_label in the data directory. Given that users don't do the right thing with backup_label today, why can we rely on them doing the right thing with pg_control? Greetings, Andres Freund
On 11/21/23 12:41, Andres Freund wrote: > > On 2023-11-21 07:42:42 -0400, David Steele wrote: >> On 11/20/23 19:58, Andres Freund wrote: >>> On 2023-11-21 08:52:08 +0900, Michael Paquier wrote: >>>> On Mon, Nov 20, 2023 at 12:37:46PM -0800, Andres Freund wrote: >>>>> Given that, I wonder if what we should do is to just add a new field to >>>>> pg_control that says "error out if backup_label does not exist", that we set >>>>> when creating a streaming base backup >>>> >>>> That would mean that one still needs to take an extra step to update a >>>> control file with this byte set, which is something you had a concern >>>> with in terms of compatibility when it comes to external backup >>>> solutions because more steps are necessary to take a backup, no? >>> >>> I was thinking we'd just set it in the pg_basebackup style path, and we'd >>> error out if it's set and backup_label is present. But we'd still use >>> backup_label without the pg_control flag set. >>> >>> So it'd just provide a cross-check that backup_label was not removed for >>> pg_basebackup style backup, but wouldn't do anything for external backups. But >>> imo the proposal to just us pg_control doesn't actually do anything for >>> external backups either - which is why I think my proposal would achieve as >>> much, for a much lower price. >> >> I'm not sure why you think the patch under discussion doesn't do anything >> for external backups. It provides the same benefits to both pg_basebackup >> and external backups, i.e. they both receive the updated version of >> pg_control. > > Sure. They also receive a backup_label today. If an external solution forgets > to replace pg_control copied as part of the filesystem copy, they won't get an > error after the remove of backup_label, just like they don't get one today if > they don't put backup_label in the data directory. Given that users don't do > the right thing with backup_label today, why can we rely on them doing the > right thing with pg_control? 
I think reliable backup software does the right thing with backup_label, but if the user starts getting errors on recovery they then decide to remove backup_label. I know we can't do much about bad backup software, but we can at least make this a bit more resistant to user error after the fact. It doesn't help that one of our hints suggests removing backup_label. In highly automated systems, the user might not even know they just restored from a backup. They are only in the loop because the restore failed and they are trying to figure out what is going wrong. When they remove backup_label the cluster comes up just fine. Victory! This is the scenario I've seen most often -- not the backup/restore process getting it wrong but the user removing backup_label on their own initiative. And because it yields such a positive result, at least initially, they remember in the future that the thing to do is to remove backup_label whenever they see the error. If they only have pg_control, then their only choice is to get it right or run pg_resetwal. Most users have no knowledge of pg_resetwal so it will take them longer to get there. Also, I think that tool makes it pretty clear that corruption will result and the only thing to do is a logical dump and restore after using it. There are plenty of ways a user can mess things up. What I'd like to prevent is the appearance of everything being OK when in fact they have corrupted their cluster. That's the situation we have now with backup_label. Is this new solution perfect? No, but I do think it checks several boxes, and is a worthwhile improvement. Regards, -David
On 11/20/23 16:41, Andres Freund wrote: > > On 2023-11-20 15:56:19 -0400, David Steele wrote: >> I understand this is an option -- but does it need to be? What is the >> benefit of excluding the manifest? > > It's not free to create the manifest, particularly if checksums are enabled. It's virtually free, even with the basic CRCs. Anyway, would you really want a backup without a manifest? How would you know something is missing? In particular, for page incremental how do you know something is new (but not WAL logged) if there is no manifest? Is the plan to just recopy anything not WAL logged with each incremental? > Also, for external backups, there's no manifest... There certainly is a manifest for many external backup solutions. Not having a manifest is just running with scissors, backup-wise. Regards, -David
Hi, On 2023-11-21 13:41:15 -0400, David Steele wrote: > On 11/20/23 16:41, Andres Freund wrote: > > > > On 2023-11-20 15:56:19 -0400, David Steele wrote: > > > I understand this is an option -- but does it need to be? What is the > > > benefit of excluding the manifest? > > > > It's not free to create the manifest, particularly if checksums are enabled. > > It's virtually free, even with the basic CRCs. Huh?

perf stat src/bin/pg_basebackup/pg_basebackup -h /tmp/ -p 5440 -D - -cfast -Xnone --format=tar > /dev/null

       4,423.81 msec task-clock        #    0.626 CPUs utilized
        433,475      context-switches  #   97.987 K/sec
              5      cpu-migrations    #    1.130 /sec
            599      page-faults       #  135.404 /sec
 12,208,261,153      cycles            #    2.760 GHz
  6,805,401,520      instructions      #    0.56  insn per cycle
  1,273,896,027      branches          #  287.964 M/sec
     14,233,126      branch-misses     #    1.12% of all branches

    7.068946385 seconds time elapsed
    1.106072000 seconds user
    3.403793000 seconds sys

perf stat src/bin/pg_basebackup/pg_basebackup -h /tmp/ -p 5440 -D - -cfast -Xnone --format=tar --manifest-checksums=CRC32C > /dev/null

       4,324.64 msec task-clock        #    0.640 CPUs utilized
        433,306      context-switches  #  100.195 K/sec
              3      cpu-migrations    #    0.694 /sec
            598      page-faults       #  138.277 /sec
 11,952,475,908      cycles            #    2.764 GHz
  6,816,888,845      instructions      #    0.57  insn per cycle
  1,275,949,455      branches          #  295.042 M/sec
     13,721,376      branch-misses     #    1.08% of all branches

    6.760321433 seconds time elapsed
    1.113256000 seconds user
    3.302907000 seconds sys

perf stat src/bin/pg_basebackup/pg_basebackup -h /tmp/ -p 5440 -D - -cfast -Xnone --format=tar --no-manifest > /dev/null

       3,925.38 msec task-clock        #    0.823 CPUs utilized
        257,467      context-switches  #   65.590 K/sec
              4      cpu-migrations    #    1.019 /sec
            552      page-faults       #  140.624 /sec
 11,577,054,842      cycles            #    2.949 GHz
  5,933,731,797      instructions      #    0.51  insn per cycle
  1,108,784,719      branches          #  282.466 M/sec
     11,867,511      branch-misses     #    1.07% of all branches

    4.770347012 seconds time elapsed
    1.002521000 seconds user
    2.991769000 seconds sys

I'd not call 7.06->4.77 or 6.76->4.77 "virtually free". And this is actually *under*selling the cost - we waste a lot of cycles due to bad buffering decisions. Once we fix that, the cost differential increases further. > Anyway, would you really want a backup without a manifest? How would you > know something is missing? In particular, for page incremental how do you > know something is new (but not WAL logged) if there is no manifest? Is the > plan to just recopy anything not WAL logged with each incremental? Shrug. If you just want to create a new standby by copying the primary, I don't think creating and then validating the manifest buys you much. Long term backups are a different story, particularly if data files are stored individually, rather than in a single checksummed file. > > Also, for external backups, there's no manifest... > > There certainly is a manifest for many external backup solutions. Not having > a manifest is just running with scissors, backup-wise. You mean that you have an external solution gin up a backup manifest? I fail to see how that's relevant here? Greetings, Andres Freund
On 11/20/23 16:37, Andres Freund wrote: > > On 2023-11-20 11:11:13 -0500, Robert Haas wrote: >> I think we need more votes to make a change this big. I have a >> concern, which I think I've expressed before, that we keep whacking >> around the backup APIs, and that has a cost which is potentially >> larger than the benefits. > > +1. The amount of whacking around in this area has been substantial, and it's > hard for operators to keep up. And realistically, with data sizes today, the > pressure to do basebackups with disk snapshots etc is not going to shrink. True enough, but disk snapshots aren't really backups in themselves, in most scenarios, because they reside on the same storage as the cluster. Of course, snapshots can be exported, but that's also expensive. I see snapshots as an adjunct to backups -- a safe backup offsite somewhere for DR and snapshots for day to day operations. Even so, managing snapshots as backups is harder than people think. It is easy to get wrong and end up with silent corruption. > Leaving that concern aside, I am still on the fence about this proposal. I > think it does decrease the chance of getting things wrong in the > streaming-basebackup case. But for external backups, it seems almost > universally worse (with the exception of the torn pg_control issue, that we > also can address otherwise): Why universally worse? The software stores pg_control instead of backup label. The changes to pg_basebackup were pretty trivial and the changes to external backup are pretty much the same, at least in my limited sample of one. And I don't believe we have a satisfactory solution to the torn pg_control issue yet. Certainly it has not been committed and Thomas has shown enthusiasm for this approach, to the point of hoping it could be back patched (it can't). > It doesn't reduce the risk of getting things wrong, you can still omit placing > a file into the data directory and get silent corruption as a consequence. 
In > addition, it's harder to see when looking at a base backup whether the process > was right or not, because now the good and bad state look the same if you just > look on the filesystem level! This is one of the reasons I thought writing just the first 512 bytes of pg_control would be valuable. It would give an easy indicator that pg_control came from a backup. Michael was not in favor of conflating that change with this patch -- but I still think it's a good idea. > Then there's the issue of making ad-hoc debugging harder by not having a > human readable file with information anymore, including when looking at the > history, via backup_label.old. Yeah, you'd need to use pg_controldata instead. But as Michael has suggested, we could also write backup_label as backup_info so there is human-readable information available. > Given that, I wonder if what we should do is to just add a new field to > pg_control that says "error out if backup_label does not exist", that we set > when creating a streaming base backup I'm not in favor of a change only accessible to pg_basebackup or external software that can manipulate pg_control. Regards, -David
On 11/21/23 13:59, Andres Freund wrote: > > On 2023-11-21 13:41:15 -0400, David Steele wrote: >> On 11/20/23 16:41, Andres Freund wrote: >>> >>> On 2023-11-20 15:56:19 -0400, David Steele wrote: >>>> I understand this is an option -- but does it need to be? What is the >>>> benefit of excluding the manifest? >>> >>> It's not free to create the manifest, particularly if checksums are enabled. >> >> It's virtually free, even with the basic CRCs. > > Huh? <snip> > I'd not call 7.06->4.77 or 6.76->4.77 "virtually free". OK, but how does that look with compression -- to a remote location? Uncompressed backup to local storage doesn't seem very realistic. With gzip compression we measure SHA1 checksums at about 5% of total CPU. Obviously that goes up with zstd or lz4, but parallelism helps offset that cost, at least in clock time. I can't overstate how valuable checksums are in finding corruption, especially in long-lived backups. >> Anyway, would you really want a backup without a manifest? How would you >> know something is missing? In particular, for page incremental how do you >> know something is new (but not WAL logged) if there is no manifest? Is the >> plan to just recopy anything not WAL logged with each incremental? > > Shrug. If you just want to create a new standby by copying the primary, I > don't think creating and then validating the manifest buys you much. Long term > backups are a different story, particularly if data files are stored > individually, rather than in a single checksummed file. Fine, but you are probably not using page incremental if just using pg_basebackup to create a standby. With page incremental, at least one of the backups will already exist, which argues for a manifest. >>> Also, for external backups, there's no manifest... >> >> There certainly is a manifest for many external backup solutions. Not having >> a manifest is just running with scissors, backup-wise. 
> > You mean that you have an external solution gin up a backup manifest? I fail > to see how that's relevant here? Just saying that for external backups there *is* often a manifest and it is a good thing to have. Regards, -David
Hi, On 2023-11-21 14:48:59 -0400, David Steele wrote: > > I'd not call 7.06->4.77 or 6.76->4.77 "virtually free". > > OK, but how does that look with compression With compression it's obviously somewhat different - but that part is done in parallel, potentially on a different machine with client side compression, whereas I think right now the checksumming is single-threaded, on the server side. With parallel server side compression, it's still 20% slower with the default checksumming than none. With client side it's 15%. > -- to a remote location? I think this one unfortunately makes checksums a bigger issue, not a smaller one. The network interaction piece is single-threaded, and adding another significant use of CPU onto the same thread means that you are hit harder by using a substantial amount of CPU for checksumming in that thread. Once you go beyond the small instances, you have plenty of network bandwidth in cloud environments. We top out well before the network on bigger instances. > Uncompressed backup to local storage doesn't seem very realistic. With gzip > compression we measure SHA1 checksums at about 5% of total CPU. IMO using gzip is basically infeasible for non-toy sized databases today. I think we're doing our users a disservice by defaulting to it in a bunch of places. Even if another default exposes them to difficulty due to potentially using a different compiled binary with fewer supported compression methods - that's gonna be very rare in practice. > I can't understate how valuable checksums are in finding corruption, > especially in long-lived backups. I agree! But I think we need faster checksum algorithms or a faster implementation of the existing ones. And probably default to something faster once we have it. Greetings, Andres Freund
On 11/21/23 16:00, Andres Freund wrote: > Hi, > > On 2023-11-21 14:48:59 -0400, David Steele wrote: >>> I'd not call 7.06->4.77 or 6.76->4.77 "virtually free". >> >> OK, but how does that look with compression > > With compression it's obviously somewhat different - but that part is done in > parallel, potentially on a different machine with client side compression, > whereas I think right now the checksumming is single-threaded, on the server > side. Ah, yes, that's certainly a bottleneck. > With parallel server side compression, it's still 20% slower with the default > checksumming than none. With client side it's 15%. Yeah, that still seems a lot. But to a large extent it sounds like a limitation of the current implementation. >> -- to a remote location? > > I think this one unfortunately makes checksums a bigger issue, not a smaller > one. The network interaction piece is single-threaded, adding another > significant use of CPU onto the same thread means that you are hit harder by > using substantial amount of CPU for checksumming in the same thread. > > Once you go beyond the small instances, you have plenty network bandwidth in > cloud environments. We top out well before the network on bigger instances. > >> Uncompressed backup to local storage doesn't seem very realistic. With gzip >> compression we measure SHA1 checksums at about 5% of total CPU. > > IMO using gzip is basically infeasible for non-toy sized databases today. I > think we're using our users a disservice by defaulting to it in a bunch of > places. Even if another default exposes them to difficulty due to potentially > using a different compiled binary with fewer supported compression methods - > that's gona be very rare in practice. Yeah, I don't use gzip anymore, but there are still some platforms that do not provide zstd (at least not easily) and lz4 compresses less. One thing people do seem to have is a lot of cores. 
>> I can't understate how valuable checksums are in finding corruption, >> especially in long-lived backups. > > I agree! But I think we need faster checksum algorithms or a faster > implementation of the existing ones. And probably default to something faster > once we have it. We've been using xxHash to generate checksums for our block-level incremental and it is seriously fast, written by the same guy who did zstd and lz4. Regards, -David
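As a rough, stdlib-only illustration of how much the choice of algorithm matters (xxHash itself isn't in the Python standard library, so CRC32 stands in for the cheap end and SHA-1 for the cryptographic end; the timings are machine-dependent):

```python
import hashlib
import time
import zlib

# 64 MiB of synthetic data standing in for backup file contents
data = bytes(range(256)) * (64 * 1024 * 1024 // 256)

t0 = time.perf_counter()
crc = zlib.crc32(data) & 0xFFFFFFFF      # cheap CRC, like a CRC32C manifest
t_crc = time.perf_counter() - t0

t0 = time.perf_counter()
sha = hashlib.sha1(data).hexdigest()     # cryptographic, noticeably more CPU
t_sha = time.perf_counter() - t0

print(f"crc32: {t_crc:.3f}s  sha1: {t_sha:.3f}s")
```

On typical hardware the CRC pass runs several times faster than the SHA-1 pass, which is the trade-off being discussed: stronger checksums buy better corruption detection at a real CPU cost on the backup path.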
Greetings, * David Steele (david@pgmasters.net) wrote: > On 11/21/23 12:41, Andres Freund wrote: > > Sure. They also receive a backup_label today. If an external solution forgets > > to replace pg_control copied as part of the filesystem copy, they won't get an > > error after the remove of backup_label, just like they don't get one today if > > they don't put backup_label in the data directory. Given that users don't do > > the right thing with backup_label today, why can we rely on them doing the > > right thing with pg_control? > > I think reliable backup software does the right thing with backup_label, but > if the user starts getting errors on recovery they the decide to remove > backup_label. I know we can't do much about bad backup software, but we can > at least make this a bit more resistant to user error after the fact. > > It doesn't help that one of our hints suggests removing backup_label. In > highly automated systems, the user might not even know they just restored > from a backup. They are only in the loop because the restore failed and they > are trying to figure out what is going wrong. When they remove backup_label > the cluster comes up just fine. Victory! Yup, this is exactly the issue. > This is the scenario I've seen most often -- not the backup/restore process > getting it wrong but the user removing backup_label on their own initiative. > And because it yields such a positive result, at least initially, they > remember in the future that the thing to do is to remove backup_label > whenever they see the error. > > If they only have pg_control, then their only choice is to get it right or > run pg_resetwal. Most users have no knowledge of pg_resetwal so it will take > them longer to get there. Also, I think that tool make it pretty clear that > corruption will result and the only thing to do is a logical dump and > restore after using it. Agreed. > There are plenty of ways a user can mess things up. 
What I'd like to prevent > is the appearance of everything being OK when in fact they have corrupted > their cluster. That's the situation we have now with backup_label. Is this > new solution perfect? No, but I do think it checks several boxes, and is a > worthwhile improvement. +1. As for the complaint about 'operators' having issue with the changes we've been making in this area- where are those people complaining, exactly? Who are they? I feel like we keep getting this kind of push-back in this area from folks on this list but not from actual backup software authors; all the complaints seem to either be speculative or unattributed pass-through from someone else. What would really be helpful would be hearing from these individuals directly as to what the issues are with the changes, such that perhaps we can do things better in the future to avoid whatever the issue is they're having with the changes. Simply saying we shouldn't make changes in this area isn't workable and the constant push-back is actively discouraging to folks trying to make improvements. Obviously it's a biased view, but we've not had issues making the necessary adjustments in pgbackrest with each release and I feel like if the authors of wal-g or barman did that they would have spoken up. Making a change as suggested which only helps pg_basebackup (and tools like pgbackrest, since it's in C and can also make this particular change) ends up leaving tools like wal-g and barman potentially still with an easy way for users of those tools to corrupt their databases- even though we've not heard anything from the authors of those tools about issues with the proposed change, nor have there been a lot of complaints from them about the prior changes to indicate that they'd even have an issue with the more involved change. Given the lack of complaint about past changes, I'd certainly rather err on the side of improved safety for users than on the side of the authors of these tools possibly complaining. 
What these changes have done is finally break things like omnipitr completely, which hasn't been maintained in a very long time. The changes in v12 broke recovery with omnipitr but not backup, and folks were trying to use omnipitr as recently as with v13[1]. Certainly having a backup tool that only works for backup (fsvo works, anyway, as it still used exclusive backup mode meaning that a crash during a backup would cause the system to not come back up after...) but doesn't work for recovery isn't exactly great and I'm glad that, now, an attempt to use omnipitr to perform a backup will fail. As with lots of other areas of PG, folks need to read the release notes and potentially update their code for new major versions. If anything, the backup area is less of an issue for this because the authors of the backup tools are able to make the change (and who are often the ones pushing for these changes) and the end-user isn't impacted at all. Much the same can be said for wal-e, with users still trying to use it even long after it was stated to be obsolete (the Obsolescence Notice[2] was added in February 2022, though it hadn't been maintained for a while before that, and an issue was opened in December 2022 asking for it to be updated to v15[3]...). Thanks, Stephen [1]: https://github.com/omniti-labs/omnipitr/issues/43 [2]: https://github.com/wal-e/wal-e/commit/f5b3e790fe10daa098b8cbf01d836c4885dc13c7 [3]: https://github.com/wal-e/wal-e/issues/433
On Sun, Nov 26, 2023 at 3:42 AM Stephen Frost <sfrost@snowman.net> wrote: > What would really be helpful would be hearing from these individuals > directly as to what the issues are with the changes, such that perhaps > we can do things better in the future to avoid whatever the issue is > they're having with the changes. Simply saying we shouldn't make > changes in this area isn't workable and the constant push-back is > actively discouraging to folks trying to make improvements. Obviously > it's a biased view, but we've not had issues making the necessary > adjustments in pgbackrest with each release and I feel like if the > authors of wal-g or barman did that they would have spoken up. I'm happy if people show up to comment on proposed changes, but I think you're being a little bit unrealistic here. I have had to help plenty of people who have screwed up their backups in one way or another, generally by using some home-grown script, sometimes by misusing some existing backup tool. Those people are EDB customers; they don't read and participate in discussions here. If they did, perhaps they wouldn't be paying EDB to have me and my colleagues sort things out for them when it all goes wrong. I'm not trying to say that EDB doesn't have customers who participate in mailing list discussions, because we do, but it's a small minority, and I don't think that should surprise anyone. Moreover, the people who don't wouldn't necessarily have the background, expertise, or *time* to assess specific proposals in detail. If your point is that my perspective on what's helpful or unhelpful is not valid because I've only helped 30 people who had problems in this area, but that the perspective of those 30 people who were helped would be more valid, well, I don't agree with that. 
I think your perspective and David's are valid precisely *because* you've worked a lot on pgbackrest and no doubt interacted with lots of users; I think Andres's perspective is valid precisely *because* of his experience working with the fleet at Microsoft and individual customers at EDB and 2Q before that; and I think my perspective is valid for the same kinds of reasons. I am more in agreement with the idea that it would be nice to hear from backup tool authors, but I think even that has limited value. Surely we can all agree that if the backup tool is correctly written, none of this matters, because you'll make the tool do the right things and then you'll be fine. The difficulty here, and the motivation behind this proposal and others like it, is that too many users fail to follow the procedure correctly. If we hear from the authors of well-written backup tools, I expect they will tell us they can adapt their tool to whatever we do. And if we hear from the authors of poorly-written tools, well, I don't think their opinions would form a great basis for making decisions. > [ lengthy discussion of tools that don't work any more ] What confuses me here is that you seem to be arguing that we should *once again* make a breaking change to the backup API, but at the same time you're acknowledging that there are plenty of tools out there on the Internet that have gotten broken by previous rounds of changes. It's only one step from there to conclude that whacking the API around does more harm than good, but you seem to reject that conclusion. Personally, I haven't yet seen any evidence that the removal of exclusive backup mode made any real difference one way or the other. I think I've heard about people needing to adjust code for it, but not about that being a problem. I have yet to run into anyone who was previously using it but, because it was deprecated, switched to doing something better and safer. Have you? -- Robert Haas EDB: http://www.enterprisedb.com
Greetings, * Robert Haas (robertmhaas@gmail.com) wrote: > On Sun, Nov 26, 2023 at 3:42 AM Stephen Frost <sfrost@snowman.net> wrote: > > What would really be helpful would be hearing from these individuals > > directly as to what the issues are with the changes, such that perhaps > > we can do things better in the future to avoid whatever the issue is > > they're having with the changes. Simply saying we shouldn't make > > changes in this area isn't workable and the constant push-back is > > actively discouraging to folks trying to make improvements. Obviously > > it's a biased view, but we've not had issues making the necessary > > adjustments in pgbackrest with each release and I feel like if the > > authors of wal-g or barman did that they would have spoken up. > > I'm happy if people show up to comment on proposed changes, but I > think you're being a little bit unrealistic here. I have had to help > plenty of people who have screwed up their backups in one way or > another, generally by using some home-grown script, sometimes by > misusing some existing backup tool. Those people are EDB customers; > they don't read and participate in discussions here. If they did, > perhaps they wouldn't be paying EDB to have me and my colleagues sort > things out for them when it all goes wrong. I'm not trying to say that > EDB doesn't have customers who participate in mailing list > discussions, because we do, but it's a small minority, and I don't > think that should surprise anyone. Moreover, the people who don't > wouldn't necessarily have the background, expertise, or *time* to > assess specific proposals in detail. If your point is that my > perspective on what's helpful or unhelpful is not valid because I've > only helped 30 people who had problems in this area, but that the > perspective of those 30 people who were helped would be more valid, > well, I don't agree with that. 
I think your perspective and David's > are valid precisely *because* you've worked a lot on pgbackrest and no > doubt interacted with lots of users; I think Andres's perspective is > valid precisely *because* of his experience working with the fleet at > Microsoft and individual customers at EDB and 2Q before that; and I > think my perspective is valid for the same kinds of reasons. I didn't mean to imply that anyone's perspective wasn't valid. I was simply trying to get at the root question of: what *is* the issue with the changes that are being made? If the answer to that is: we made this change, which was hard for folks to deal with, and could have been avoided by doing X, then I really, really want to hear what X was! If the answer is, well, the changes weren't hard, but we didn't like having to make any changes at all ... then I just don't have any sympathy for that. People who write backup software for PG, be it pgbackrest authors, wal-g authors, or homegrown script authors, will need to adapt between major versions as we discover things that are broken (such as exclusive mode, and such as the clear risk that's been demonstrated of a torn copy of pg_control getting copied, resulting in a completely invalid backup) and fix them. > I am more in agreement with the idea that it would be nice to hear > from backup tool authors, but I think even that has limited value. > Surely we can all agree that if the backup tool is correctly written, > none of this matters, because you'll make the tool do the right things > and then you'll be fine. The difficulty here, and the motivation > behind this proposal and others like it, is that too many users fail > to follow the procedure correctly. If we hear from the authors of > well-written backup tools, I expect they will tell us they can adapt > their tool to whatever we do. And if we hear from the authors of > poorly-written tools, well, I don't think their opinions would form a > great basis for making decisions. Uhhh. 
No, I disagree with this- I'd argue that pgbackrest was broken until the most recently releases where we implemented a check to ensure that the pg_control we copy has a valid PG CRC. Did we know it was broken before this discussion? No, but that doesn't change the fact that we certainly could have ended up copying an invalid pg_control and thus have an invalid backup, which even our 'pgbackrest verify' wouldn't have caught because that just checks that the checksum that pgbackrest calculates for every file hasn't changed since we copied it- but that didn't do anything for the issue about pg_control having an invalid internal checksum due to a torn write when we copied it. So, yes, it does matter. We didn't make pgbackrest do the right thing in this case because we thought it was true that you couldn't get a torn read of pg_control; Thomas showed that wasn't true and that puts all of our users at risk. Thankfully somewhat minimal since we always copy pg_control from the primary ... but still, it's not right, and we've now taken steps to address it. Unfortunately, other tools are going to have a more difficult time because they're not written in C, but we still care about them, and that's why we're pushing for this change- to allow them to get a pretty much guaranteed valid pg_control from PG to store without having to figure out how to validate it themselves. > > [ lengthy discussion of tools that don't work any more ] > > What confuses me here is that you seem to be arguing that we should > *once again* make a breaking change to the backup API, but at the same > time you're acknowledging that there are plenty of tools out there on > the Internet that have gotten broken by previous rounds of changes. The broken ones aren't being maintained. Yes, I'm happy to have those explicitly and clearly broken. I don't want people using outdated, broken, and unmaintained tools to backup their PG databases. 
> It's only one step from there to conclude that whacking the API around > does more harm than good, but you seem to reject that conclusion. We change the API because it objectively, clearly, addresses real issues that users can run into that will cause them to have invalid backups if left the way it is. That backup software authors need to adjust to this isn't a bad thing- it's a good thing, because we're fixing things and they should be thrilled to have these issues addressed that they may not have even considered. > Personally, I haven't yet seen any evidence that the removal of > exclusive backup mode made any real difference one way or the other. I > think I've heard about people needing to adjust code for it, but not > about that being a problem. I have yet to run into anyone who was > previously using it but, because it was deprecated, switched to doing > something better and safer. Have you? I'm glad that people haven't had a problem adjusting their code to the removal of exclusive backup mode, that's good, and leaves me, again, a bit confused at what the issue here is about changing things- apparently people don't actually have a problem with it, yet it keeps getting raised as an issue every time we change things in this area. I don't understand that. I'm not following the question entirely, I don't think. Most backup tool authors actively changed to using non-exclusive backup when exclusive backup mode was deprecated, certainly pgbackrest did and we've been using non-exclusive backup mode since it was available. Are you saying that, because everyone moved off of it, we should have kept it? 
In that case the answer is clearly no- omnipitr, at the least, didn't update to non-exclusive and therefore continued to run with the risk that a crash during a backup would result in a cluster that wouldn't start without manual intervention (an issue I've definitely heard about a number of times, even recently) and that manual intervention (remove the backup_label file) actively results in a *corrupt* cluster if the user is actually restoring from a backup, which makes it really terrible direction to give someone. Here, use this hack- but only if you're 100% coming back from a crash and absolutely never, ever, ever if you're actually restoring from a backup. Thanks! Stephen
Attachment
On Mon, 20 Nov 2023 at 06:46, Michael Paquier <michael@paquier.xyz> wrote: > > (I am not exactly sure how, but we've lost pgsql-hackers on the way > when you sent v5. Now added back in CC with the two latest patches > you've proposed attached.) > > Here is a short summary of what has been missed by the lists: > - I've commented that the patch should not create, not show up in > fields returned the SQL functions or stream control files with a size > of 512B, just stick to 8kB. If this is worth changing this should be > applied consistently across the board including initdb, discussed on > its own thread. > - The backup-related fields in the control file are reset at the end > of recovery. I've suggested to not do that to keep a trace of what > was happening during recovery. The latest version of the patch resets > the fields. > - With the backup_label file gone, we lose some information in the > backups themselves, which is not good. Instead, you have suggested an > approach where this data is added to the backup manifest, meaning that > no information would be lost, particularly useful for self-contained > backups. The fields planned to be added to the backup manifest are: > -- The start and end time of the backup, the end timestamp being > useful to know when stop time can be used for PITR. > -- The backup label. > I've agreed that it may be the best thing to do at this end to not > lose any data related to the removal of the backup_label file. > > On Sun, Nov 19, 2023 at 02:14:32PM -0400, David Steele wrote: > > On 11/15/23 20:03, Michael Paquier wrote: > >> As the label is only an informational field, the parsing added to > >> pg_verifybackup is not really needed because it is used nowhere in the > >> validation process, so keeping the logic simpler would be the way to > >> go IMO. This is contrary to the WAL range for example, where start > >> and end LSNs are used for validation with a pg_waldump command. 
> >> Robert, any comments about the addition of the label in the manifest? > > > > I'm sure Robert will comment on this when he gets the time, but for now I > > have backed off on passing the new info to pg_verifybackup and added > > start/stop time. > > FWIW, I'm OK with the bits for the backup manifest as presented. So > if there are no remarks and/or no objections, I'd like to apply it but > let give some room to others to comment on that as there's been a gap > in the emails exchanged on pgsql-hackers. I hope that the summary > I've posted above covers everything. So let's see about doing > something around the middle of next week. With Thanksgiving in the > US, a lot of folks will not have the time to monitor what's happening > on this thread. > > + The end time for the backup. This is when the backup was stopped in > + <productname>PostgreSQL</productname> and represents the earliest time > + that can be used for time-based Point-In-Time Recovery. > > This one is actually a very good point. We'd lost this capacity with > the backup_label file gone without the end timestamps in the control > file. > > > New patches attached based on b218fbb7. > > I've noticed on the other thread the remark about being less > aggressive with the fields related to recovery in the control file, so > I assume that this patch should leave the fields be after the end of > recovery from the start and only rely on backupRecoveryRequired to > decide if the recovery should use the fields or not: > https://www.postgresql.org/message-id/241ccde1-1928-4ba2-a0bb-5350f7b191a8@=pgmasters.net > > + ControlFile->backupCheckPoint = InvalidXLogRecPtr; > ControlFile->backupStartPoint = InvalidXLogRecPtr; > + ControlFile->backupStartPointTLI = 0; > ControlFile->backupEndPoint = InvalidXLogRecPtr; > + ControlFile->backupFromStandby = false; > ControlFile->backupEndRequired = false; > > Still, I get the temptation of being consistent with the current style > on HEAD to reset everything, as well.. 
CFBot shows that the patch does not apply anymore as in [1]: === Applying patches on top of PostgreSQL commit ID 7014c9a4bba2d1b67d60687afb5b2091c1d07f73 === === applying patch ./recovery-in-pgcontrol-v7-0001-add-info-to-manifest.patch patching file doc/src/sgml/backup-manifest.sgml patching file src/backend/backup/backup_manifest.c patching file src/backend/backup/basebackup.c Hunk #1 succeeded at 238 (offset 13 lines). Hunk #2 succeeded at 258 (offset 13 lines). Hunk #3 succeeded at 399 (offset 17 lines). Hunk #4 succeeded at 652 (offset 17 lines). can't find file to patch at input line 219 Perhaps you used the wrong -p or --strip option? The text leading up to this was: -------------------------- |diff --git a/src/bin/pg_verifybackup/parse_manifest.c b/src/bin/pg_verifybackup/parse_manifest.c |index bf0227c668..408af88e58 100644 |--- a/src/bin/pg_verifybackup/parse_manifest.c |+++ b/src/bin/pg_verifybackup/parse_manifest.c -------------------------- No file to patch. Skipping patch. 9 out of 9 hunks ignored patching file src/include/backup/backup_manifest.h Please post an updated version for the same. [1] - http://cfbot.cputube.org/patch_46_3511.log Regards, Vignesh
On Fri, Jan 26, 2024 at 06:27:30PM +0530, vignesh C wrote: > Please post an updated version for the same. > > [1] - http://cfbot.cputube.org/patch_46_3511.log With the recent introduction of incremental backups that depend on backup_label and the rather negative feedback received, I think that it would be better to return this entry as RwF for now. What do you think? -- Michael
Attachment
On 1/28/24 19:11, Michael Paquier wrote: > On Fri, Jan 26, 2024 at 06:27:30PM +0530, vignesh C wrote: >> Please post an updated version for the same. >> >> [1] - http://cfbot.cputube.org/patch_46_3511.log > > With the recent introduction of incremental backups that depend on > backup_label and the rather negative feedback received, I think that > it would be better to return this entry as RwF for now. What do you > think? I've been thinking it makes little sense to update the patch. It would be a lot of work with all the new changes for incremental backup and since Andres and Robert appear to be very against the idea, I doubt it would be worth the effort. I have withdrawn the patch. Regards, -David
On 1/29/24 12:28, David Steele wrote: > On 1/28/24 19:11, Michael Paquier wrote: >> On Fri, Jan 26, 2024 at 06:27:30PM +0530, vignesh C wrote: >>> Please post an updated version for the same. >>> >>> [1] - http://cfbot.cputube.org/patch_46_3511.log >> >> With the recent introduction of incremental backups that depend on >> backup_label and the rather negative feedback received, I think that >> it would be better to return this entry as RwF for now. What do you >> think? > > I've been thinking it makes little sense to update the patch. It would > be a lot of work with all the new changes for incremental backup and > since Andres and Robert appear to be very against the idea, I doubt it > would be worth the effort. I've had a new idea which may revive this patch. The basic idea is to keep backup_label but also return a copy of pg_control from pg_stop_backup(). This copy of pg_control would be safe from tears and have a backupLabelRequired field set (as Andres suggested) so recovery cannot proceed without the backup label. So, everything will continue to work as it does now. But, backup software can be enhanced to write the improved pg_control that is guaranteed not to be torn and has protection against a missing backup label. Of course, pg_basebackup will write the new backupLabelRequired field into pg_control, but this way third party software can also gain advantages from the new field. Thoughts? Regards, -David
Hi, On Sunday, March 10th, 2024 at 4:47 AM, David Steele wrote: > I've had a new idea which may revive this patch. The basic idea is to > keep backup_label but also return a copy of pg_control from > pg_stop_backup(). This copy of pg_control would be safe from tears and > have a backupLabelRequired field set (as Andres suggested) so recovery > cannot proceed without the backup label. > > So, everything will continue to work as it does now. But, backup > software can be enhanced to write the improved pg_control that is > guaranteed not to be torn and has protection against a missing backup label. > > Of course, pg_basebackup will write the new backupLabelRequired field > into pg_control, but this way third party software can also gain > advantages from the new field. Bump on this idea. Given the discussion in [1], even if it obviously makes sense to improve the in core backup capabilities, the more we goin that direction, the more we'll rely on outside orchestration. So IMHO it also worth worrying about given more leverage to such orchestration tools. In that sense, I really like the ideato extend the backup functions. More thoughts? Thanks all, Kind Regards, -- Stefan FERCOT Data Egret (https://dataegret.com) [1] https://www.postgresql.org/message-id/lwXoqQdOT9Nw1tJIx_h7WuqMKrB1YMePQY99RFTZ87H7V52mgUJaSlw2WRbcOgKNUurF1yJqX3nqtZi4hJhtd3e_XlmLsLvnEtGXY-fZPoA%3D%40protonmail.com
On 4/16/24 18:55, Stefan Fercot wrote: > Hi, > > On Sunday, March 10th, 2024 at 4:47 AM, David Steele wrote: >> I've had a new idea which may revive this patch. The basic idea is to >> keep backup_label but also return a copy of pg_control from >> pg_stop_backup(). This copy of pg_control would be safe from tears and >> have a backupLabelRequired field set (as Andres suggested) so recovery >> cannot proceed without the backup label. >> >> So, everything will continue to work as it does now. But, backup >> software can be enhanced to write the improved pg_control that is >> guaranteed not to be torn and has protection against a missing backup label. >> >> Of course, pg_basebackup will write the new backupLabelRequired field >> into pg_control, but this way third party software can also gain >> advantages from the new field. > > Bump on this idea. > > Given the discussion in [1], even if it obviously makes sense to improve the in core backup capabilities, the more we goin that direction, the more we'll rely on outside orchestration. > So IMHO it also worth worrying about given more leverage to such orchestration tools. In that sense, I really like theidea to extend the backup functions. I have implemented this idea and created a new thread [1] for it. Hopefully it will address the concerns expressed in this thread. Regards, -David [1] https://www.postgresql.org/message-id/e2636c5d-c031-43c9-a5d6-5e5c7e4c5514%40pgmasters.net