Thread: Postgres restart in the middle of exclusive backup and the presence of backup_label file

Postgres restart in the middle of exclusive backup and the presence of backup_label file

From
SATYANARAYANA NARLAPURAM
Date:
Hi Hackers,

While an exclusive backup is in progress if Postgres restarts, postgres runs the recovery from the checkpoint identified by the label file instead of the control file. This can cause long recovery or even sometimes fail to recover as the WAL records corresponding to that checkpoint location are removed. I can write a layer in my control plane to remove the backup_label file when I know the server is not in restore from the base backup but I don't see a reason why everyone has to repeat this step. Am I missing something?

If there are no standby.signal or recovery.signal, what is the use case of honoring backup_label file? Even when they exist, for a long running recovery, should we honor the backup_label file as the majority of the WAL already applied? It does slow down the recovery on restart right as it has to start all the way from the beginning?

Thanks,
Satya
On Wed, Nov 24, 2021 at 02:12:19PM -0800, SATYANARAYANA NARLAPURAM wrote:
> While an exclusive backup is in progress if Postgres restarts, postgres
> runs the recovery from the checkpoint identified by the label file instead
> of the control file. This can cause long recovery or even sometimes fail to
> recover as the WAL records corresponding to that checkpoint location are
> removed. I can write a layer in my control plane to remove the backup_label
> file when I know the server is not in restore from the base backup but I
> don't see a reason why everyone has to repeat this step. Am I missing
> something?

This is a known issue with exclusive backups, which is a reason why
non-exclusive backups have been implemented.  pg_basebackup does that,
and using "false" as the third argument of pg_start_backup() would
have the same effect.  So I would recommend to switch to that.
--
Michael

Attachment

Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

From
SATYANARAYANA NARLAPURAM
Date:
Thanks Michael!

This is a known issue with exclusive backups, which is a reason why
non-exclusive backups have been implemented.  pg_basebackup does that,
and using "false" as the third argument of pg_start_backup() would
have the same effect.  So I would recommend to switch to that.

Is there a plan in place to remove the exclusive backup option from the core in PG 15/16? If we are keeping it then why not make it better? 
On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM wrote:
> Is there a plan in place to remove the exclusive backup option from the
> core in PG 15/16?

This was discussed, but removing it could also harm users relying on
it.  Perhaps it could be revisited, but I am not sure if this is worth
worrying about either.

> If we are keeping it then why not make it better?

Well, non-exclusive backups are better by design in many aspects, so I
don't quite see the point in spending time on something that has more
limitations than what's already in place.
--
Michael

Attachment
Michael Paquier <michael@paquier.xyz> writes:
> On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM wrote:
>> If we are keeping it then why not make it better?

> Well, non-exclusive backups are better by design in many aspects, so I
> don't quite see the point in spending time on something that has more
> limitations than what's already in place.

IMO the main reason for keeping it is backwards compatibility for users
who have a satisfactory backup arrangement using it.  That same argument
implies that we shouldn't change how it works (at least, not very much).

            regards, tom lane



On 11/26/21, 7:33 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael@paquier.xyz> writes:
>> On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM wrote:
>>> If we are keeping it then why not make it better?
>
>> Well, non-exclusive backups are better by design in many aspects, so I
>> don't quite see the point in spending time on something that has more
>> limitations than what's already in place.
>
> IMO the main reason for keeping it is backwards compatibility for users
> who have a satisfactory backup arrangement using it.  That same argument
> implies that we shouldn't change how it works (at least, not very much).

The issues with exclusive backups seem to be fairly well-documented
(e.g., c900c15), but perhaps there should also be a note in the
"Backup Control Functions" table [0].

Nathan

[0] https://www.postgresql.org/docs/devel/functions-admin.html#FUNCTIONS-ADMIN-BACKUP


Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Michael Paquier <michael@paquier.xyz> writes:
> > On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM wrote:
> >> If we are keeping it then why not make it better?
>
> > Well, non-exclusive backups are better by design in many aspects, so I
> > don't quite see the point in spending time on something that has more
> > limitations than what's already in place.
>
> IMO the main reason for keeping it is backwards compatibility for users
> who have a satisfactory backup arrangement using it.  That same argument
> implies that we shouldn't change how it works (at least, not very much).

There isn't a satisfactory backup approach using it specifically because
of this issue, hence why we should remove it to make it so users don't
run into this.  Would also simplify the documentation around the low
level backup API, which would be a very good thing.  Right now, making
improvements in that area is very challenging even if all you want to do
is improve the documentation around the non-exclusive API.

We dealt with this as best as one could in pgbackrest for PG versions
prior to when non-exclusive backup was added- which is to remove the
backup_label file as soon as possible and then put it back right before
you call pg_stop_backup() (since it'll complain otherwise).  Not a
perfect answer though and a risk still exists there of a failed restart
happening.  Of course, for versions which support non-exclusive backup,
we use that to avoid this issue.

We also extensively changed how restore works a couple releases ago and
while there was some noise about it, it certainly wasn't all that bad.
I don't find the reasons brought up to continue to support exclusive
backup to be at all compelling and the lack of huge issues with the new
way restore works to make it abundently clear that we can, in fact,
remove exclusive backup in a major version change without the world
coming down.

Thanks,

Stephen

Attachment
On Tue, 2021-11-30 at 09:20 -0500, Stephen Frost wrote:
> Greetings,
> 
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > Michael Paquier <michael@paquier.xyz> writes:
> > > On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM wrote:
> > > > If we are keeping it then why not make it better?
> > 
> > > Well, non-exclusive backups are better by design in many aspects, so I
> > > don't quite see the point in spending time on something that has more
> > > limitations than what's already in place.
> > 
> > IMO the main reason for keeping it is backwards compatibility for users
> > who have a satisfactory backup arrangement using it.  That same argument
> > implies that we shouldn't change how it works (at least, not very much).
> 
> There isn't a satisfactory backup approach using it specifically because
> of this issue, hence why we should remove it to make it so users don't
> run into this.

There is a satisfactory approach, as long as you are satisfied with
manually restarting the server if it crashed during a backup.

> I don't find the reasons brought up to continue to support exclusive
> backup to be at all compelling and the lack of huge issues with the new
> way restore works to make it abundently clear that we can, in fact,
> remove exclusive backup in a major version change without the world
> coming down.

I guess the lack of hue and cry was at least to a certain extent because
the exclusive backup API was deprecated, but not removed.

Yours,
Laurenz Albe




Greetings,

On Tue, Nov 30, 2021 at 11:47 Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Tue, 2021-11-30 at 09:20 -0500, Stephen Frost wrote:
> Greetings,
>
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > Michael Paquier <michael@paquier.xyz> writes:
> > > On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM wrote:
> > > > If we are keeping it then why not make it better?
> >
> > > Well, non-exclusive backups are better by design in many aspects, so I
> > > don't quite see the point in spending time on something that has more
> > > limitations than what's already in place.
> >
> > IMO the main reason for keeping it is backwards compatibility for users
> > who have a satisfactory backup arrangement using it.  That same argument
> > implies that we shouldn't change how it works (at least, not very much).
>
> There isn't a satisfactory backup approach using it specifically because
> of this issue, hence why we should remove it to make it so users don't
> run into this.

There is a satisfactory approach, as long as you are satisfied with
manually restarting the server if it crashed during a backup.

I disagree that that’s a satisfactory approach. It certainly wasn’t intended or documented as part of the original feature and therefore to call it satisfactory strikes me quite strongly as revisionist history. 

> I don't find the reasons brought up to continue to support exclusive
> backup to be at all compelling and the lack of huge issues with the new
> way restore works to make it abundently clear that we can, in fact,
> remove exclusive backup in a major version change without the world
> coming down.

I guess the lack of hue and cry was at least to a certain extent because
the exclusive backup API was deprecated, but not removed.

These comments were in reference to the restore API, which was quite changed (new special files that have to be touched, removing of recovery.conf, options moved to postgresql.conf/.auto, etc). So, no. 

Thanks,

Stephen
On 11/30/21, 9:51 AM, "Stephen Frost" <sfrost@snowman.net> wrote:
> I disagree that that’s a satisfactory approach. It certainly wasn’t
> intended or documented as part of the original feature and therefore
> to call it satisfactory strikes me quite strongly as revisionist
> history. 

It looks like the exclusive way has been marked deprecated in all
supported versions along with a note that it will eventually be
removed.  If it's not going to be removed out of fear of breaking
backward compatibility, I think the documentation should be updated to
say that.  However, unless there is something that is preventing users
from switching to the non-exclusive approach, I think it is reasonable
to begin thinking about removing it.

Nathan


"Bossart, Nathan" <bossartn@amazon.com> writes:
> It looks like the exclusive way has been marked deprecated in all
> supported versions along with a note that it will eventually be
> removed.  If it's not going to be removed out of fear of breaking
> backward compatibility, I think the documentation should be updated to
> say that.  However, unless there is something that is preventing users
> from switching to the non-exclusive approach, I think it is reasonable
> to begin thinking about removing it.

If we're willing to outright remove it, I don't have any great objection.
My original two cents was that we shouldn't put effort into improving it;
but removing it isn't that.

            regards, tom lane



On 11/30/21, 2:27 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> If we're willing to outright remove it, I don't have any great objection.
> My original two cents was that we shouldn't put effort into improving it;
> but removing it isn't that.

I might try to put a patch together for the January commitfest, given
there is enough support.

Nathan


On 11/30/21 17:26, Tom Lane wrote:
> "Bossart, Nathan" <bossartn@amazon.com> writes:
>> It looks like the exclusive way has been marked deprecated in all
>> supported versions along with a note that it will eventually be
>> removed.  If it's not going to be removed out of fear of breaking
>> backward compatibility, I think the documentation should be updated to
>> say that.  However, unless there is something that is preventing users
>> from switching to the non-exclusive approach, I think it is reasonable
>> to begin thinking about removing it.
> 
> If we're willing to outright remove it, I don't have any great objection.
> My original two cents was that we shouldn't put effort into improving it;
> but removing it isn't that.

The main objections as I recall are that it is much harder for simple 
backup scripts and commercial backup integrations to hold a connection 
to postgres open and write the backup label separately into the backup.

As Stephen noted, working in this area is much harder (even in the docs) 
due to the need to keep both methods working. When I removed exclusive 
backup it didn't break any tests, other than one that needed to generate 
a corrupt backup, so we have virtually no coverage for that method.

I did figure out how to keep the safe part of exclusive backup (not 
having to maintain a connection) while removing the dangerous part 
(writing backup_label into PGDATA), but it was a substantial amount of 
work and I felt that it had little chance of being committed.

Attaching the thread [1] that I started with a patch to remove exclusive 
backup for reference.

--

[1] 
https://www.postgresql.org/message-id/flat/ac7339ca-3718-3c93-929f-99e725d1172c%40pgmasters.net



On 11/30/21, 2:58 PM, "David Steele" <david@pgmasters.net> wrote:
> I did figure out how to keep the safe part of exclusive backup (not
> having to maintain a connection) while removing the dangerous part
> (writing backup_label into PGDATA), but it was a substantial amount of
> work and I felt that it had little chance of being committed.

Do you think it's still worth trying to make it safe, or do you think
we should just remove exclusive mode completely?

> Attaching the thread [1] that I started with a patch to remove exclusive
> backup for reference.

Ah, good, some light reading.  :)

Nathan


On Tue, Nov 30, 2021 at 05:58:15PM -0500, David Steele wrote:
> The main objections as I recall are that it is much harder for simple backup
> scripts and commercial backup integrations to hold a connection to postgres
> open and write the backup label separately into the backup.

I don't quite understand why this argument would not hold even today,
even if I'd like to think that more people are using pg_basebackup.

> I did figure out how to keep the safe part of exclusive backup (not having
> to maintain a connection) while removing the dangerous part (writing
> backup_label into PGDATA), but it was a substantial amount of work and I
> felt that it had little chance of being committed.

Which was, I guess, done by storing the backup_label contents within a
file different than backup_label, still maintained in the main data
folder to ensure that it gets included in the backup?
--
Michael

Attachment

Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

From
SATYANARAYANA NARLAPURAM
Date:

On Tue, Nov 30, 2021 at 4:54 PM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Nov 30, 2021 at 05:58:15PM -0500, David Steele wrote:
> The main objections as I recall are that it is much harder for simple backup
> scripts and commercial backup integrations to hold a connection to postgres
> open and write the backup label separately into the backup.

I don't quite understand why this argument would not hold even today,
even if I'd like to think that more people are using pg_basebackup.

> I did figure out how to keep the safe part of exclusive backup (not having
> to maintain a connection) while removing the dangerous part (writing
> backup_label into PGDATA), but it was a substantial amount of work and I
> felt that it had little chance of being committed.

Which was, I guess, done by storing the backup_label contents within a
file different than backup_label, still maintained in the main data
folder to ensure that it gets included in the backup?

Non-exclusive backup has significant advantages over exclusive backups but would like to add a few comments on the simplicity of exclusive backups -
1/ It is not uncommon nowadays to take a snapshot based backup. Exclusive backup simplifies this story as the backup label file is part of the snapshot. Otherwise, one needs to store it somewhere outside as snapshot metadata and copy this file over during restore (after creating a disk from the snapshot) to the data directory. Typical steps included are 1/ start pg_base_backup 2/ Take disk snapshot 3/ pg_stop_backup() 4/ Mark snapshot as consistent and add some create time metadata.
2/ Control plane code responsible for taking backups is simpler with exclusive backups than non-exclusive as it doesn't maintain a connection to the server, particularly when that orchestration is outside the machine the Postgres server is running on.

IMHO, we should either remove the support for it or improve it but not leave it hanging there.


On 11/30/21 19:54, Michael Paquier wrote:
> On Tue, Nov 30, 2021 at 05:58:15PM -0500, David Steele wrote:
>> I did figure out how to keep the safe part of exclusive backup (not having
>> to maintain a connection) while removing the dangerous part (writing
>> backup_label into PGDATA), but it was a substantial amount of work and I
>> felt that it had little chance of being committed.
> 
> Which was, I guess, done by storing the backup_label contents within a
> file different than backup_label, still maintained in the main data
> folder to ensure that it gets included in the backup?

That, or emit it from pg_start_backup() so the user can write it 
wherever they please. That would include writing it into PGDATA if they 
really wanted to, but that would be on them and the default behavior 
would be safe. The problem with this is if the user does not 
rename/supply backup_label on restore then they will get corruption and 
not know it.

Here's another idea. Since the contents of pg_wal are not supposed to be 
copied, we could add a file there to indicate that the cluster should 
remove backup_label on restart. Our instructions also say to remove the 
contents of pg_wal on restore if they were originally copied, so 
hopefully one of the two would happen. But, again, if they fail to 
follow the directions it would lead to corruption.

Order would be important here. When starting the backup the proper order 
would be to write pg_wal/backup_in_progress and then backup_label. When 
stopping the backup they would be removed in the reverse order.

On a restart if both are present then delete both in the correct order 
and start crash recovery using the info in pg_control. If only 
backup_label is present then go into recovery using the info from 
backup_label.

It's possible for pg_wal/backup_in_process to be present by itself if 
the server crashes after deleting backup_label but before deleting 
pg_wal/backup_in_progress. In that case the server should simply remove 
it on start and go into crash recovery using the info from pg_control.

The advantage of this idea is that it does not change the current 
instructions as far as I can see. If the user is already following them, 
they'll be fine. If they are not, then they'll need to start doing so.

Of course, none of this affects users who are using non-exclusive 
backup, which I do hope covers the majority by now.

Thoughts?

Regards,
-David



On 11/30/21 17:26, Tom Lane wrote:
> "Bossart, Nathan" <bossartn@amazon.com> writes:
>> It looks like the exclusive way has been marked deprecated in all
>> supported versions along with a note that it will eventually be
>> removed.  If it's not going to be removed out of fear of breaking
>> backward compatibility, I think the documentation should be updated to
>> say that.  However, unless there is something that is preventing users
>> from switching to the non-exclusive approach, I think it is reasonable
>> to begin thinking about removing it.
> If we're willing to outright remove it, I don't have any great objection.
> My original two cents was that we shouldn't put effort into improving it;
> but removing it isn't that.
>
>             


+1


Let's just remove it. We already know it's a footgun, and there's been
plenty of warning.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




On 11/30/21 18:31, Bossart, Nathan wrote:
> On 11/30/21, 2:58 PM, "David Steele" <david@pgmasters.net> wrote:
>> I did figure out how to keep the safe part of exclusive backup (not
>> having to maintain a connection) while removing the dangerous part
>> (writing backup_label into PGDATA), but it was a substantial amount of
>> work and I felt that it had little chance of being committed.
> 
> Do you think it's still worth trying to make it safe, or do you think
> we should just remove exclusive mode completely?

My preference would be to remove it completely, but I haven't gotten a 
lot of traction so far.

>> Attaching the thread [1] that I started with a patch to remove exclusive
>> backup for reference.
> 
> Ah, good, some light reading.  :)

Sure, if you say so!

Regards,
-David



On 12/1/21, 8:27 AM, "David Steele" <david@pgmasters.net> wrote:
> On 11/30/21 18:31, Bossart, Nathan wrote:
>> Do you think it's still worth trying to make it safe, or do you think
>> we should just remove exclusive mode completely?
>
> My preference would be to remove it completely, but I haven't gotten a
> lot of traction so far.

In this thread, I count 6 people who seem alright with removing it,
and 2 who might be opposed, although I don't think anyone has
explicitly stated they are against it.

Nathan


On 12/1/21, 10:37 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> On 12/1/21, 8:27 AM, "David Steele" <david@pgmasters.net> wrote:
>> On 11/30/21 18:31, Bossart, Nathan wrote:
>>> Do you think it's still worth trying to make it safe, or do you think
>>> we should just remove exclusive mode completely?
>>
>> My preference would be to remove it completely, but I haven't gotten a
>> lot of traction so far.
>
> In this thread, I count 6 people who seem alright with removing it,
> and 2 who might be opposed, although I don't think anyone has
> explicitly stated they are against it.

I hastily rebased the patch from 2018 and got it building and passing
the tests.  I'm sure it will need additional changes, but I'll wait
for more feedback before I expend too much more effort on this.

Nathan


Attachment
On 12/1/21 19:30, Bossart, Nathan wrote:
> On 12/1/21, 10:37 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
>> On 12/1/21, 8:27 AM, "David Steele" <david@pgmasters.net> wrote:
>>> On 11/30/21 18:31, Bossart, Nathan wrote:
>>>> Do you think it's still worth trying to make it safe, or do you think
>>>> we should just remove exclusive mode completely?
>>> My preference would be to remove it completely, but I haven't gotten a
>>> lot of traction so far.
>> In this thread, I count 6 people who seem alright with removing it,
>> and 2 who might be opposed, although I don't think anyone has
>> explicitly stated they are against it.
> I hastily rebased the patch from 2018 and got it building and passing
> the tests.  I'm sure it will need additional changes, but I'll wait
> for more feedback before I expend too much more effort on this.
>


Should we really be getting rid of
PostgreSQL::Test::Cluster::backup_fs_hot() ?


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




On 12/2/21 11:00, Andrew Dunstan wrote:
> 
> On 12/1/21 19:30, Bossart, Nathan wrote:
>> On 12/1/21, 10:37 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
>>> On 12/1/21, 8:27 AM, "David Steele" <david@pgmasters.net> wrote:
>>>> On 11/30/21 18:31, Bossart, Nathan wrote:
>>>>> Do you think it's still worth trying to make it safe, or do you think
>>>>> we should just remove exclusive mode completely?
>>>> My preference would be to remove it completely, but I haven't gotten a
>>>> lot of traction so far.
>>> In this thread, I count 6 people who seem alright with removing it,
>>> and 2 who might be opposed, although I don't think anyone has
>>> explicitly stated they are against it.
>> I hastily rebased the patch from 2018 and got it building and passing
>> the tests.  I'm sure it will need additional changes, but I'll wait
>> for more feedback before I expend too much more effort on this.
>>
> 
> Should we really be getting rid of
> PostgreSQL::Test::Cluster::backup_fs_hot() ?

Agreed, it would be better to update backup_fs_hot() to use exclusive 
mode and save out backup_label instead.

Regards,
-David



On 12/2/21 12:38, David Steele wrote:
> On 12/2/21 11:00, Andrew Dunstan wrote:
>>
>> Should we really be getting rid of
>> PostgreSQL::Test::Cluster::backup_fs_hot() ?
> 
> Agreed, it would be better to update backup_fs_hot() to use exclusive 
> mode and save out backup_label instead.

Oops, of course I meant non-exclusive mode.

Regards,
-David



On 12/2/21, 9:50 AM, "David Steele" <david@pgmasters.net> wrote:
> On 12/2/21 12:38, David Steele wrote:
>> On 12/2/21 11:00, Andrew Dunstan wrote:
>>>
>>> Should we really be getting rid of
>>> PostgreSQL::Test::Cluster::backup_fs_hot() ?
>>
>> Agreed, it would be better to update backup_fs_hot() to use exclusive
>> mode and save out backup_label instead.
>
> Oops, of course I meant non-exclusive mode.

+1.  I'll fix that in the next revision.

Nathan


On 12/2/21, 1:34 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> On 12/2/21, 9:50 AM, "David Steele" <david@pgmasters.net> wrote:
>> On 12/2/21 12:38, David Steele wrote:
>>> On 12/2/21 11:00, Andrew Dunstan wrote:
>>>>
>>>> Should we really be getting rid of
>>>> PostgreSQL::Test::Cluster::backup_fs_hot() ?
>>>
>>> Agreed, it would be better to update backup_fs_hot() to use exclusive
>>> mode and save out backup_label instead.
>>
>> Oops, of course I meant non-exclusive mode.
>
> +1.  I'll fix that in the next revision.

I finally got around to looking into this, and I think I found why it
was done this way in 2018.  backup_fs_hot() runs pg_start_backup(),
closes the session, copies the data, and then runs pg_stop_backup() in
a different session.  This doesn't work with non-exclusive mode
because the backup will be aborted when the session that runs
pg_start_backup() is closed.  pg_stop_backup() will fail with a
"backup is not in progress" error.  Furthermore,
010_logical_decoding_timelines.pl seems to be the only test that uses
backup_fs_hot().

After a quick glance, I didn't see an easy way to hold a session open
while the test does other things.  If there isn't one, modifying
backup_fs_hot() to work with non-exclusive mode might be more trouble
than it is worth.

Nathan


On Thu, Jan 6, 2022, at 9:48 PM, Bossart, Nathan wrote:
After a quick glance, I didn't see an easy way to hold a session open
while the test does other things.  If there isn't one, modifying
backup_fs_hot() to work with non-exclusive mode might be more trouble
than it is worth.
You can use IPC::Run to start psql in background. See examples in
src/test/recovery.


--
Euler Taveira

On 1/6/22 20:20, Euler Taveira wrote:
> On Thu, Jan 6, 2022, at 9:48 PM, Bossart, Nathan wrote:
>> After a quick glance, I didn't see an easy way to hold a session open
>> while the test does other things.  If there isn't one, modifying
>> backup_fs_hot() to work with non-exclusive mode might be more trouble
>> than it is worth.
 >
> You can use IPC::Run to start psql in background. See examples in
> src/test/recovery.

I don't think updating backup_fs_hot() is worth it here.

backup_fs_cold() works just fine for this case and if there is a need 
for backup_fs_hot() in the future it can be implemented as needed.

Regards,
-David



On 1/7/22, 5:52 AM, "David Steele" <david@pgmasters.net> wrote:
> On 1/6/22 20:20, Euler Taveira wrote:
>> On Thu, Jan 6, 2022, at 9:48 PM, Bossart, Nathan wrote:
>>> After a quick glance, I didn't see an easy way to hold a session open
>>> while the test does other things.  If there isn't one, modifying
>>> backup_fs_hot() to work with non-exclusive mode might be more trouble
>>> than it is worth.
>>
>> You can use IPC::Run to start psql in background. See examples in
>> src/test/recovery.
>
> I don't think updating backup_fs_hot() is worth it here.
>
> backup_fs_cold() works just fine for this case and if there is a need
> for backup_fs_hot() in the future it can be implemented as needed.

Thanks for the pointer on IPC::Run.  I had a feeling I was missing
something obvious!

I think I agree with David that it still isn't worth it for just this
one test.  Of course, it would be great to test the non-exclusive
backup logic as much as possible, but I'm not sure that this
particular test will provide any sort of meaningful coverage.

Nathan


Here is a rebased patch.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment
Some review comments on the latest version:

+    * runningBackups is a counter indicating the number of backups currently in
+    * progress. forcePageWrites is set to true when either of these is
+    * non-zero. lastBackupStart is the latest checkpoint redo location used as
+    * a starting point for an online backup.
     */
-   ExclusiveBackupState exclusiveBackupState;
-   int         nonExclusiveBackups;

What do you mean by "either of these is non-zero ''. Earlier we used
to set forcePageWrites in case of both exclusive and non-exclusive
backups, but we have just one type of backup now.

==

-    * OK to update backup counters, forcePageWrites and session-level lock.
+    * OK to update backup counters and forcePageWrites.
     *

We still update the status of session-level lock so I don't think we
should update the above comment. See below code:

    if (XLogCtl->Insert.runningBackups == 0)
    {
        XLogCtl->Insert.forcePageWrites = false;
    }

    /*
     * Clean up session-level lock.
     *
     * You might think that WALInsertLockRelease() can be called before
     * cleaning up session-level lock because session-level lock doesn't need
     * to be protected with WAL insertion lock. But since
     * CHECK_FOR_INTERRUPTS() can occur in it, session-level lock must be
     * cleaned up before it.
     */
    sessionBackupState = SESSION_BACKUP_NONE;

    WALInsertLockRelease();

==

@@ -8993,18 +8686,16 @@ do_pg_abort_backup(int code, Datum arg)
    bool        emit_warning = DatumGetBool(arg);

    /*
-    * Quick exit if session is not keeping around a non-exclusive backup
-    * already started.
+    * Quick exit if session does not have a running backup.
     */
-   if (sessionBackupState != SESSION_BACKUP_NON_EXCLUSIVE)
+   if (sessionBackupState != SESSION_BACKUP_RUNNING)
        return;

    WALInsertLockAcquireExclusive();
-   Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
-   XLogCtl->Insert.nonExclusiveBackups--;
+   Assert(XLogCtl->Insert.runningBackups > 0);
+   XLogCtl->Insert.runningBackups--;

-   if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&
-       XLogCtl->Insert.nonExclusiveBackups == 0)
+   if (XLogCtl->Insert.runningBackups == 0)
    {
        XLogCtl->Insert.forcePageWrites = false;
    }

I think we have a lot of common code in do_pg_abort_backup() and
pg_do_stop_backup(). So why not have a common function that can be
called from both these functions.

==

+# Now delete the bogus backup_label file since it will interfere with startup
+unlink("$pgdata/backup_label")
+  or BAIL_OUT("unable to unlink $pgdata/backup_label");
+

Why do we need this additional change? Earlier this was not required.

--
With Regards,
Ashutosh Sharma.

On Thu, Feb 17, 2022 at 6:41 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> Here is a rebased patch.
>
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com



On Fri, Feb 18, 2022 at 10:48:10PM +0530, Ashutosh Sharma wrote:
> Some review comments on the latest version:

Thanks for the feedback!  Before I start spending more time on this one, I
should probably ask if this has any chance of making it into v15.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



On Sat, Feb 19, 2022 at 2:24 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Fri, Feb 18, 2022 at 10:48:10PM +0530, Ashutosh Sharma wrote:
> > Some review comments on the latest version:
>
> Thanks for the feedback!  Before I start spending more time on this one, I
> should probably ask if this has any chance of making it into v15.

I don't see any reason why it can't make it to v15. However, it is not
super urgent as the users facing this problem have a choice. They can
use non-exclusive mode.

--
With Regards,
Ashutosh Sharma.



I've attached an updated patch.

On Fri, Feb 18, 2022 at 10:48:10PM +0530, Ashutosh Sharma wrote:
> +    * runningBackups is a counter indicating the number of backups currently in
> +    * progress. forcePageWrites is set to true when either of these is
> +    * non-zero. lastBackupStart is the latest checkpoint redo location used as
> +    * a starting point for an online backup.
>      */
> -   ExclusiveBackupState exclusiveBackupState;
> -   int         nonExclusiveBackups;
> 
> What do you mean by "either of these is non-zero ''. Earlier we used
> to set forcePageWrites in case of both exclusive and non-exclusive
> backups, but we have just one type of backup now.

Fixed this.

> -    * OK to update backup counters, forcePageWrites and session-level lock.
> +    * OK to update backup counters and forcePageWrites.
>      *
> 
> We still update the status of session-level lock so I don't think we
> should update the above comment. See below code:

Fixed this.

> I think we have a lot of common code in do_pg_abort_backup() and
> pg_do_stop_backup(). So why not have a common function that can be
> called from both these functions.

I didn't follow through with this change.  I only saw a handful of lines
that looked similar, and AFAICT we'd need an extra branch for cleaning up
the session-level lock since do_pg_abort_backup() doesn't.

> +# Now delete the bogus backup_label file since it will interfere with startup
> +unlink("$pgdata/backup_label")
> +  or BAIL_OUT("unable to unlink $pgdata/backup_label");
> +
> 
> Why do we need this additional change? Earlier this was not required.

IIUC this test relied on the following code to handle the bogus file:

            /*
             * Terminate exclusive backup mode to avoid recovery after a clean
             * fast shutdown.  Since an exclusive backup can only be taken
             * during normal running (and not, for example, while running
             * under Hot Standby) it only makes sense to do this if we reached
             * normal running. If we're still in recovery, the backup file is
             * one we're recovering *from*, and we must keep it around so that
             * recovery restarts from the right place.
             */
            if (ReachedNormalRunning)
                CancelBackup();

The attached patch removes this code.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

This patch applies cleanly for me and passes installcheck-world.
I have not yet studied all of the changes in detail.

Some proofreading nits in the documentation: the pg_stop_backup
with arguments has lost the 'exclusive' argument, but still shows a comma
before the 'wait_for_archive' argument. And the niladic pg_stop_backup
is still documented, though it no longer exists.

My biggest concerns are the changes to the SQL-visible pg_start_backup
and pg_stop_backup functions. When the non-exclusive API was implemented
(in 7117685), that was done with care (with a new optional argument to
pg_start_backup, and a new overload of pg_stop_backup) to avoid immediate
breakage of working backup scripts.

With this patch, even scripts that were dutifully migrated to that new API and
now invoke pg_start_backup(label, false) or (label, exclusive => false) will
immediately and unnecessarily break. What I would suggest for this patch
would be to change the exclusive default from true to false, and have the
function report an ERROR if true is passed.

Otherwise, for sites using a third-party backup solution, there will be an
unnecessary requirement to synchronize a PostgreSQL upgrade with an
upgrade of the backup solution that won't be broken by the change. For
a site with their backup procedures scripted in-house, there will be an
unnecessarily urgent need for the current admin team to study and patch
the currently-working scripts.

That can be avoided by just changing the default to false and rejecting calls
where true is passed. That will break only scripts that never got the memo
about moving to non-exclusive backup, available for six years now.

Assuming the value is false, so no error is thrown, is it practical to determine
from flinfo->fn_expr whether the value was defaulted or supplied? If so, I would
further suggest reporting a deprecation WARNING if it was explicitly supplied,
with a HINT that the argument can simply be removed at the call site, and will
become unrecognized in some future release.

pg_stop_backup needs thought, because 7117685 added a new overload for that
function, rather than just an optional argument. This patch removes the old
niladic version that returned pg_lsn, leaving just one version, with an optional
argument, that returns a record.

Here again, the old niladic one was only suitable for exclusive backups, so there
can't be any script existing in 2022 that still calls that unless it has never been
updated in six years to nonexclusive backups, and that breakage can't be
helped.

Any scripts that did get dutifully updated over the last six years will be calling the
record-returning version, passing false, or exclusive => false. This patch as it
stands will unnecessarily break those, but here again I think that can be avoided
just by making the exclusive parameter optional with default false, and reporting
an error if true is passed.

Here again, I would consider also issuing a deprecation warning if the argument
is explicitly supplied, if it is practical to determine that from fn_expr. (I haven't
looked yet to see how practical that is.)

Regards,
-Chap
On 02/26/22 11:48, Chapman Flack wrote:
> This patch applies cleanly for me and passes installcheck-world.
> I have not yet studied all of the changes in detail.

I've now looked through the rest, and the only further thing I noticed
was that xlog.c's do_pg_start_backup still has a tablespaces parameter
to receive a List* of tablespaces if the caller wants, but this patch
removes the comment describing it:


- * If "tablespaces" isn't NULL, it receives a list of tablespaceinfo structs
- * describing the cluster's tablespaces.


which seems like collateral damage.

Regards,
-Chap



On Sat, Feb 26, 2022 at 04:48:52PM +0000, Chapman Flack wrote:
> My biggest concerns are the changes to the SQL-visible pg_start_backup
> and pg_stop_backup functions. When the non-exclusive API was implemented
> (in 7117685), that was done with care (with a new optional argument to
> pg_start_backup, and a new overload of pg_stop_backup) to avoid immediate
> breakage of working backup scripts.
> 
> With this patch, even scripts that were dutifully migrated to that new API and
> now invoke pg_start_backup(label, false) or (label, exclusive => false) will
> immediately and unnecessarily break. What I would suggest for this patch
> would be to change the exclusive default from true to false, and have the
> function report an ERROR if true is passed.
> 
> Otherwise, for sites using a third-party backup solution, there will be an
> unnecessary requirement to synchronize a PostgreSQL upgrade with an
> upgrade of the backup solution that won't be broken by the change. For
> a site with their backup procedures scripted in-house, there will be an
> unnecessarily urgent need for the current admin team to study and patch
> the currently-working scripts.
> 
> That can be avoided by just changing the default to false and rejecting calls
> where true is passed. That will break only scripts that never got the memo
> about moving to non-exclusive backup, available for six years now.
> 
> Assuming the value is false, so no error is thrown, is it practical to determine
> from flinfo->fn_expr whether the value was defaulted or supplied? If so, I would
> further suggest reporting a deprecation WARNING if it was explicitly supplied,
> with a HINT that the argument can simply be removed at the call site, and will
> become unrecognized in some future release.

This is a good point.  I think I agree with your proposed changes.  I
believe it is possible to add a deprecation warning only when 'exclusive'
is specified.  If anything, we can create a separate function that accepts
the 'exclusive' parameter and that always emits a NOTICE or WARNING.

> pg_stop_backup needs thought, because 7117685 added a new overload for that
> function, rather than just an optional argument. This patch removes the old
> niladic version that returned pg_lsn, leaving just one version, with an optional
> argument, that returns a record.
> 
> Here again, the old niladic one was only suitable for exclusive backups, so there
> can't be any script existing in 2022 that still calls that unless it has never been
> updated in six years to nonexclusive backups, and that breakage can't be
> helped.
> 
> Any scripts that did get dutifully updated over the last six years will be calling the
> record-returning version, passing false, or exclusive => false. This patch as it
> stands will unnecessarily break those, but here again I think that can be avoided
> just by making the exclusive parameter optional with default false, and reporting
> an error if true is passed.
> 
> Here again, I would consider also issuing a deprecation warning if the argument
> is explicitly supplied, if it is practical to determine that from fn_expr. (I haven't
> looked yet to see how practical that is.)

Agreed.  I will look into updating this one, too.  I think the 'exclusive'
parameter should remain documented for now for both pg_start_backup() and
pg_stop_backup(), but this documentation will just note that it is there
for backward compatibility and must be set to false.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



On Sat, Feb 26, 2022 at 05:03:04PM -0500, Chapman Flack wrote:
> I've now looked through the rest, and the only further thing I noticed
> was that xlog.c's do_pg_start_backup still has a tablespaces parameter
> to receive a List* of tablespaces if the caller wants, but this patch
> removes the comment describing it:
> 
> 
> - * If "tablespaces" isn't NULL, it receives a list of tablespaceinfo structs
> - * describing the cluster's tablespaces.
> 
> 
> which seems like collateral damage.

Thanks.  I will fix this and the proofreading nits you noted upthread in
the next revision.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



On Sat, Feb 26, 2022 at 02:06:14PM -0800, Nathan Bossart wrote:
> On Sat, Feb 26, 2022 at 04:48:52PM +0000, Chapman Flack wrote:
>> Assuming the value is false, so no error is thrown, is it practical to determine
>> from flinfo->fn_expr whether the value was defaulted or supplied? If so, I would
>> further suggest reporting a deprecation WARNING if it was explicitly supplied,
>> with a HINT that the argument can simply be removed at the call site, and will
>> become unrecognized in some future release.
> 
> This is a good point.  I think I agree with your proposed changes.  I
> believe it is possible to add a deprecation warning only when 'exclusive'
> is specified.  If anything, we can create a separate function that accepts
> the 'exclusive' parameter and that always emits a NOTICE or WARNING.

I've spent some time looking into this, and I haven't found a clean way to
emit a WARNING only if the "exclusive" parameter is supplied (and set to
false).  AFAICT flinfo->fn_expr doesn't tell us whether the parameter was
supplied or the default value was used.  I was able to get it working by
splitting pg_start_backup() into 3 separate internal functions (i.e.,
pg_start_backup_1arg(), pg_start_backup_2arg(), and
pg_start_backup_3arg()), but this breaks calls such as
pg_start_backup('mylabel', exclusive => false), and it might complicate
privilege management for users.

Without a WARNING, I think it will be difficult to justify removing the
"exclusive" parameter in the future.  We would either need to leave it
around forever, or we would have to risk unnecessarily breaking some
working backup scripts.  I wonder if we should just remove it now and make
sure that this change is well-documented in the release notes.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



On 2/28/22 23:51, Nathan Bossart wrote:
> On Sat, Feb 26, 2022 at 02:06:14PM -0800, Nathan Bossart wrote:
>> On Sat, Feb 26, 2022 at 04:48:52PM +0000, Chapman Flack wrote:
>>> Assuming the value is false, so no error is thrown, is it practical to determine
>>> from flinfo->fn_expr whether the value was defaulted or supplied? If so, I would
>>> further suggest reporting a deprecation WARNING if it was explicitly supplied,
>>> with a HINT that the argument can simply be removed at the call site, and will
>>> become unrecognized in some future release.
>>
>> This is a good point.  I think I agree with your proposed changes.  I
>> believe it is possible to add a deprecation warning only when 'exclusive'
>> is specified.  If anything, we can create a separate function that accepts
>> the 'exclusive' parameter and that always emits a NOTICE or WARNING.
> 
> I've spent some time looking into this, and I haven't found a clean way to
> emit a WARNING only if the "exclusive" parameter is supplied (and set to
> false).  AFAICT flinfo->fn_expr doesn't tell us whether the parameter was
> supplied or the default value was used.  I was able to get it working by
> splitting pg_start_backup() into 3 separate internal functions (i.e.,
> pg_start_backup_1arg(), pg_start_backup_2arg(), and
> pg_start_backup_3arg()), but this breaks calls such as
> pg_start_backup('mylabel', exclusive => false), and it might complicate
> privilege management for users.
> 
> Without a WARNING, I think it will be difficult to justify removing the
> "exclusive" parameter in the future.  We would either need to leave it
> around forever, or we would have to risk unnecessarily breaking some
> working backup scripts.  I wonder if we should just remove it now and make
> sure that this change is well-documented in the release notes.

Personally, I am in favor of removing it. We change/rename 
functions/tables/views when we need to, and this happens in almost every 
release.

What we need to do is make sure that an older installation won't 
silently work in a broken way, i.e. if we remove the exclusive flag 
somebody expecting the pre-9.6 behavior might not receive an error and 
think everything is OK. That would not be good.

One option might be to rename the functions. Something like 
pg_backup_start/stop.

Regards,
-David



On Tue, Mar 01, 2022 at 08:44:51AM -0600, David Steele wrote:
> Personally, I am in favor of removing it. We change/rename
> functions/tables/views when we need to, and this happens in almost every
> release.
> 
> What we need to do is make sure that an older installation won't silently
> work in a broken way, i.e. if we remove the exclusive flag somebody
> expecting the pre-9.6 behavior might not receive an error and think
> everything is OK. That would not be good.
> 
> One option might be to rename the functions. Something like
> pg_backup_start/stop.

I'm fine with this approach.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



On 03/01/22 09:44, David Steele wrote:
> Personally, I am in favor of removing it. We change/rename
> functions/tables/views when we need to, and this happens in almost every
> release.

For clarification, is that a suggestion to remove the 'exclusive' parameter
in some later release, after using this release to default it to false and
reject calls with true?

I can get behind that proposal, even if we don't have a practical way
to add the warning I suggested. I'd be happier with the warning, but can
live without it. Release notes can be the warning.

That way, at least, there would be a period of time where procedures
that currently work (by passing exclusive => false) would continue to work,
and could be adapted as time permits by removing that argument, with no
behavioral change.

The later release removing the argument would then break only procedures
that had never done so. That's comparable to what's proposed for this
release, which will only break procedures that have never migrated away
from exclusive mode despite the time and notice to do so.

That seems ok to me.

Regards,
-Chap



Greetings,

* Nathan Bossart (nathandbossart@gmail.com) wrote:
> On Tue, Mar 01, 2022 at 08:44:51AM -0600, David Steele wrote:
> > Personally, I am in favor of removing it. We change/rename
> > functions/tables/views when we need to, and this happens in almost every
> > release.
> >
> > What we need to do is make sure that an older installation won't silently
> > work in a broken way, i.e. if we remove the exclusive flag somebody
> > expecting the pre-9.6 behavior might not receive an error and think
> > everything is OK. That would not be good.
> >
> > One option might be to rename the functions. Something like
> > pg_backup_start/stop.
>
> I'm fine with this approach.

+1.

Thanks,

Stephen

Attachment
On Tue, Mar 01, 2022 at 11:09:13AM -0500, Chapman Flack wrote:
> On 03/01/22 09:44, David Steele wrote:
>> Personally, I am in favor of removing it. We change/rename
>> functions/tables/views when we need to, and this happens in almost every
>> release.
> 
> For clarification, is that a suggestion to remove the 'exclusive' parameter
> in some later release, after using this release to default it to false and
> reject calls with true?

My suggestion was to remove it in v15.  My impression is that David and
Stephen agree, but I could be misinterpreting their responses.

> That way, at least, there would be a period of time where procedures
> that currently work (by passing exclusive => false) would continue to work,
> and could be adapted as time permits by removing that argument, with no
> behavioral change.

I'm not sure if there's any advantage to kicking the can down the road.  At
some point, we'll need to break existing backup scripts.  Will we be more
prepared to do that in v17 than we are now?  We could maintain two sets of
functions for a few releases and make it really clear in the documentation
that pg_start/stop_backup() are going to be removed soon (and always emit a
WARNING when they are used).  Would that address your concerns?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



On 3/1/22 11:32, Nathan Bossart wrote:
> On Tue, Mar 01, 2022 at 11:09:13AM -0500, Chapman Flack wrote:
>> On 03/01/22 09:44, David Steele wrote:
>>> Personally, I am in favor of removing it. We change/rename
>>> functions/tables/views when we need to, and this happens in almost every
>>> release.
>>
>> For clarification, is that a suggestion to remove the 'exclusive' parameter
>> in some later release, after using this release to default it to false and
>> reject calls with true?
> 
> My suggestion was to remove it in v15.  My impression is that David and
> Stephen agree, but I could be misinterpreting their responses.

I agree and I'm pretty sure Stephen does as well.

>> That way, at least, there would be a period of time where procedures
>> that currently work (by passing exclusive => false) would continue to work,
>> and could be adapted as time permits by removing that argument, with no
>> behavioral change.
> 
> I'm not sure if there's any advantage to kicking the can down the road.  At
> some point, we'll need to break existing backup scripts.  Will we be more
> prepared to do that in v17 than we are now?  We could maintain two sets of
> functions for a few releases and make it really clear in the documentation
> that pg_start/stop_backup() are going to be removed soon (and always emit a
> WARNING when they are used).  Would that address your concerns?

I think people are going to complain no matter what. If scripts are 
being maintained changing the name is not a big deal (though moving from 
exclusive to non-exclusive may be). If they aren't being maintained then 
they'll just blow up a few versions down the road when we remove the 
compatibility functions.

Regards,
-David



On 03/01/22 12:32, Nathan Bossart wrote:
> On Tue, Mar 01, 2022 at 11:09:13AM -0500, Chapman Flack wrote:
>> That way, at least, there would be a period of time where procedures
>> that currently work (by passing exclusive => false) would continue to work,
>> and could be adapted as time permits by removing that argument, with no
>> behavioral change.
> 
> I'm not sure if there's any advantage to kicking the can down the road.  At
> some point, we'll need to break existing backup scripts.  Will we be more
> prepared to do that in v17 than we are now?

Yes, if we have provided a transition period the way we did in 7117685.
The way the on-ramp to that transition worked:

- Initially, your procedures used exclusive mode.
- If you changed nothing, they continued to work, with no behavior change.
- You then had ample time to adapt them to non-exclusive mode so now
  they work that way.
- You knew a day would come (here it comes) where, if you've never
  gotten around to doing that, your unchanged exclusive-mode procedures
  are going to break.

So here we are, arrived at that day. If you're still using exclusive mode,
your stuff's going to break; serves you right. So now limit the cases we
care about to people who made use of the time they were given to change
their procedures to use exclusive => false. So the off-ramp can look like:

- Initially, your procedures pass exclusive => false.
- If you change nothing, they should continue to work, with no
  behavior change.
- You should then have ample time to change to a new spelling without
  exclusive => false, and have them work that way.
- You know some later day is coming where, if you've never
  gotten around to doing that, they're going to break.

Then, when that day comes, if you're still passing exclusive at all,
your stuff's going to break; serves you right. If you have made use of
the time you were given for the changes, you'll be fine. So yes, at that
point, I think we can do it with clear conscience. We'll have made the
off-ramp as smooth and navigable as the on-ramp was.

> We could maintain two sets of
> functions for a few releases and make it really clear in the documentation
> that pg_start/stop_backup() are going to be removed soon (and always emit a
> WARNING when they are used).  Would that address your concerns?

That would. I had been thinking of not changing the names, and just making
the parameter go away. But I wasn't thinking of your concern here:

> What we need to do is make sure that an older installation won't silently
> work in a broken way, i.e. if we remove the exclusive flag somebody
> expecting the pre-9.6 behavior might not receive an error and think
> everything is OK. That would not be good.

So I'm ok with changing the names. Then step 3 of the off-ramp
would just be to call the functions by the new names, as well as to drop
the exclusive => false.

The thing I'd want to avoid is just, after the trouble that was taken
to make the on-ramp navigable, making the off-ramp be a cliff.

Regards,
-Chap



On 03/01/22 13:22, David Steele wrote:
> I think people are going to complain no matter what. If scripts are being
> maintained changing the name is not a big deal (though moving from exclusive
> to non-exclusive may be). If they aren't being maintained then they'll just
> blow up a few versions down the road when we remove the compatibility
> functions.

I might have already said enough in the message that crossed with this,
but I think what I'm saying is there's a less-binary distinction between
scripts that are/aren't "being maintained".

There can't really be many teams out there thinking "we'll just ignore
these scripts forever, and nothing bad will happen." They all know they'll
have to do stuff sometimes. But it matters how we allow them to schedule it.

In the on-ramp, at first there was only exclusive. Then there were both
modes, with exclusive being deprecated, so teams knew they'd need to do
stuff, and most by now probably have. They were able to do separation of
hazards and schedule that work; they did not have to pile it onto the
whole plate of "upgrade PG from 9.5 to 9.6 and make sure everything works".

So now we're dropping the other shoe: first there was one mode, then both,
now there's only the other one. Again there's some work for teams to do;
let's again allow them to separate hazards and schedule that work apart
from the whole 14 to 15 upgrade project.

We can't help getting complaints in the off-ramp from anybody who ignored
the on-ramp. But we can avoid clobbering the teams who dutifully played
along before, and only want the same space to do so now.

Regards,
-Chap



Greetings,

* David Steele (david@pgmasters.net) wrote:
> On 3/1/22 11:32, Nathan Bossart wrote:
> >On Tue, Mar 01, 2022 at 11:09:13AM -0500, Chapman Flack wrote:
> >>On 03/01/22 09:44, David Steele wrote:
> >>>Personally, I am in favor of removing it. We change/rename
> >>>functions/tables/views when we need to, and this happens in almost every
> >>>release.
> >>
> >>For clarification, is that a suggestion to remove the 'exclusive' parameter
> >>in some later release, after using this release to default it to false and
> >>reject calls with true?
> >
> >My suggestion was to remove it in v15.  My impression is that David and
> >Stephen agree, but I could be misinterpreting their responses.
>
> I agree and I'm pretty sure Stephen does as well.

Yes, +1 to removing it.

> >>That way, at least, there would be a period of time where procedures
> >>that currently work (by passing exclusive => false) would continue to work,
> >>and could be adapted as time permits by removing that argument, with no
> >>behavioral change.
> >
> >I'm not sure if there's any advantage to kicking the can down the road.  At
> >some point, we'll need to break existing backup scripts.  Will we be more
> >prepared to do that in v17 than we are now?  We could maintain two sets of
> >functions for a few releases and make it really clear in the documentation
> >that pg_start/stop_backup() are going to be removed soon (and always emit a
> >WARNING when they are used).  Would that address your concerns?
>
> I think people are going to complain no matter what. If scripts are being
> maintained changing the name is not a big deal (though moving from exclusive
> to non-exclusive may be). If they aren't being maintained then they'll just
> blow up a few versions down the road when we remove the compatibility
> functions.

I don't consider "maintained" and "still using the exclusive backup
method" to both be able to be true at the same time.

Thanks,

Stephen

Attachment
Greetings,

* Chapman Flack (chap@anastigmatix.net) wrote:
> On 03/01/22 13:22, David Steele wrote:
> > I think people are going to complain no matter what. If scripts are being
> > maintained changing the name is not a big deal (though moving from exclusive
> > to non-exclusive may be). If they aren't being maintained then they'll just
> > blow up a few versions down the road when we remove the compatibility
> > functions.
>
> I might have already said enough in the message that crossed with this,
> but I think what I'm saying is there's a less-binary distinction between
> scripts that are/aren't "being maintained".
>
> There can't really be many teams out there thinking "we'll just ignore
> these scripts forever, and nothing bad will happen." They all know they'll
> have to do stuff sometimes. But it matters how we allow them to schedule it.

We only make these changes between major versions.  That's as much as we
should be required to provide.

Further, we seriously changed around how restores work a few versions
back and there was rather little complaining.

Thanks,

Stephen

Attachment
On 03/01/22 14:14, Stephen Frost wrote:
>> There can't really be many teams out there thinking "we'll just ignore
>> these scripts forever, and nothing bad will happen." They all know they'll
>> have to do stuff sometimes. But it matters how we allow them to schedule it.
> 
> We only make these changes between major versions.  That's as much as we
> should be required to provide.

It's an OSS project, so I guess we're not required to provide anything.

But in the course of this multi-release exclusive to non-exclusive
transition, we already demonstrated, in 7117685, that we can avoid
inflicting immediate breakage when there's nothing in our objective
that inherently requires it, and avoiding it is relatively easy.

I can't bring myself to think that was a bad precedent.

Now, granted, the difference between the adaptations being required then
and the ones required now is that those required both: changes to some
function calls, and corresponding changes to how the scripts handled
label and tablespace files. Here, it's only a clerical update to some
function calls.

So if I'm outvoted here and the reason is "look, a lighter burden is
involved this time than that time", then ok. I would rather bow to that
argument on the specific facts of one case than abandon the precedent
from 7117685 generally.

Regards,
-Chap



Greetings,

* Chapman Flack (chap@anastigmatix.net) wrote:
> On 03/01/22 14:14, Stephen Frost wrote:
> >> There can't really be many teams out there thinking "we'll just ignore
> >> these scripts forever, and nothing bad will happen." They all know they'll
> >> have to do stuff sometimes. But it matters how we allow them to schedule it.
> >
> > We only make these changes between major versions.  That's as much as we
> > should be required to provide.
>
> It's an OSS project, so I guess we're not required to provide anything.
>
> But in the course of this multi-release exclusive to non-exclusive
> transition, we already demonstrated, in 7117685, that we can avoid
> inflicting immediate breakage when there's nothing in our objective
> that inherently requires it, and avoiding it is relatively easy.
>
> I can't bring myself to think that was a bad precedent.

It's actively bad because we are ridiculously inconsistent when it comes
to these things and we're terrible about ever removing anything once
it's gotten into the tree as 'deprecated'.  Witness that it's 8 years
since 7117685 and we still have these old and clearly broken APIs
around.  We absolutely need to move *away* from this approach, exactly
how 2dedf4d9, much more recently than 7117685, for all of its other
flaws, did.

> So if I'm outvoted here and the reason is "look, a lighter burden is
> involved this time than that time", then ok. I would rather bow to that
> argument on the specific facts of one case than abandon the precedent
> from 7117685 generally.

It's far from precedent- if anything, it's quite the opposite from how
most changes around here are made, and much more recent commits in the
same area clearly tossed out entirely the idea of trying to maintain
some kind of backwards compatibility with existing scripts.

Thanks,

Stephen

Attachment
Here is a new version of the patch with the following changes:

    1. Addressed Chap's feedback from upthread.
    2. Renamed pg_start/stop_backup() to pg_backup_start/stop() as
       suggested by David.
    3. A couple of other small documentation adjustments.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment
On 03/01/22 20:03, Nathan Bossart wrote:
> Here is a new version of the patch with the following changes:

I did not notice this earlier (sorry), but there seems to remain in
backup.sgml a programlisting example that shows a psql invocation
for pg_backup_start, then a tar command, then another psql invocation
for pg_backup_stop.

I think that was only workable for the exclusive mode, and now it is
necessary to issue pg_backup_start and pg_backup_stop in the same session.

(The 'touch backup_in_progress' business seems a bit bogus now too,
suggesting an exclusivity remembered from bygone days.)

I am not sure what a workable, simple example ought to look like.
Maybe a single psql script issuing the pg_backup_start and the
pg_backup_stop, with a tar command in between with \! ?

Several bricks shy of production-ready, but it would give the idea.

Regards,
-Chap



On Wed, Mar 02, 2022 at 02:23:51PM -0500, Chapman Flack wrote:
> I did not notice this earlier (sorry), but there seems to remain in
> backup.sgml a programlisting example that shows a psql invocation
> for pg_backup_start, then a tar command, then another psql invocation
> for pg_backup_stop.
> 
> I think that was only workable for the exclusive mode, and now it is
> necessary to issue pg_backup_start and pg_backup_stop in the same session.
> 
> (The 'touch backup_in_progress' business seems a bit bogus now too,
> suggesting an exclusivity remembered from bygone days.)
> 
> I am not sure what a workable, simple example ought to look like.
> Maybe a single psql script issuing the pg_backup_start and the
> pg_backup_stop, with a tar command in between with \! ?
> 
> Several bricks shy of production-ready, but it would give the idea.

Another option might be to just remove this section.  The top of the
section mentions that this is easily done using pg_basebackup with the -X
parameter.  The bottom part of the section includes more complicated steps
for when "more flexibility in copying the backup files is needed..."
AFAICT the more complicated strategy was around before pg_basebackup, and
the pg_basebackup recommendation was added in 2012 as part of 920febd.
Thoughts?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



On 3/8/22 14:01, Nathan Bossart wrote:
> On Wed, Mar 02, 2022 at 02:23:51PM -0500, Chapman Flack wrote:
>> I did not notice this earlier (sorry), but there seems to remain in
>> backup.sgml a programlisting example that shows a psql invocation
>> for pg_backup_start, then a tar command, then another psql invocation
>> for pg_backup_stop.
>>
>> I think that was only workable for the exclusive mode, and now it is
>> necessary to issue pg_backup_start and pg_backup_stop in the same session.
>>
>> (The 'touch backup_in_progress' business seems a bit bogus now too,
>> suggesting an exclusivity remembered from bygone days.)
>>
>> I am not sure what a workable, simple example ought to look like.
>> Maybe a single psql script issuing the pg_backup_start and the
>> pg_backup_stop, with a tar command in between with \! ?
>>
>> Several bricks shy of production-ready, but it would give the idea.
> 
> Another option might be to just remove this section.  The top of the
> section mentions that this is easily done using pg_basebackup with the -X
> parameter.  The bottom part of the section includes more complicated steps
> for when "more flexibility in copying the backup files is needed..."
> AFAICT the more complicated strategy was around before pg_basebackup, and
> the pg_basebackup recommendation was added in 2012 as part of 920febd.
> Thoughts?

This makes sense to me. I think pg_basebackup is far preferable to doing 
anything like what is described in this section. Unless you are planning 
to do something fancy (parallelization, snapshots, object stores, etc.) 
then pg_basebackup is the way to go.

Regards,
-David



On Tue, Mar 08, 2022 at 03:09:50PM -0600, David Steele wrote:
> On 3/8/22 14:01, Nathan Bossart wrote:
>> On Wed, Mar 02, 2022 at 02:23:51PM -0500, Chapman Flack wrote:
>> > I did not notice this earlier (sorry), but there seems to remain in
>> > backup.sgml a programlisting example that shows a psql invocation
>> > for pg_backup_start, then a tar command, then another psql invocation
>> > for pg_backup_stop.
>> > 
>> > I think that was only workable for the exclusive mode, and now it is
>> > necessary to issue pg_backup_start and pg_backup_stop in the same session.
>> > 
>> > (The 'touch backup_in_progress' business seems a bit bogus now too,
>> > suggesting an exclusivity remembered from bygone days.)
>> > 
>> > I am not sure what a workable, simple example ought to look like.
>> > Maybe a single psql script issuing the pg_backup_start and the
>> > pg_backup_stop, with a tar command in between with \! ?
>> > 
>> > Several bricks shy of production-ready, but it would give the idea.
>> 
>> Another option might be to just remove this section.  The top of the
>> section mentions that this is easily done using pg_basebackup with the -X
>> parameter.  The bottom part of the section includes more complicated steps
>> for when "more flexibility in copying the backup files is needed..."
>> AFAICT the more complicated strategy was around before pg_basebackup, and
>> the pg_basebackup recommendation was added in 2012 as part of 920febd.
>> Thoughts?
> 
> This makes sense to me. I think pg_basebackup is far preferable to doing
> anything like what is described in this section. Unless you are planning to
> do something fancy (parallelization, snapshots, object stores, etc.) then
> pg_basebackup is the way to go.

I spent some time trying to come up with a workable script to replace the
existing one.  I think the main problem is that you need to write out both
the backup label file and the tablespace map file, but I didn't find an
easy way to write the different output columns of pg_backup_stop() to
separate files via psql.  We'd probably need to write out the steps in
prose like the 'Making a Base Backup Using the Low Level API' section does.
Ultimately, I just removed everything beyond the pg_basebackup
recommendation in the 'Standalone Hot Backups' section.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment
On 03/08/22 17:12, Nathan Bossart wrote:
> I spent some time trying to come up with a workable script to replace the
> existing one.  I think the main problem is that you need to write out both
> the backup label file and the tablespace map file, but I didn't find an
> easy way to write the different output columns of pg_backup_stop() to
> separate files via psql.

Something like this might work:

SELECT * FROM pg_backup_stop(true) \gset

\out /tmp/backup_label   \qecho :labelfile
\out /tmp/tablespace_map \qecho :spcmapfile
\out
\! ... tar command adding /tmp/{backup_label,tablespace_map} to the tarball

I notice the \qecho adds a final newline (and so if :spcmapfile is empty,
a file containing a single newline is made). In a quick test with a bogus
restore_command, I did not see any error messages specific to the format
of the backup_label or tablespace_map files, so maybe the final newline
isn't a problem.

Assuming the newline isn't a problem, that might be simple enough to
use in an example, and maybe it's not a bad thing that it highlights a few
psql capabilities the reader might not have stumbled on before. Or, maybe
it is just too confusing to bother.

While agreeing that pg_basebackup is the production-ready thing that
does it all for you (with tests for likely errors and so on), I think
there is also some value in a dead-simple example that concretely
shows you what "it" is, what the basic steps are that happen beneath
pg_basebackup's chrome.

If the added newline is a problem, I haven't thought of a way to exclude
it that doesn't take the example out of the realm of dead-simple.

Regards,
-Chap



Greetings,

* Chapman Flack (chap@anastigmatix.net) wrote:
> On 03/08/22 17:12, Nathan Bossart wrote:
> > I spent some time trying to come up with a workable script to replace the
> > existing one.  I think the main problem is that you need to write out both
> > the backup label file and the tablespace map file, but I didn't find an
> > easy way to write the different output columns of pg_backup_stop() to
> > separate files via psql.

Let's not confuse ourselves here- the existing script *doesn't* work in
any reasonable way when we're talking about everything that needs to be
done to perform a backup.  That a lot of people are using it because
it's in the documentation is an actively bad thing.

The same goes for the archive command example.

> Something like this might work:
>
> SELECT * FROM pg_backup_stop(true) \gset
>
> \out /tmp/backup_label   \qecho :labelfile
> \out /tmp/tablespace_map \qecho :spcmapfile
> \out
> \! ... tar command adding /tmp/{backup_label,tablespace_map} to the tarball

... this doesn't do what's needed either.  We could try to write down
some minimum set of things that are needed for a backup tool to do but
it's not something that a 3 line script is going to cover.  Indeed, it's
a lot more like pg_basebackup and if we want to give folks a script to
use, it should be "run pg_basebackup".

> I notice the \qecho adds a final newline (and so if :spcmapfile is empty,
> a file containing a single newline is made). In a quick test with a bogus
> restore_command, I did not see any error messages specific to the format
> of the backup_label or tablespace_map files, so maybe the final newline
> isn't a problem.
>
> Assuming the newline isn't a problem, that might be simple enough to
> use in an example, and maybe it's not a bad thing that it highlights a few
> psql capabilities the reader might not have stumbled on before. Or, maybe
> it is just too confusing to bother.

It's more than just too confusing, it's actively bad because people will
actually use it and then end up with backups that don't work.

> While agreeing that pg_basebackup is the production-ready thing that
> does it all for you (with tests for likely errors and so on), I think
> there is also some value in a dead-simple example that concretely
> shows you what "it" is, what the basic steps are that happen beneath
> pg_basebackup's chrome.

Documenting everything that pg_basebackup does to make sure that the
backup is viable might be something to work on if someone is really
excited about this, but it's not 'dead-simple' and it's darn close to
the bare minimum, something that none of these simple scripts will come
anywhere close to being and instead they'll be far less than the
minimum.

> If the added newline is a problem, I haven't thought of a way to exclude
> it that doesn't take the example out of the realm of dead-simple.

I disagree that there's really a way to provide 'dead-simple' backups
with what's built into core without using pg_basebackup.  If we want a
'dead-simple' solution in core then we'd need to write an appropriate
backup tool that does all the basic things and include and maintain
that.  Writing a shell script isn't enough and we shouldn't encourage
our users to do exactly that by having it in our documentation because
then they'll think it's enough.

Thanks,

Stephen

Attachment
On Wed, Mar 9, 2022 at 4:42 PM Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Chapman Flack (chap@anastigmatix.net) wrote:
> > On 03/08/22 17:12, Nathan Bossart wrote:
> > > I spent some time trying to come up with a workable script to replace the
> > > existing one.  I think the main problem is that you need to write out both
> > > the backup label file and the tablespace map file, but I didn't find an
> > > easy way to write the different output columns of pg_backup_stop() to
> > > separate files via psql.
>
> Let's not confuse ourselves here- the existing script *doesn't* work in
> any reasonable way when we're talking about everything that needs to be
> done to perform a backup.  That a lot of people are using it because
> it's in the documentation is an actively bad thing.
>
> The same goes for the archive command example.
>
> > Something like this might work:
> >
> > SELECT * FROM pg_backup_stop(true) \gset
> >
> > \out /tmp/backup_label   \qecho :labelfile
> > \out /tmp/tablespace_map \qecho :spcmapfile
> > \out
> > \! ... tar command adding /tmp/{backup_label,tablespace_map} to the tarball
>
> ... this doesn't do what's needed either.  We could try to write down
> some minimum set of things that are needed for a backup tool to do but
> it's not something that a 3 line script is going to cover.  Indeed, it's
> a lot more like pg_basebackup and if we want to give folks a script to
> use, it should be "run pg_basebackup".
>
> > I notice the \qecho adds a final newline (and so if :spcmapfile is empty,
> > a file containing a single newline is made). In a quick test with a bogus
> > restore_command, I did not see any error messages specific to the format
> > of the backup_label or tablespace_map files, so maybe the final newline
> > isn't a problem.
> >
> > Assuming the newline isn't a problem, that might be simple enough to
> > use in an example, and maybe it's not a bad thing that it highlights a few
> > psql capabilities the reader might not have stumbled on before. Or, maybe
> > it is just too confusing to bother.
>
> It's more than just too confusing, it's actively bad because people will
> actually use it and then end up with backups that don't work.

+1.

Or even worse, backups that sometimes work, but not reliably and not every time.

> > While agreeing that pg_basebackup is the production-ready thing that
> > does it all for you (with tests for likely errors and so on), I think
> > there is also some value in a dead-simple example that concretely
> > shows you what "it" is, what the basic steps are that happen beneath
> > pg_basebackup's chrome.

I agree that having a dead simple script would be good.

The *only* dead simple script that's going to be possible is one that
calls pg_basebackup.

The current APIs don't make it *possible* to drive them directly with
a dead simple script.

Pretending something is simple when it's not, is not doing anybody a favor.


> Documenting everything that pg_basebackup does to make sure that the
> backup is viable might be something to work on if someone is really
> excited about this, but it's not 'dead-simple' and it's darn close to
> the bare minimum, something that none of these simple scripts will come
> anywhere close to being and instead they'll be far less than the
> minimum.

Yeah, having the full set of steps required documented certainly
wouldn't be a bad thing. But it's a very *different* thing.


> > If the added newline is a problem, I haven't thought of a way to exclude
> > it that doesn't take the example out of the realm of dead-simple.
>
> I disagree that there's really a way to provide 'dead-simple' backups
> with what's built into core without using pg_basebackup.  If we want a
> 'dead-simple' solution in core then we'd need to write an appropriate
> backup tool that does all the basic things and include and maintain
> that.  Writing a shell script isn't enough and we shouldn't encourage
> our users to do exactly that by having it in our documentation because
> then they'll think it's enough.

+1.


We need to accept that the current APIs are far too low level to be
driven by a shellscript. No matter how much documentation we write is
not going to change that fact.

For the people who want to drive their backups from a shellscript and
for some reason *don't* want to use pg_basebackup, we need to come up
with a different API or a different set of tools. That is not a
documentation task. That is a "start from a list of which things
pg_basebackup cannot do that are still simple, or that tools like
pgbackrest cannot do if they're complicated".  And then design an API
that's actually safe and easy to use *for that usecase*.

For example, if the use case is "i want to use filesystemor SAN
snapshots for my backups", we shouldn't try to write workarounds using
bash coprocs or whatever. Instead, we could write a tool that
interacts with the current api to start the backup, then launches a
shellscript that interacts with the snapshot system, and then stops
the backup after. With a well defined set of rules for how that shell
script should work and interact.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



On 03/09/22 11:22, Magnus Hagander wrote:
>> It's more than just too confusing, it's actively bad because people will
>> actually use it and then end up with backups that don't work.
> 
> +1.
> 
> Or even worse, backups that sometimes work, but not reliably and not
> every time.
> ...
> Pretending something is simple when it's not, is not doing anybody a favor.

Okay, I bow to this reasoning, for the purpose of this patch. Let's
just lose the example.

>> Documenting everything that pg_basebackup does to make sure that the
>> backup is viable might be something to work on if someone is really
>> excited about this, but it's not 'dead-simple' and it's darn close to
>> the bare minimum, something that none of these simple scripts will come
>> anywhere close to being and instead they'll be far less than the
>> minimum.
> 
> Yeah, having the full set of steps required documented certainly
> wouldn't be a bad thing.

I'd say that qualifies as an understatement. While it certainly doesn't
have to be part of this patch, if the claim is that an admin who relies
on pg_basebackup is relying on essential things pg_basebackup does that
have not been enumerated in our documentation yet, I would argue they
should be.

> with a different API or a different set of tools. That is not a
> documentation task. That is a "start from a list of which things
> pg_basebackup cannot do that are still simple, or that tools like
> pgbackrest cannot do if they're complicated".  And then design an API
> that's actually safe and easy to use *for that usecase*.

That might also be a good thing, but I don't see it as a substitute
for documenting the present reality of what the irreducibly essential
behaviors of pg_basebackup (or of third-party tools like pgbackrest)
are, and why they are so.

Regards,
-Chap



Greetings,

* Chapman Flack (chap@anastigmatix.net) wrote:
> On 03/09/22 11:22, Magnus Hagander wrote:
> >> It's more than just too confusing, it's actively bad because people will
> >> actually use it and then end up with backups that don't work.
> >
> > +1.
> >
> > Or even worse, backups that sometimes work, but not reliably and not
> > every time.
> > ...
> > Pretending something is simple when it's not, is not doing anybody a favor.
>
> Okay, I bow to this reasoning, for the purpose of this patch. Let's
> just lose the example.

Great.

> >> Documenting everything that pg_basebackup does to make sure that the
> >> backup is viable might be something to work on if someone is really
> >> excited about this, but it's not 'dead-simple' and it's darn close to
> >> the bare minimum, something that none of these simple scripts will come
> >> anywhere close to being and instead they'll be far less than the
> >> minimum.
> >
> > Yeah, having the full set of steps required documented certainly
> > wouldn't be a bad thing.
>
> I'd say that qualifies as an understatement. While it certainly doesn't
> have to be part of this patch, if the claim is that an admin who relies
> on pg_basebackup is relying on essential things pg_basebackup does that
> have not been enumerated in our documentation yet, I would argue they
> should be.

It doesn't have to be part of this patch and we should move forward with
this patch.  Let's avoid hijacking this thread, which is about this
patch, for an independent debate about what our documentation should or
shouldn't include.

> > with a different API or a different set of tools. That is not a
> > documentation task. That is a "start from a list of which things
> > pg_basebackup cannot do that are still simple, or that tools like
> > pgbackrest cannot do if they're complicated".  And then design an API
> > that's actually safe and easy to use *for that usecase*.
>
> That might also be a good thing, but I don't see it as a substitute
> for documenting the present reality of what the irreducibly essential
> behaviors of pg_basebackup (or of third-party tools like pgbackrest)
> are, and why they are so.

I disagree.  If we provided a tool then we'd document that tool and how
users can use it, not every single step that it does (see also:
pg_basebackup).

Thanks,

Stephen

Attachment
On 03/09/22 12:19, Stephen Frost wrote:
> Let's avoid hijacking this thread, which is about this
> patch, for an independent debate about what our documentation should or
> shouldn't include.

Agreed. New thread here:

https://www.postgresql.org/message-id/6228FFE4.3050309%40anastigmatix.net

Regards,
-Chap



On Wed, Mar 09, 2022 at 02:32:24PM -0500, Chapman Flack wrote:
> On 03/09/22 12:19, Stephen Frost wrote:
>> Let's avoid hijacking this thread, which is about this
>> patch, for an independent debate about what our documentation should or
>> shouldn't include.
> 
> Agreed. New thread here:
> 
> https://www.postgresql.org/message-id/6228FFE4.3050309%40anastigmatix.net

Great.  Is there any additional feedback on this patch?  Should we add an
example of using pg_basebackup in the "Standalone Hot Backups" section, or
should we leave all documentation additions like this for Chap's new
thread?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



On 03/09/22 17:21, Nathan Bossart wrote:
> Great.  Is there any additional feedback on this patch?  Should we add an
> example of using pg_basebackup in the "Standalone Hot Backups" section, or
> should we leave all documentation additions like this for Chap's new
> thread?

I'm composing something longer for the new thread, but on the way
I noticed something we might fit into this one.

I think the listitem

  In the same connection as before, issue the command:
  SELECT * FROM pg_backup_stop(true);

would be clearer if it used named-parameter form, wait_for_archive => true.

This is not strictly necessary, of course, for a function with a single
IN parameter, but it's good documentation (and also could save us headaches
like these if there is ever another future need to give it more parameters).

That listitem doesn't say anything about what the parameter means, which
is a little weird, but probably ok because the next listitem does go into
it in some detail. I don't think a larger reorg is needed to bring that
language one listitem earlier. Just naming the parameter is probably
enough to make it less puzzling (or adding in that listitem, at most,
"the effect of the wait_for_archive parameter is explained below").

For consistency (and the same futureproofing benefit), I'd go to
fast => false in the earlier pg_backup_start as well.

I'm more ambivalent about label => 'label'. It would be consistent,
but should we just agree for conciseness that there will always be
a label and it will always be first?

You can pretty much tell in a call what's a label; it's those anonymous
trues and falses that are easier to read with named notation.

Regards,
-Chap



On Wed, Mar 09, 2022 at 06:11:19PM -0500, Chapman Flack wrote:
> I think the listitem
> 
>   In the same connection as before, issue the command:
>   SELECT * FROM pg_backup_stop(true);
> 
> would be clearer if it used named-parameter form, wait_for_archive => true.
> 
> This is not strictly necessary, of course, for a function with a single
> IN parameter, but it's good documentation (and also could save us headaches
> like these if there is ever another future need to give it more parameters).
> 
> That listitem doesn't say anything about what the parameter means, which
> is a little weird, but probably ok because the next listitem does go into
> it in some detail. I don't think a larger reorg is needed to bring that
> language one listitem earlier. Just naming the parameter is probably
> enough to make it less puzzling (or adding in that listitem, at most,
> "the effect of the wait_for_archive parameter is explained below").
> 
> For consistency (and the same futureproofing benefit), I'd go to
> fast => false in the earlier pg_backup_start as well.
> 
> I'm more ambivalent about label => 'label'. It would be consistent,
> but should we just agree for conciseness that there will always be
> a label and it will always be first?
> 
> You can pretty much tell in a call what's a label; it's those anonymous
> trues and falses that are easier to read with named notation.

Done.  I went ahead and added "label => 'label'" for consistency.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment
On 3/9/22 18:06, Nathan Bossart wrote:
> On Wed, Mar 09, 2022 at 06:11:19PM -0500, Chapman Flack wrote:
>> I think the listitem
>>
>>    In the same connection as before, issue the command:
>>    SELECT * FROM pg_backup_stop(true);
>>
>> would be clearer if it used named-parameter form, wait_for_archive => true.
>>
>> This is not strictly necessary, of course, for a function with a single
>> IN parameter, but it's good documentation (and also could save us headaches
>> like these if there is ever another future need to give it more parameters).
>>
>> That listitem doesn't say anything about what the parameter means, which
>> is a little weird, but probably ok because the next listitem does go into
>> it in some detail. I don't think a larger reorg is needed to bring that
>> language one listitem earlier. Just naming the parameter is probably
>> enough to make it less puzzling (or adding in that listitem, at most,
>> "the effect of the wait_for_archive parameter is explained below").
>>
>> For consistency (and the same futureproofing benefit), I'd go to
>> fast => false in the earlier pg_backup_start as well.
>>
>> I'm more ambivalent about label => 'label'. It would be consistent,
>> but should we just agree for conciseness that there will always be
>> a label and it will always be first?
>>
>> You can pretty much tell in a call what's a label; it's those anonymous
>> trues and falses that are easier to read with named notation.
> 
> Done.  I went ahead and added "label => 'label'" for consistency.

+1 from me.

Regards,
-David




On 03/09/22 19:06, Nathan Bossart wrote:
> Done.  I went ahead and added "label => 'label'" for consistency.

Looks like this change to an example in func.sgml is not quite right:

-postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
+postgres=# SELECT * FROM pg_walfile_name_offset(pg_backup_stop());

pg_backup_stop returns a record now, not just lsn. So this works for me:

+postgres=# SELECT * FROM pg_walfile_name_offset((pg_backup_stop()).lsn);


Otherwise, all looks to be in good order.

Regards,
-Chap



On Thu, Mar 10, 2022 at 07:13:14PM -0500, Chapman Flack wrote:
> Looks like this change to an example in func.sgml is not quite right:
> 
> -postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
> +postgres=# SELECT * FROM pg_walfile_name_offset(pg_backup_stop());
> 
> pg_backup_stop returns a record now, not just lsn. So this works for me:
> 
> +postgres=# SELECT * FROM pg_walfile_name_offset((pg_backup_stop()).lsn);

Ah, good catch.  I made this change in v7.  I considered doing something
like this

    SELECT w.* FROM pg_backup_stop() b, pg_walfile_name_offset(b.lsn) w;

but I think your suggestion is simpler.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment
On 03/10/22 19:38, Nathan Bossart wrote:
> On Thu, Mar 10, 2022 at 07:13:14PM -0500, Chapman Flack wrote:
>> +postgres=# SELECT * FROM pg_walfile_name_offset((pg_backup_stop()).lsn);
> 
> Ah, good catch.  I made this change in v7.  I considered doing something

v7 looks good to me. I have repeated the installcheck-world and given
the changed documentation sections one last read-through, and will
change the status to RfC.

Regards,
-Chap



On Fri, Mar 11, 2022 at 02:00:37PM -0500, Chapman Flack wrote:
> v7 looks good to me. I have repeated the installcheck-world and given
> the changed documentation sections one last read-through, and will
> change the status to RfC.

Thanks for reviewing!

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Greetings,

* Nathan Bossart (nathandbossart@gmail.com) wrote:
> On Thu, Mar 10, 2022 at 07:13:14PM -0500, Chapman Flack wrote:
> > Looks like this change to an example in func.sgml is not quite right:
> >
> > -postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
> > +postgres=# SELECT * FROM pg_walfile_name_offset(pg_backup_stop());
> >
> > pg_backup_stop returns a record now, not just lsn. So this works for me:
> >
> > +postgres=# SELECT * FROM pg_walfile_name_offset((pg_backup_stop()).lsn);
>
> Ah, good catch.  I made this change in v7.  I considered doing something
> like this
>
>     SELECT w.* FROM pg_backup_stop() b, pg_walfile_name_offset(b.lsn) w;
>
> but I think your suggestion is simpler.

I tend to agree.  More below.

> Subject: [PATCH v7 1/1] remove exclusive backup mode
> diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
> index 0d69851bb1..c8b914c1aa 100644
> --- a/doc/src/sgml/backup.sgml
> +++ b/doc/src/sgml/backup.sgml
> @@ -881,19 +873,19 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/0
>     <listitem>
>      <para>
>       Connect to the server (it does not matter which database) as a user with
> -     rights to run pg_start_backup (superuser, or a user who has been granted
> +     rights to run pg_backup_start (superuser, or a user who has been granted
>       EXECUTE on the function) and issue the command:
>  <programlisting>
> -SELECT pg_start_backup('label', false, false);
> +SELECT pg_backup_start(label => 'label', fast => false);
>  </programlisting>
>       where <literal>label</literal> is any string you want to use to uniquely
>       identify this backup operation. The connection
> -     calling <function>pg_start_backup</function> must be maintained until the end of
> +     calling <function>pg_backup_start</function> must be maintained until the end of
>       the backup, or the backup will be automatically aborted.
>      </para>
>
>      <para>
> -     By default, <function>pg_start_backup</function> can take a long time to finish.
> +     By default, <function>pg_backup_start</function> can take a long time to finish.
>       This is because it performs a checkpoint, and the I/O
>       required for the checkpoint will be spread out over a significant
>       period of time, by default half your inter-checkpoint interval

Hrmpf.  Not this patch's fault, but the default isn't 'half your
inter-checkpoint interval' anymore (checkpoint_completion_target was
changed to 0.9 ... let's not discuss who did that and missed fixing
this).  Overall though, maybe we should reword this to avoid having to
remember to change it again if we ever change the completion target
again?  Also it seems to imply that pg_backup_start is going to run its
own independent checkpoint or something, which isn't the typical case.
I generally explain what is going on here like this:

By default, <function>pg_backup_start</function> will wait for the next
checkpoint to complete (see ref: checkpoint_timeout ... maybe also
wal-configuration.html).

> @@ -937,7 +925,7 @@ SELECT * FROM pg_stop_backup(false, true);
>       ready to archive.
>      </para>
>      <para>
> -     The <function>pg_stop_backup</function> will return one row with three
> +     The <function>pg_backup_stop</function> will return one row with three
>       values. The second of these fields should be written to a file named
>       <filename>backup_label</filename> in the root directory of the backup. The
>       third field should be written to a file named

I get that we have <function> and </function>, but those are just
formatting and don't show up in the actual text, and so it ends up
being:

The pg_backup_stop will return

Which doesn't sound quite right to me.  I'd say we either drop 'The' or
add 'function' after.  I realize that this patch is mostly doing a
s/pg_stop_backup/pg_backup_stop/, but, still.

> diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
> index 8a802fb225..73096708cc 100644
> --- a/doc/src/sgml/func.sgml
> +++ b/doc/src/sgml/func.sgml
> @@ -25732,24 +25715,19 @@ LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
>          <parameter>spcmapfile</parameter> <type>text</type> )
>         </para>
>         <para>
> -        Finishes performing an exclusive or non-exclusive on-line backup.
> -        The <parameter>exclusive</parameter> parameter must match the
> -        previous <function>pg_start_backup</function> call.
> -        In an exclusive backup, <function>pg_stop_backup</function> removes
> -        the backup label file and, if it exists, the tablespace map file
> -        created by <function>pg_start_backup</function>.  In a non-exclusive
> -        backup, the desired contents of these files are returned as part of
> +        Finishes performing an on-line backup.  The desired contents of the
> +        backup label file and the tablespace map file are returned as part of
>          the result of the function, and should be written to files in the
>          backup area (not in the data directory).
>         </para>

Given that this is a crucial part, which the exclusive mode has wrong,
I'd be a bit more forceful about it, eg:

The contents of the backup label file and the tablespace map file must
be stored as part of the backup and must NOT be stored in the live data
directory (doing so will cause PostgreSQL to fail to restart in the
event of a crash).

> @@ -25771,8 +25749,7 @@ LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
>          The result of the function is a single record.
>          The <parameter>lsn</parameter> column holds the backup's ending
>          write-ahead log location (which again can be ignored).  The second and
> -        third columns are <literal>NULL</literal> when ending an exclusive
> -        backup; after a non-exclusive backup they hold the desired contents of
> +        third columns hold the desired contents of
>          the label and tablespace map files.
>         </para>
>         <para>

I dislike saying 'desired' here as if it's something that we're nicely
asking but maybe isn't a big deal, and just saying 'label' isn't good
since we call it 'backup label' elsewhere and we should try hard to be
consistent and clear.  How about:

The second column returns the contents of the backup label file, the
third column returns the contents of the tablespace map file.  These
must be stored as part of the backup and are required as part of the
restore process.

> diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
> index 3c182c97d4..ee3fa148b6 100644
> --- a/src/bin/pg_ctl/pg_ctl.c
> +++ b/src/bin/pg_ctl/pg_ctl.c
> @@ -1069,7 +1069,7 @@ do_stop(void)
>              get_control_dbstate() != DB_IN_ARCHIVE_RECOVERY)
>          {
>              print_msg(_("WARNING: online backup mode is active\n"
> -                        "Shutdown will not complete until pg_stop_backup() is called.\n\n"));
> +                        "Shutdown will not complete until pg_backup_stop() is called.\n\n"));
>          }
>
>          print_msg(_("waiting for server to shut down..."));
> @@ -1145,7 +1145,7 @@ do_restart(void)
>              get_control_dbstate() != DB_IN_ARCHIVE_RECOVERY)
>          {
>              print_msg(_("WARNING: online backup mode is active\n"
> -                        "Shutdown will not complete until pg_stop_backup() is called.\n\n"));
> +                        "Shutdown will not complete until pg_backup_stop() is called.\n\n"));
>          }
>
>          print_msg(_("waiting for server to shut down..."));

This... can't actually happen anymore, right?  Shouldn't we just pull
all of this out?

On a once-over of the rest of the code, I definitely like how much we're
able to simplify things in this area and remove various hacks in things
like pg_basebackup and pg_rewind where we previously had to worry about
backup_label and tablespace_map files being in a live data directory.
I'm planning to spend more time on this and hopefully be able to get it
in for v15.

Thanks!

Stephen

Attachment
On Mon, Mar 28, 2022 at 04:30:27PM -0400, Stephen Frost wrote:
>> -     By default, <function>pg_start_backup</function> can take a long time to finish.
>> +     By default, <function>pg_backup_start</function> can take a long time to finish.
>>       This is because it performs a checkpoint, and the I/O
>>       required for the checkpoint will be spread out over a significant
>>       period of time, by default half your inter-checkpoint interval
> 
> Hrmpf.  Not this patch's fault, but the default isn't 'half your
> inter-checkpoint interval' anymore (checkpoint_completion_target was
> changed to 0.9 ... let's not discuss who did that and missed fixing
> this).  Overall though, maybe we should reword this to avoid having to
> remember to change it again if we ever change the completion target
> again?  Also it seems to imply that pg_backup_start is going to run its
> own independent checkpoint or something, which isn't the typical case.
> I generally explain what is going on here like this:
> 
> By default, <function>pg_backup_start</function> will wait for the next
> checkpoint to complete (see ref: checkpoint_timeout ... maybe also
> wal-configuration.html).

done

>> -     The <function>pg_stop_backup</function> will return one row with three
>> +     The <function>pg_backup_stop</function> will return one row with three
>>       values. The second of these fields should be written to a file named
>>       <filename>backup_label</filename> in the root directory of the backup. The
>>       third field should be written to a file named
> 
> I get that we have <function> and </function>, but those are just
> formatting and don't show up in the actual text, and so it ends up
> being:
> 
> The pg_backup_stop will return
> 
> Which doesn't sound quite right to me.  I'd say we either drop 'The' or
> add 'function' after.  I realize that this patch is mostly doing a
> s/pg_stop_backup/pg_backup_stop/, but, still.

done

>> -        Finishes performing an exclusive or non-exclusive on-line backup.
>> -        The <parameter>exclusive</parameter> parameter must match the
>> -        previous <function>pg_start_backup</function> call.
>> -        In an exclusive backup, <function>pg_stop_backup</function> removes
>> -        the backup label file and, if it exists, the tablespace map file
>> -        created by <function>pg_start_backup</function>.  In a non-exclusive
>> -        backup, the desired contents of these files are returned as part of
>> +        Finishes performing an on-line backup.  The desired contents of the
>> +        backup label file and the tablespace map file are returned as part of
>>          the result of the function, and should be written to files in the
>>          backup area (not in the data directory).
>>         </para>
> 
> Given that this is a crucial part, which the exclusive mode has wrong,
> I'd be a bit more forceful about it, eg:
> 
> The contents of the backup label file and the tablespace map file must
> be stored as part of the backup and must NOT be stored in the live data
> directory (doing so will cause PostgreSQL to fail to restart in the
> event of a crash).

done

>>          The result of the function is a single record.
>>          The <parameter>lsn</parameter> column holds the backup's ending
>>          write-ahead log location (which again can be ignored).  The second and
>> -        third columns are <literal>NULL</literal> when ending an exclusive
>> -        backup; after a non-exclusive backup they hold the desired contents of
>> +        third columns hold the desired contents of
>>          the label and tablespace map files.
>>         </para>
>>         <para>
> 
> I dislike saying 'desired' here as if it's something that we're nicely
> asking but maybe isn't a big deal, and just saying 'label' isn't good
> since we call it 'backup label' elsewhere and we should try hard to be
> consistent and clear.  How about:
> 
> The second column returns the contents of the backup label file, the
> third column returns the contents of the tablespace map file.  These
> must be stored as part of the backup and are required as part of the
> restore process.

done

>>          {
>>              print_msg(_("WARNING: online backup mode is active\n"
>> -                        "Shutdown will not complete until pg_stop_backup() is called.\n\n"));
>> +                        "Shutdown will not complete until pg_backup_stop() is called.\n\n"));
>>          }
>>  
>>          print_msg(_("waiting for server to shut down..."));
>> @@ -1145,7 +1145,7 @@ do_restart(void)
>>              get_control_dbstate() != DB_IN_ARCHIVE_RECOVERY)
>>          {
>>              print_msg(_("WARNING: online backup mode is active\n"
>> -                        "Shutdown will not complete until pg_stop_backup() is called.\n\n"));
>> +                        "Shutdown will not complete until pg_backup_stop() is called.\n\n"));
>>          }
>>  
>>          print_msg(_("waiting for server to shut down..."));
> 
> This... can't actually happen anymore, right?  Shouldn't we just pull
> all of this out?

done

> On a once-over of the rest of the code, I definitely like how much we're
> able to simplify things in this area and remove various hacks in things
> like pg_basebackup and pg_rewind where we previously had to worry about
> backup_label and tablespace_map files being in a live data directory.
> I'm planning to spend more time on this and hopefully be able to get it
> in for v15.

Great!  Much appreciated.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment
On 3/28/22 10:09 PM, Nathan Bossart wrote:
> On Mon, Mar 28, 2022 at 04:30:27PM -0400, Stephen Frost wrote:
> 
>> On a once-over of the rest of the code, I definitely like how much we're
>> able to simplify things in this area and remove various hacks in things
>> like pg_basebackup and pg_rewind where we previously had to worry about
>> backup_label and tablespace_map files being in a live data directory.
>> I'm planning to spend more time on this and hopefully be able to get it
>> in for v15.
> 
> Great!  Much appreciated.

Minor typo in the docs:

+     * capable of doing an online backup, but exclude then just in case.

Should be:

capable of doing an online backup, but exclude them just in case.

Regards,
-- 
-David
david@pgmasters.net



On Mon, Apr 04, 2022 at 09:56:26AM -0400, David Steele wrote:
> Minor typo in the docs:
> 
> +     * capable of doing an online backup, but exclude then just in case.
> 
> Should be:
> 
> capable of doing an online backup, but exclude them just in case.

fixed

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment
I noticed a couple of other things that can be removed.  Since we no longer
wait on exclusive backup mode during smart shutdown, we can change
connsAllowed (in postmaster.c) to a boolean and remove CAC_SUPERUSER.  We
can also remove a couple of related notes in the documentation.  I've done
all this in the attached patch.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment
On 4/4/22 11:42 AM, Nathan Bossart wrote:
> I noticed a couple of other things that can be removed.  Since we no longer
> wait on exclusive backup mode during smart shutdown, we can change
> connsAllowed (in postmaster.c) to a boolean and remove CAC_SUPERUSER.  We
> can also remove a couple of related notes in the documentation.  I've done
> all this in the attached patch.

These changes look good to me. IMV it is a real bonus how much the state 
machine has been simplified.

I've also run this patch through the pgbackrest regression tests without 
any problems.

Regards,
-- 
-David
david@pgmasters.net



Greetings,

* David Steele (david@pgmasters.net) wrote:
> On 4/4/22 11:42 AM, Nathan Bossart wrote:
> >I noticed a couple of other things that can be removed.  Since we no longer
> >wait on exclusive backup mode during smart shutdown, we can change
> >connsAllowed (in postmaster.c) to a boolean and remove CAC_SUPERUSER.  We
> >can also remove a couple of related notes in the documentation.  I've done
> >all this in the attached patch.
>
> These changes look good to me. IMV it is a real bonus how much the state
> machine has been simplified.

Yeah, agreed.

> I've also run this patch through the pgbackrest regression tests without any
> problems.

Fantastic.

Please find attached an updated patch + commit message.  Mostly, I just
went through and did a bit more in terms of updating the documentation
and improving the comments (there were some places that were still
worrying about the chance of a 'stray' backup_label file existing, which
isn't possible any longer), along with some additional testing and
review.  This is looking pretty good to me, but other thoughts are
certainly welcome.  Otherwise, I'm hoping to commit this tomorrow.

Thanks!

Stephen

Attachment
On 4/5/22 11:25 AM, Stephen Frost wrote:
> 
> Please find attached an updated patch + commit message.  Mostly, I just
> went through and did a bit more in terms of updating the documentation
> and improving the comments (there were some places that were still
> worrying about the chance of a 'stray' backup_label file existing, which
> isn't possible any longer), along with some additional testing and
> review.  This is looking pretty good to me, but other thoughts are
> certainly welcome.  Otherwise, I'm hoping to commit this tomorrow.

I have reviewed the changes and they look good. I also ran the new patch 
through pgbackrest regression with no issues.

Regards,
-- 
-David
david@pgmasters.net





On Tue, Apr 5, 2022 at 5:25 PM Stephen Frost <sfrost@snowman.net> wrote:
Greetings,

* David Steele (david@pgmasters.net) wrote:
> On 4/4/22 11:42 AM, Nathan Bossart wrote:
> >I noticed a couple of other things that can be removed.  Since we no longer
> >wait on exclusive backup mode during smart shutdown, we can change
> >connsAllowed (in postmaster.c) to a boolean and remove CAC_SUPERUSER.  We
> >can also remove a couple of related notes in the documentation.  I've done
> >all this in the attached patch.
>
> These changes look good to me. IMV it is a real bonus how much the state
> machine has been simplified.

Yeah, agreed.

Definitely.

 
> I've also run this patch through the pgbackrest regression tests without any
> problems.

Fantastic.

Please find attached an updated patch + commit message.  Mostly, I just
went through and did a bit more in terms of updating the documentation
and improving the comments (there were some places that were still
worrying about the chance of a 'stray' backup_label file existing, which
isn't possible any longer), along with some additional testing and
review.  This is looking pretty good to me, but other thoughts are
certainly welcome.  Otherwise, I'm hoping to commit this tomorrow.

+1. LGTM.

I'm not sure I love the renaming of the functions, but I have also yet to come up with a better idea for how to avoid silent breakage, so go with it.

--
On Tue, Apr 05, 2022 at 11:25:36AM -0400, Stephen Frost wrote:
> Please find attached an updated patch + commit message.  Mostly, I just
> went through and did a bit more in terms of updating the documentation
> and improving the comments (there were some places that were still
> worrying about the chance of a 'stray' backup_label file existing, which
> isn't possible any longer), along with some additional testing and
> review.  This is looking pretty good to me, but other thoughts are
> certainly welcome.  Otherwise, I'm hoping to commit this tomorrow.

LGTM!

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



On Tue, 2022-04-05 at 13:06 -0700, Nathan Bossart wrote:
> On Tue, Apr 05, 2022 at 11:25:36AM -0400, Stephen Frost wrote:
> > Please find attached an updated patch + commit message.  Mostly, I just
> > went through and did a bit more in terms of updating the documentation
> > and improving the comments (there were some places that were still
> > worrying about the chance of a 'stray' backup_label file existing, which
> > isn't possible any longer), along with some additional testing and
> > review.  This is looking pretty good to me, but other thoughts are
> > certainly welcome.  Otherwise, I'm hoping to commit this tomorrow.
> 
> LGTM!

Cassandra (not the software) from the sidelines predicts that we will
get some fire from users for this, although I concede the theoretical
sanity of the change.

Yours,
Laurenz Albe




Greetings,

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:
> On Tue, 2022-04-05 at 13:06 -0700, Nathan Bossart wrote:
> > On Tue, Apr 05, 2022 at 11:25:36AM -0400, Stephen Frost wrote:
> > > Please find attached an updated patch + commit message.  Mostly, I just
> > > went through and did a bit more in terms of updating the documentation
> > > and improving the comments (there were some places that were still
> > > worrying about the chance of a 'stray' backup_label file existing, which
> > > isn't possible any longer), along with some additional testing and
> > > review.  This is looking pretty good to me, but other thoughts are
> > > certainly welcome.  Otherwise, I'm hoping to commit this tomorrow.
> >
> > LGTM!
>
> Cassandra (not the software) from the sidelines predicts that we will
> get some fire from users for this, although I concede the theoretical
> sanity of the change.

Great, thanks for that.

This has now been committed- thanks again to everyone for their help!

Stephen

Attachment
On Wed, Apr 06, 2022 at 03:29:15PM -0400, Stephen Frost wrote:
> This has now been committed- thanks again to everyone for their help!

Thanks!

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Hi,

I was looking at the commit for this patch and noticed this small typo
in the comment in `basebackup.c`:

```
diff --git a/src/backend/replication/basebackup.c
b/src/backend/replication/basebackup.c
index 6884cad2c00af1632eacec07903098e7e1874393..815681ada7dd0c135af584ad5da2dd13c9a12465
100644 (file)
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -184,10 +184,8 @@ static const struct exclude_list_item excludeFiles[] =
    {RELCACHE_INIT_FILENAME, true},

    /*
-    * If there's a backup_label or tablespace_map file, it belongs to a
-    * backup started by the user with pg_start_backup().  It is *not* correct
-    * for this backup.  Our backup_label/tablespace_map is injected into the
-    * tar separately.
+    * backup_label and tablespace_map should not exist in in a running cluster
+    * capable of doing an online backup, but exclude them just in case.
     */
    {BACKUP_LABEL_FILE, false},
    {TABLESPACE_MAP, false},
```

The typo is in `exist in in a running cluster`. There's two `in` in a row.

P.D.: I was looking at this just because I was looking at an issue
where someone bumped their head with this "problem", so great that
we're in a better place now. Hopefully one day everyone will be
running PG15 or better :)

Kind regards, Martín

El mié, 6 abr 2022 a las 16:29, Stephen Frost (<sfrost@snowman.net>) escribió:
>
> Greetings,
>
> * Laurenz Albe (laurenz.albe@cybertec.at) wrote:
> > On Tue, 2022-04-05 at 13:06 -0700, Nathan Bossart wrote:
> > > On Tue, Apr 05, 2022 at 11:25:36AM -0400, Stephen Frost wrote:
> > > > Please find attached an updated patch + commit message.  Mostly, I just
> > > > went through and did a bit more in terms of updating the documentation
> > > > and improving the comments (there were some places that were still
> > > > worrying about the chance of a 'stray' backup_label file existing, which
> > > > isn't possible any longer), along with some additional testing and
> > > > review.  This is looking pretty good to me, but other thoughts are
> > > > certainly welcome.  Otherwise, I'm hoping to commit this tomorrow.
> > >
> > > LGTM!
> >
> > Cassandra (not the software) from the sidelines predicts that we will
> > get some fire from users for this, although I concede the theoretical
> > sanity of the change.
>
> Great, thanks for that.
>
> This has now been committed- thanks again to everyone for their help!
>
> Stephen



--
Martín Marqués
It’s not that I have something to hide,
it’s that I have nothing I want you to see



On Tue, Apr 19, 2022 at 10:12:32PM -0300, Martín Marqués wrote:
> The typo is in `exist in in a running cluster`. There's two `in` in a row.

Thanks, fixed.
--
Michael

Attachment
Greetings,

* Martín Marqués (martin.marques@gmail.com) wrote:
> The typo is in `exist in in a running cluster`. There's two `in` in a row.

Oops, thanks for catching (and thanks to Michael for committing the
fix!).

> P.D.: I was looking at this just because I was looking at an issue
> where someone bumped their head with this "problem", so great that
> we're in a better place now. Hopefully one day everyone will be
> running PG15 or better :)

Agreed!  Does make me wonder just how often folks run into this issue..
Glad that we were able to get the change in for v15.

Thanks again!

Stephen

Attachment