Thread: Postgres restart in the middle of exclusive backup and the presence of backup_label file
Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
SATYANARAYANA NARLAPURAM
Date:
Hi Hackers,
While an exclusive backup is in progress if Postgres restarts, postgres runs the recovery from the checkpoint identified by the label file instead of the control file. This can cause long recovery or even sometimes fail to recover as the WAL records corresponding to that checkpoint location are removed. I can write a layer in my control plane to remove the backup_label file when I know the server is not in restore from the base backup but I don't see a reason why everyone has to repeat this step. Am I missing something?
If there are no standby.signal or recovery.signal, what is the use case of honoring backup_label file? Even when they exist, for a long running recovery, should we honor the backup_label file as the majority of the WAL already applied? It does slow down the recovery on restart right as it has to start all the way from the beginning?
Thanks,
Satya
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Michael Paquier
Date:
On Wed, Nov 24, 2021 at 02:12:19PM -0800, SATYANARAYANA NARLAPURAM wrote: > While an exclusive backup is in progress if Postgres restarts, postgres > runs the recovery from the checkpoint identified by the label file instead > of the control file. This can cause long recovery or even sometimes fail to > recover as the WAL records corresponding to that checkpoint location are > removed. I can write a layer in my control plane to remove the backup_label > file when I know the server is not in restore from the base backup but I > don't see a reason why everyone has to repeat this step. Am I missing > something? This is a known issue with exclusive backups, which is a reason why non-exclusive backups have been implemented. pg_basebackup does that, and using "false" as the third argument of pg_start_backup() would have the same effect. So I would recommend to switch to that. -- Michael
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
SATYANARAYANA NARLAPURAM
Date:
Thanks Michael!
This is a known issue with exclusive backups, which is a reason why
non-exclusive backups have been implemented. pg_basebackup does that,
and using "false" as the third argument of pg_start_backup() would
have the same effect. So I would recommend to switch to that.
Is there a plan in place to remove the exclusive backup option from the core in PG 15/16? If we are keeping it then why not make it better?
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Michael Paquier
Date:
On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM wrote: > Is there a plan in place to remove the exclusive backup option from the > core in PG 15/16? This was discussed, but removing it could also harm users relying on it. Perhaps it could be revisited, but I am not sure if this is worth worrying about either. > If we are keeping it then why not make it better? Well, non-exclusive backups are better by design in many aspects, so I don't quite see the point in spending time on something that has more limitations than what's already in place. -- Michael
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes: > On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM wrote: >> If we are keeping it then why not make it better? > Well, non-exclusive backups are better by design in many aspects, so I > don't quite see the point in spending time on something that has more > limitations than what's already in place. IMO the main reason for keeping it is backwards compatibility for users who have a satisfactory backup arrangement using it. That same argument implies that we shouldn't change how it works (at least, not very much). regards, tom lane
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
"Bossart, Nathan"
Date:
On 11/26/21, 7:33 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Michael Paquier <michael@paquier.xyz> writes: >> On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM wrote: >>> If we are keeping it then why not make it better? > >> Well, non-exclusive backups are better by design in many aspects, so I >> don't quite see the point in spending time on something that has more >> limitations than what's already in place. > > IMO the main reason for keeping it is backwards compatibility for users > who have a satisfactory backup arrangement using it. That same argument > implies that we shouldn't change how it works (at least, not very much). The issues with exclusive backups seem to be fairly well-documented (e.g., c900c15), but perhaps there should also be a note in the "Backup Control Functions" table [0]. Nathan [0] https://www.postgresql.org/docs/devel/functions-admin.html#FUNCTIONS-ADMIN-BACKUP
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Stephen Frost
Date:
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Michael Paquier <michael@paquier.xyz> writes: > > On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM wrote: > >> If we are keeping it then why not make it better? > > > Well, non-exclusive backups are better by design in many aspects, so I > > don't quite see the point in spending time on something that has more > > limitations than what's already in place. > > IMO the main reason for keeping it is backwards compatibility for users > who have a satisfactory backup arrangement using it. That same argument > implies that we shouldn't change how it works (at least, not very much). There isn't a satisfactory backup approach using it specifically because of this issue, hence why we should remove it to make it so users don't run into this. Would also simplify the documentation around the low level backup API, which would be a very good thing. Right now, making improvements in that area is very challenging even if all you want to do is improve the documentation around the non-exclusive API. We dealt with this as best as one could in pgbackrest for PG versions prior to when non-exclusive backup was added- which is to remove the backup_label file as soon as possible and then put it back right before you call pg_stop_backup() (since it'll complain otherwise). Not a perfect answer though and a risk still exists there of a failed restart happening. Of course, for versions which support non-exclusive backup, we use that to avoid this issue. We also extensively changed how restore works a couple releases ago and while there was some noise about it, it certainly wasn't all that bad. I don't find the reasons brought up to continue to support exclusive backup to be at all compelling and the lack of huge issues with the new way restore works to make it abundently clear that we can, in fact, remove exclusive backup in a major version change without the world coming down. Thanks, Stephen
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Laurenz Albe
Date:
On Tue, 2021-11-30 at 09:20 -0500, Stephen Frost wrote: > Greetings, > > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > > Michael Paquier <michael@paquier.xyz> writes: > > > On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM wrote: > > > > If we are keeping it then why not make it better? > > > > > Well, non-exclusive backups are better by design in many aspects, so I > > > don't quite see the point in spending time on something that has more > > > limitations than what's already in place. > > > > IMO the main reason for keeping it is backwards compatibility for users > > who have a satisfactory backup arrangement using it. That same argument > > implies that we shouldn't change how it works (at least, not very much). > > There isn't a satisfactory backup approach using it specifically because > of this issue, hence why we should remove it to make it so users don't > run into this. There is a satisfactory approach, as long as you are satisfied with manually restarting the server if it crashed during a backup. > I don't find the reasons brought up to continue to support exclusive > backup to be at all compelling and the lack of huge issues with the new > way restore works to make it abundently clear that we can, in fact, > remove exclusive backup in a major version change without the world > coming down. I guess the lack of hue and cry was at least to a certain extent because the exclusive backup API was deprecated, but not removed. Yours, Laurenz Albe
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Stephen Frost
Date:
Greetings,
On Tue, Nov 30, 2021 at 11:47 Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Tue, 2021-11-30 at 09:20 -0500, Stephen Frost wrote:
> Greetings,
>
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > Michael Paquier <michael@paquier.xyz> writes:
> > > On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM wrote:
> > > > If we are keeping it then why not make it better?
> >
> > > Well, non-exclusive backups are better by design in many aspects, so I
> > > don't quite see the point in spending time on something that has more
> > > limitations than what's already in place.
> >
> > IMO the main reason for keeping it is backwards compatibility for users
> > who have a satisfactory backup arrangement using it. That same argument
> > implies that we shouldn't change how it works (at least, not very much).
>
> There isn't a satisfactory backup approach using it specifically because
> of this issue, hence why we should remove it to make it so users don't
> run into this.
There is a satisfactory approach, as long as you are satisfied with
manually restarting the server if it crashed during a backup.
I disagree that that’s a satisfactory approach. It certainly wasn’t intended or documented as part of the original feature and therefore to call it satisfactory strikes me quite strongly as revisionist history.
> I don't find the reasons brought up to continue to support exclusive
> backup to be at all compelling and the lack of huge issues with the new
> way restore works to make it abundently clear that we can, in fact,
> remove exclusive backup in a major version change without the world
> coming down.
I guess the lack of hue and cry was at least to a certain extent because
the exclusive backup API was deprecated, but not removed.
These comments were in reference to the restore API, which was quite changed (new special files that have to be touched, removing of recovery.conf, options moved to postgresql.conf/.auto, etc). So, no.
Thanks,
Stephen
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
"Bossart, Nathan"
Date:
On 11/30/21, 9:51 AM, "Stephen Frost" <sfrost@snowman.net> wrote: > I disagree that that’s a satisfactory approach. It certainly wasn’t > intended or documented as part of the original feature and therefore > to call it satisfactory strikes me quite strongly as revisionist > history. It looks like the exclusive way has been marked deprecated in all supported versions along with a note that it will eventually be removed. If it's not going to be removed out of fear of breaking backward compatibility, I think the documentation should be updated to say that. However, unless there is something that is preventing users from switching to the non-exclusive approach, I think it is reasonable to begin thinking about removing it. Nathan
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Tom Lane
Date:
"Bossart, Nathan" <bossartn@amazon.com> writes: > It looks like the exclusive way has been marked deprecated in all > supported versions along with a note that it will eventually be > removed. If it's not going to be removed out of fear of breaking > backward compatibility, I think the documentation should be updated to > say that. However, unless there is something that is preventing users > from switching to the non-exclusive approach, I think it is reasonable > to begin thinking about removing it. If we're willing to outright remove it, I don't have any great objection. My original two cents was that we shouldn't put effort into improving it; but removing it isn't that. regards, tom lane
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
"Bossart, Nathan"
Date:
On 11/30/21, 2:27 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > If we're willing to outright remove it, I don't have any great objection. > My original two cents was that we shouldn't put effort into improving it; > but removing it isn't that. I might try to put a patch together for the January commitfest, given there is enough support. Nathan
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
David Steele
Date:
On 11/30/21 17:26, Tom Lane wrote: > "Bossart, Nathan" <bossartn@amazon.com> writes: >> It looks like the exclusive way has been marked deprecated in all >> supported versions along with a note that it will eventually be >> removed. If it's not going to be removed out of fear of breaking >> backward compatibility, I think the documentation should be updated to >> say that. However, unless there is something that is preventing users >> from switching to the non-exclusive approach, I think it is reasonable >> to begin thinking about removing it. > > If we're willing to outright remove it, I don't have any great objection. > My original two cents was that we shouldn't put effort into improving it; > but removing it isn't that. The main objections as I recall are that it is much harder for simple backup scripts and commercial backup integrations to hold a connection to postgres open and write the backup label separately into the backup. As Stephen noted, working in this area is much harder (even in the docs) due to the need to keep both methods working. When I removed exclusive backup it didn't break any tests, other than one that needed to generate a corrupt backup, so we have virtually no coverage for that method. I did figure out how to keep the safe part of exclusive backup (not having to maintain a connection) while removing the dangerous part (writing backup_label into PGDATA), but it was a substantial amount of work and I felt that it had little chance of being committed. Attaching the thread [1] that I started with a patch to remove exclusive backup for reference. -- [1] https://www.postgresql.org/message-id/flat/ac7339ca-3718-3c93-929f-99e725d1172c%40pgmasters.net
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
"Bossart, Nathan"
Date:
On 11/30/21, 2:58 PM, "David Steele" <david@pgmasters.net> wrote: > I did figure out how to keep the safe part of exclusive backup (not > having to maintain a connection) while removing the dangerous part > (writing backup_label into PGDATA), but it was a substantial amount of > work and I felt that it had little chance of being committed. Do you think it's still worth trying to make it safe, or do you think we should just remove exclusive mode completely? > Attaching the thread [1] that I started with a patch to remove exclusive > backup for reference. Ah, good, some light reading. :) Nathan
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Michael Paquier
Date:
On Tue, Nov 30, 2021 at 05:58:15PM -0500, David Steele wrote: > The main objections as I recall are that it is much harder for simple backup > scripts and commercial backup integrations to hold a connection to postgres > open and write the backup label separately into the backup. I don't quite understand why this argument would not hold even today, even if I'd like to think that more people are using pg_basebackup. > I did figure out how to keep the safe part of exclusive backup (not having > to maintain a connection) while removing the dangerous part (writing > backup_label into PGDATA), but it was a substantial amount of work and I > felt that it had little chance of being committed. Which was, I guess, done by storing the backup_label contents within a file different than backup_label, still maintained in the main data folder to ensure that it gets included in the backup? -- Michael
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
SATYANARAYANA NARLAPURAM
Date:
On Tue, Nov 30, 2021 at 4:54 PM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Nov 30, 2021 at 05:58:15PM -0500, David Steele wrote:
> The main objections as I recall are that it is much harder for simple backup
> scripts and commercial backup integrations to hold a connection to postgres
> open and write the backup label separately into the backup.
I don't quite understand why this argument would not hold even today,
even if I'd like to think that more people are using pg_basebackup.
> I did figure out how to keep the safe part of exclusive backup (not having
> to maintain a connection) while removing the dangerous part (writing
> backup_label into PGDATA), but it was a substantial amount of work and I
> felt that it had little chance of being committed.
Which was, I guess, done by storing the backup_label contents within a
file different than backup_label, still maintained in the main data
folder to ensure that it gets included in the backup?
Non-exclusive backup has significant advantages over exclusive backups but would like to add a few comments on the simplicity of exclusive backups -
1/ It is not uncommon nowadays to take a snapshot based backup. Exclusive backup simplifies this story as the backup label file is part of the snapshot. Otherwise, one needs to store it somewhere outside as snapshot metadata and copy this file over during restore (after creating a disk from the snapshot) to the data directory. Typical steps included are 1/ start pg_base_backup 2/ Take disk snapshot 3/ pg_stop_backup() 4/ Mark snapshot as consistent and add some create time metadata.
2/ Control plane code responsible for taking backups is simpler with exclusive backups than non-exclusive as it doesn't maintain a connection to the server, particularly when that orchestration is outside the machine the Postgres server is running on.
IMHO, we should either remove the support for it or improve it but not leave it hanging there.
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
David Steele
Date:
On 11/30/21 19:54, Michael Paquier wrote: > On Tue, Nov 30, 2021 at 05:58:15PM -0500, David Steele wrote: >> I did figure out how to keep the safe part of exclusive backup (not having >> to maintain a connection) while removing the dangerous part (writing >> backup_label into PGDATA), but it was a substantial amount of work and I >> felt that it had little chance of being committed. > > Which was, I guess, done by storing the backup_label contents within a > file different than backup_label, still maintained in the main data > folder to ensure that it gets included in the backup? That, or emit it from pg_start_backup() so the user can write it wherever they please. That would include writing it into PGDATA if they really wanted to, but that would be on them and the default behavior would be safe. The problem with this is if the user does not rename/supply backup_label on restore then they will get corruption and not know it. Here's another idea. Since the contents of pg_wal are not supposed to be copied, we could add a file there to indicate that the cluster should remove backup_label on restart. Our instructions also say to remove the contents of pg_wal on restore if they were originally copied, so hopefully one of the two would happen. But, again, if they fail to follow the directions it would lead to corruption. Order would be important here. When starting the backup the proper order would be to write pg_wal/backup_in_progress and then backup_label. When stopping the backup they would be removed in the reverse order. On a restart if both are present then delete both in the correct order and start crash recovery using the info in pg_control. If only backup_label is present then go into recovery using the info from backup_label. It's possible for pg_wal/backup_in_process to be present by itself if the server crashes after deleting backup_label but before deleting pg_wal/backup_in_progress. In that case the server should simply remove it on start and go into crash recovery using the info from pg_control. The advantage of this idea is that it does not change the current instructions as far as I can see. If the user is already following them, they'll be fine. If they are not, then they'll need to start doing so. Of course, none of this affects users who are using non-exclusive backup, which I do hope covers the majority by now. Thoughts? Regards, -David
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Andrew Dunstan
Date:
On 11/30/21 17:26, Tom Lane wrote: > "Bossart, Nathan" <bossartn@amazon.com> writes: >> It looks like the exclusive way has been marked deprecated in all >> supported versions along with a note that it will eventually be >> removed. If it's not going to be removed out of fear of breaking >> backward compatibility, I think the documentation should be updated to >> say that. However, unless there is something that is preventing users >> from switching to the non-exclusive approach, I think it is reasonable >> to begin thinking about removing it. > If we're willing to outright remove it, I don't have any great objection. > My original two cents was that we shouldn't put effort into improving it; > but removing it isn't that. > > +1 Let's just remove it. We already know it's a footgun, and there's been plenty of warning. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
David Steele
Date:
On 11/30/21 18:31, Bossart, Nathan wrote: > On 11/30/21, 2:58 PM, "David Steele" <david@pgmasters.net> wrote: >> I did figure out how to keep the safe part of exclusive backup (not >> having to maintain a connection) while removing the dangerous part >> (writing backup_label into PGDATA), but it was a substantial amount of >> work and I felt that it had little chance of being committed. > > Do you think it's still worth trying to make it safe, or do you think > we should just remove exclusive mode completely? My preference would be to remove it completely, but I haven't gotten a lot of traction so far. >> Attaching the thread [1] that I started with a patch to remove exclusive >> backup for reference. > > Ah, good, some light reading. :) Sure, if you say so! Regards, -David
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
"Bossart, Nathan"
Date:
On 12/1/21, 8:27 AM, "David Steele" <david@pgmasters.net> wrote: > On 11/30/21 18:31, Bossart, Nathan wrote: >> Do you think it's still worth trying to make it safe, or do you think >> we should just remove exclusive mode completely? > > My preference would be to remove it completely, but I haven't gotten a > lot of traction so far. In this thread, I count 6 people who seem alright with removing it, and 2 who might be opposed, although I don't think anyone has explicitly stated they are against it. Nathan
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
"Bossart, Nathan"
Date:
On 12/1/21, 10:37 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote: > On 12/1/21, 8:27 AM, "David Steele" <david@pgmasters.net> wrote: >> On 11/30/21 18:31, Bossart, Nathan wrote: >>> Do you think it's still worth trying to make it safe, or do you think >>> we should just remove exclusive mode completely? >> >> My preference would be to remove it completely, but I haven't gotten a >> lot of traction so far. > > In this thread, I count 6 people who seem alright with removing it, > and 2 who might be opposed, although I don't think anyone has > explicitly stated they are against it. I hastily rebased the patch from 2018 and got it building and passing the tests. I'm sure it will need additional changes, but I'll wait for more feedback before I expend too much more effort on this. Nathan
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Andrew Dunstan
Date:
On 12/1/21 19:30, Bossart, Nathan wrote: > On 12/1/21, 10:37 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote: >> On 12/1/21, 8:27 AM, "David Steele" <david@pgmasters.net> wrote: >>> On 11/30/21 18:31, Bossart, Nathan wrote: >>>> Do you think it's still worth trying to make it safe, or do you think >>>> we should just remove exclusive mode completely? >>> My preference would be to remove it completely, but I haven't gotten a >>> lot of traction so far. >> In this thread, I count 6 people who seem alright with removing it, >> and 2 who might be opposed, although I don't think anyone has >> explicitly stated they are against it. > I hastily rebased the patch from 2018 and got it building and passing > the tests. I'm sure it will need additional changes, but I'll wait > for more feedback before I expend too much more effort on this. > Should we really be getting rid of PostgreSQL::Test::Cluster::backup_fs_hot() ? cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
David Steele
Date:
On 12/2/21 11:00, Andrew Dunstan wrote: > > On 12/1/21 19:30, Bossart, Nathan wrote: >> On 12/1/21, 10:37 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote: >>> On 12/1/21, 8:27 AM, "David Steele" <david@pgmasters.net> wrote: >>>> On 11/30/21 18:31, Bossart, Nathan wrote: >>>>> Do you think it's still worth trying to make it safe, or do you think >>>>> we should just remove exclusive mode completely? >>>> My preference would be to remove it completely, but I haven't gotten a >>>> lot of traction so far. >>> In this thread, I count 6 people who seem alright with removing it, >>> and 2 who might be opposed, although I don't think anyone has >>> explicitly stated they are against it. >> I hastily rebased the patch from 2018 and got it building and passing >> the tests. I'm sure it will need additional changes, but I'll wait >> for more feedback before I expend too much more effort on this. >> > > Should we really be getting rid of > PostgreSQL::Test::Cluster::backup_fs_hot() ? Agreed, it would be better to update backup_fs_hot() to use exclusive mode and save out backup_label instead. Regards, -David
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
David Steele
Date:
On 12/2/21 12:38, David Steele wrote: > On 12/2/21 11:00, Andrew Dunstan wrote: >> >> Should we really be getting rid of >> PostgreSQL::Test::Cluster::backup_fs_hot() ? > > Agreed, it would be better to update backup_fs_hot() to use exclusive > mode and save out backup_label instead. Oops, of course I meant non-exclusive mode. Regards, -David
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
"Bossart, Nathan"
Date:
On 12/2/21, 9:50 AM, "David Steele" <david@pgmasters.net> wrote: > On 12/2/21 12:38, David Steele wrote: >> On 12/2/21 11:00, Andrew Dunstan wrote: >>> >>> Should we really be getting rid of >>> PostgreSQL::Test::Cluster::backup_fs_hot() ? >> >> Agreed, it would be better to update backup_fs_hot() to use exclusive >> mode and save out backup_label instead. > > Oops, of course I meant non-exclusive mode. +1. I'll fix that in the next revision. Nathan
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
"Bossart, Nathan"
Date:
On 12/2/21, 1:34 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote: > On 12/2/21, 9:50 AM, "David Steele" <david@pgmasters.net> wrote: >> On 12/2/21 12:38, David Steele wrote: >>> On 12/2/21 11:00, Andrew Dunstan wrote: >>>> >>>> Should we really be getting rid of >>>> PostgreSQL::Test::Cluster::backup_fs_hot() ? >>> >>> Agreed, it would be better to update backup_fs_hot() to use exclusive >>> mode and save out backup_label instead. >> >> Oops, of course I meant non-exclusive mode. > > +1. I'll fix that in the next revision. I finally got around to looking into this, and I think I found why it was done this way in 2018. backup_fs_hot() runs pg_start_backup(), closes the session, copies the data, and then runs pg_stop_backup() in a different session. This doesn't work with non-exclusive mode because the backup will be aborted when the session that runs pg_start_backup() is closed. pg_stop_backup() will fail with a "backup is not in progress" error. Furthermore, 010_logical_decoding_timelines.pl seems to be the only test that uses backup_fs_hot(). After a quick glance, I didn't see an easy way to hold a session open while the test does other things. If there isn't one, modifying backup_fs_hot() to work with non-exclusive mode might be more trouble than it is worth. Nathan
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
"Euler Taveira"
Date:
On Thu, Jan 6, 2022, at 9:48 PM, Bossart, Nathan wrote:
After a quick glance, I didn't see an easy way to hold a session openwhile the test does other things. If there isn't one, modifyingbackup_fs_hot() to work with non-exclusive mode might be more troublethan it is worth.
You can use IPC::Run to start psql in background. See examples in
src/test/recovery.
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
David Steele
Date:
On 1/6/22 20:20, Euler Taveira wrote: > On Thu, Jan 6, 2022, at 9:48 PM, Bossart, Nathan wrote: >> After a quick glance, I didn't see an easy way to hold a session open >> while the test does other things. If there isn't one, modifying >> backup_fs_hot() to work with non-exclusive mode might be more trouble >> than it is worth. > > You can use IPC::Run to start psql in background. See examples in > src/test/recovery. I don't think updating backup_fs_hot() is worth it here. backup_fs_cold() works just fine for this case and if there is a need for backup_fs_hot() in the future it can be implemented as needed. Regards, -David
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
"Bossart, Nathan"
Date:
On 1/7/22, 5:52 AM, "David Steele" <david@pgmasters.net> wrote: > On 1/6/22 20:20, Euler Taveira wrote: >> On Thu, Jan 6, 2022, at 9:48 PM, Bossart, Nathan wrote: >>> After a quick glance, I didn't see an easy way to hold a session open >>> while the test does other things. If there isn't one, modifying >>> backup_fs_hot() to work with non-exclusive mode might be more trouble >>> than it is worth. >> >> You can use IPC::Run to start psql in background. See examples in >> src/test/recovery. > > I don't think updating backup_fs_hot() is worth it here. > > backup_fs_cold() works just fine for this case and if there is a need > for backup_fs_hot() in the future it can be implemented as needed. Thanks for the pointer on IPC::Run. I had a feeling I was missing something obvious! I think I agree with David that it still isn't worth it for just this one test. Of course, it would be great to test the non-exclusive backup logic as much as possible, but I'm not sure that this particular test will provide any sort of meaningful coverage. Nathan
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Nathan Bossart
Date:
Here is a rebased patch. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Ashutosh Sharma
Date:
Some review comments on the latest version: + * runningBackups is a counter indicating the number of backups currently in + * progress. forcePageWrites is set to true when either of these is + * non-zero. lastBackupStart is the latest checkpoint redo location used as + * a starting point for an online backup. */ - ExclusiveBackupState exclusiveBackupState; - int nonExclusiveBackups; What do you mean by "either of these is non-zero ''. Earlier we used to set forcePageWrites in case of both exclusive and non-exclusive backups, but we have just one type of backup now. == - * OK to update backup counters, forcePageWrites and session-level lock. + * OK to update backup counters and forcePageWrites. * We still update the status of session-level lock so I don't think we should update the above comment. See below code: if (XLogCtl->Insert.runningBackups == 0) { XLogCtl->Insert.forcePageWrites = false; } /* * Clean up session-level lock. * * You might think that WALInsertLockRelease() can be called before * cleaning up session-level lock because session-level lock doesn't need * to be protected with WAL insertion lock. But since * CHECK_FOR_INTERRUPTS() can occur in it, session-level lock must be * cleaned up before it. */ sessionBackupState = SESSION_BACKUP_NONE; WALInsertLockRelease(); == @@ -8993,18 +8686,16 @@ do_pg_abort_backup(int code, Datum arg) bool emit_warning = DatumGetBool(arg); /* - * Quick exit if session is not keeping around a non-exclusive backup - * already started. + * Quick exit if session does not have a running backup. */ - if (sessionBackupState != SESSION_BACKUP_NON_EXCLUSIVE) + if (sessionBackupState != SESSION_BACKUP_RUNNING) return; WALInsertLockAcquireExclusive(); - Assert(XLogCtl->Insert.nonExclusiveBackups > 0); - XLogCtl->Insert.nonExclusiveBackups--; + Assert(XLogCtl->Insert.runningBackups > 0); + XLogCtl->Insert.runningBackups--; - if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE && - XLogCtl->Insert.nonExclusiveBackups == 0) + if (XLogCtl->Insert.runningBackups == 0) { XLogCtl->Insert.forcePageWrites = false; } I think we have a lot of common code in do_pg_abort_backup() and pg_do_stop_backup(). So why not have a common function that can be called from both these functions. == +# Now delete the bogus backup_label file since it will interfere with startup +unlink("$pgdata/backup_label") + or BAIL_OUT("unable to unlink $pgdata/backup_label"); + Why do we need this additional change? Earlier this was not required. -- With Regards, Ashutosh Sharma. On Thu, Feb 17, 2022 at 6:41 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > Here is a rebased patch. > > -- > Nathan Bossart > Amazon Web Services: https://aws.amazon.com
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Nathan Bossart
Date:
On Fri, Feb 18, 2022 at 10:48:10PM +0530, Ashutosh Sharma wrote: > Some review comments on the latest version: Thanks for the feedback! Before I start spending more time on this one, I should probably ask if this has any chance of making it into v15. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Ashutosh Sharma
Date:
On Sat, Feb 19, 2022 at 2:24 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Fri, Feb 18, 2022 at 10:48:10PM +0530, Ashutosh Sharma wrote: > > Some review comments on the latest version: > > Thanks for the feedback! Before I start spending more time on this one, I > should probably ask if this has any chance of making it into v15. I don't see any reason why it can't make it to v15. However, it is not super urgent as the users facing this problem have a choice. They can use non-exclusive mode. -- With Regards, Ashutosh Sharma.
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Nathan Bossart
Date:
I've attached an updated patch. On Fri, Feb 18, 2022 at 10:48:10PM +0530, Ashutosh Sharma wrote: > + * runningBackups is a counter indicating the number of backups currently in > + * progress. forcePageWrites is set to true when either of these is > + * non-zero. lastBackupStart is the latest checkpoint redo location used as > + * a starting point for an online backup. > */ > - ExclusiveBackupState exclusiveBackupState; > - int nonExclusiveBackups; > > What do you mean by "either of these is non-zero ''. Earlier we used > to set forcePageWrites in case of both exclusive and non-exclusive > backups, but we have just one type of backup now. Fixed this. > - * OK to update backup counters, forcePageWrites and session-level lock. > + * OK to update backup counters and forcePageWrites. > * > > We still update the status of session-level lock so I don't think we > should update the above comment. See below code: Fixed this. > I think we have a lot of common code in do_pg_abort_backup() and > pg_do_stop_backup(). So why not have a common function that can be > called from both these functions. I didn't follow through with this change. I only saw a handful of lines that looked similar, and AFAICT we'd need an extra branch for cleaning up the session-level lock since do_pg_abort_backup() doesn't. > +# Now delete the bogus backup_label file since it will interfere with startup > +unlink("$pgdata/backup_label") > + or BAIL_OUT("unable to unlink $pgdata/backup_label"); > + > > Why do we need this additional change? Earlier this was not required. IIUC this test relied on the following code to handle the bogus file: /* * Terminate exclusive backup mode to avoid recovery after a clean * fast shutdown. Since an exclusive backup can only be taken * during normal running (and not, for example, while running * under Hot Standby) it only makes sense to do this if we reached * normal running. If we're still in recovery, the backup file is * one we're recovering *from*, and we must keep it around so that * recovery restarts from the right place. */ if (ReachedNormalRunning) CancelBackup(); The attached patch removes this code. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Chapman Flack
Date:
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation: not tested This patch applies cleanly for me and passes installcheck-world. I have not yet studied all of the changes in detail. Some proofreading nits in the documentation: the pg_stop_backup with arguments has lost the 'exclusive' argument, but still shows a comma before the 'wait_for_archive' argument. And the niladic pg_stop_backup is still documented, though it no longer exists. My biggest concerns are the changes to the SQL-visible pg_start_backup and pg_stop_backup functions. When the non-exclusive API was implemented (in 7117685), that was done with care (with a new optional argument to pg_start_backup, and a new overload of pg_stop_backup) to avoid immediate breakage of working backup scripts. With this patch, even scripts that were dutifully migrated to that new API and now invoke pg_start_backup(label, false) or (label, exclusive => false) will immediately and unnecessarily break. What I would suggest for this patch would be to change the exclusive default from true to false, and have the function report an ERROR if true is passed. Otherwise, for sites using a third-party backup solution, there will be an unnecessary requirement to synchronize a PostgreSQL upgrade with an upgrade of the backup solution that won't be broken by the change. For a site with their backup procedures scripted in-house, there will be an unnecessarily urgent need for the current admin team to study and patch the currently-working scripts. That can be avoided by just changing the default to false and rejecting calls where true is passed. That will break only scripts that never got the memo about moving to non-exclusive backup, available for six years now. Assuming the value is false, so no error is thrown, is it practical to determine from flinfo->fn_expr whether the value was defaulted or supplied? If so, I would further suggest reporting a deprecation WARNING if it was explicitly supplied, with a HINT that the argument can simply be removed at the call site, and will become unrecognized in some future release. pg_stop_backup needs thought, because 7117685 added a new overload for that function, rather than just an optional argument. This patch removes the old niladic version that returned pg_lsn, leaving just one version, with an optional argument, that returns a record. Here again, the old niladic one was only suitable for exclusive backups, so there can't be any script existing in 2022 that still calls that unless it has never been updated in six years to nonexclusive backups, and that breakage can't be helped. Any scripts that did get dutifully updated over the last six years will be calling the record-returning version, passing false, or exclusive => false. This patch as it stands will unnecessarily break those, but here again I think that can be avoided just by making the exclusive parameter optional with default false, and reporting an error if true is passed. Here again, I would consider also issuing a deprecation warning if the argument is explicitly supplied, if it is practical to determine that from fn_expr. (I haven't looked yet to see how practical that is.) Regards, -Chap
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Chapman Flack
Date:
On 02/26/22 11:48, Chapman Flack wrote: > This patch applies cleanly for me and passes installcheck-world. > I have not yet studied all of the changes in detail. I've now looked through the rest, and the only further thing I noticed was that xlog.c's do_pg_start_backup still has a tablespaces parameter to receive a List* of tablespaces if the caller wants, but this patch removes the comment describing it: - * If "tablespaces" isn't NULL, it receives a list of tablespaceinfo structs - * describing the cluster's tablespaces. which seems like collateral damage. Regards, -Chap
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Nathan Bossart
Date:
On Sat, Feb 26, 2022 at 04:48:52PM +0000, Chapman Flack wrote: > My biggest concerns are the changes to the SQL-visible pg_start_backup > and pg_stop_backup functions. When the non-exclusive API was implemented > (in 7117685), that was done with care (with a new optional argument to > pg_start_backup, and a new overload of pg_stop_backup) to avoid immediate > breakage of working backup scripts. > > With this patch, even scripts that were dutifully migrated to that new API and > now invoke pg_start_backup(label, false) or (label, exclusive => false) will > immediately and unnecessarily break. What I would suggest for this patch > would be to change the exclusive default from true to false, and have the > function report an ERROR if true is passed. > > Otherwise, for sites using a third-party backup solution, there will be an > unnecessary requirement to synchronize a PostgreSQL upgrade with an > upgrade of the backup solution that won't be broken by the change. For > a site with their backup procedures scripted in-house, there will be an > unnecessarily urgent need for the current admin team to study and patch > the currently-working scripts. > > That can be avoided by just changing the default to false and rejecting calls > where true is passed. That will break only scripts that never got the memo > about moving to non-exclusive backup, available for six years now. > > Assuming the value is false, so no error is thrown, is it practical to determine > from flinfo->fn_expr whether the value was defaulted or supplied? If so, I would > further suggest reporting a deprecation WARNING if it was explicitly supplied, > with a HINT that the argument can simply be removed at the call site, and will > become unrecognized in some future release. This is a good point. I think I agree with your proposed changes. I believe it is possible to add a deprecation warning only when 'exclusive' is specified. If anything, we can create a separate function that accepts the 'exclusive' parameter and that always emits a NOTICE or WARNING. > pg_stop_backup needs thought, because 7117685 added a new overload for that > function, rather than just an optional argument. This patch removes the old > niladic version that returned pg_lsn, leaving just one version, with an optional > argument, that returns a record. > > Here again, the old niladic one was only suitable for exclusive backups, so there > can't be any script existing in 2022 that still calls that unless it has never been > updated in six years to nonexclusive backups, and that breakage can't be > helped. > > Any scripts that did get dutifully updated over the last six years will be calling the > record-returning version, passing false, or exclusive => false. This patch as it > stands will unnecessarily break those, but here again I think that can be avoided > just by making the exclusive parameter optional with default false, and reporting > an error if true is passed. > > Here again, I would consider also issuing a deprecation warning if the argument > is explicitly supplied, if it is practical to determine that from fn_expr. (I haven't > looked yet to see how practical that is.) Agreed. I will look into updating this one, too. I think the 'exclusive' parameter should remain documented for now for both pg_start_backup() and pg_stop_backup(), but this documentation will just note that it is there for backward compatibility and must be set to false. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Nathan Bossart
Date:
On Sat, Feb 26, 2022 at 05:03:04PM -0500, Chapman Flack wrote: > I've now looked through the rest, and the only further thing I noticed > was that xlog.c's do_pg_start_backup still has a tablespaces parameter > to receive a List* of tablespaces if the caller wants, but this patch > removes the comment describing it: > > > - * If "tablespaces" isn't NULL, it receives a list of tablespaceinfo structs > - * describing the cluster's tablespaces. > > > which seems like collateral damage. Thanks. I will fix this and the proofreading nits you noted upthread in the next revision. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Nathan Bossart
Date:
On Sat, Feb 26, 2022 at 02:06:14PM -0800, Nathan Bossart wrote: > On Sat, Feb 26, 2022 at 04:48:52PM +0000, Chapman Flack wrote: >> Assuming the value is false, so no error is thrown, is it practical to determine >> from flinfo->fn_expr whether the value was defaulted or supplied? If so, I would >> further suggest reporting a deprecation WARNING if it was explicitly supplied, >> with a HINT that the argument can simply be removed at the call site, and will >> become unrecognized in some future release. > > This is a good point. I think I agree with your proposed changes. I > believe it is possible to add a deprecation warning only when 'exclusive' > is specified. If anything, we can create a separate function that accepts > the 'exclusive' parameter and that always emits a NOTICE or WARNING. I've spent some time looking into this, and I haven't found a clean way to emit a WARNING only if the "exclusive" parameter is supplied (and set to false). AFAICT flinfo->fn_expr doesn't tell us whether the parameter was supplied or the default value was used. I was able to get it working by splitting pg_start_backup() into 3 separate internal functions (i.e., pg_start_backup_1arg(), pg_start_backup_2arg(), and pg_start_backup_3arg()), but this breaks calls such as pg_start_backup('mylabel', exclusive => false), and it might complicate privilege management for users. Without a WARNING, I think it will be difficult to justify removing the "exclusive" parameter in the future. We would either need to leave it around forever, or we would have to risk unnecessarily breaking some working backup scripts. I wonder if we should just remove it now and make sure that this change is well-documented in the release notes. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
David Steele
Date:
On 2/28/22 23:51, Nathan Bossart wrote: > On Sat, Feb 26, 2022 at 02:06:14PM -0800, Nathan Bossart wrote: >> On Sat, Feb 26, 2022 at 04:48:52PM +0000, Chapman Flack wrote: >>> Assuming the value is false, so no error is thrown, is it practical to determine >>> from flinfo->fn_expr whether the value was defaulted or supplied? If so, I would >>> further suggest reporting a deprecation WARNING if it was explicitly supplied, >>> with a HINT that the argument can simply be removed at the call site, and will >>> become unrecognized in some future release. >> >> This is a good point. I think I agree with your proposed changes. I >> believe it is possible to add a deprecation warning only when 'exclusive' >> is specified. If anything, we can create a separate function that accepts >> the 'exclusive' parameter and that always emits a NOTICE or WARNING. > > I've spent some time looking into this, and I haven't found a clean way to > emit a WARNING only if the "exclusive" parameter is supplied (and set to > false). AFAICT flinfo->fn_expr doesn't tell us whether the parameter was > supplied or the default value was used. I was able to get it working by > splitting pg_start_backup() into 3 separate internal functions (i.e., > pg_start_backup_1arg(), pg_start_backup_2arg(), and > pg_start_backup_3arg()), but this breaks calls such as > pg_start_backup('mylabel', exclusive => false), and it might complicate > privilege management for users. > > Without a WARNING, I think it will be difficult to justify removing the > "exclusive" parameter in the future. We would either need to leave it > around forever, or we would have to risk unnecessarily breaking some > working backup scripts. I wonder if we should just remove it now and make > sure that this change is well-documented in the release notes. Personally, I am in favor of removing it. We change/rename functions/tables/views when we need to, and this happens in almost every release. What we need to do is make sure that an older installation won't silently work in a broken way, i.e. if we remove the exclusive flag somebody expecting the pre-9.6 behavior might not receive an error and think everything is OK. That would not be good. One option might be to rename the functions. Something like pg_backup_start/stop. Regards, -David
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Nathan Bossart
Date:
On Tue, Mar 01, 2022 at 08:44:51AM -0600, David Steele wrote: > Personally, I am in favor of removing it. We change/rename > functions/tables/views when we need to, and this happens in almost every > release. > > What we need to do is make sure that an older installation won't silently > work in a broken way, i.e. if we remove the exclusive flag somebody > expecting the pre-9.6 behavior might not receive an error and think > everything is OK. That would not be good. > > One option might be to rename the functions. Something like > pg_backup_start/stop. I'm fine with this approach. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Chapman Flack
Date:
On 03/01/22 09:44, David Steele wrote: > Personally, I am in favor of removing it. We change/rename > functions/tables/views when we need to, and this happens in almost every > release. For clarification, is that a suggestion to remove the 'exclusive' parameter in some later release, after using this release to default it to false and reject calls with true? I can get behind that proposal, even if we don't have a practical way to add the warning I suggested. I'd be happier with the warning, but can live without it. Release notes can be the warning. That way, at least, there would be a period of time where procedures that currently work (by passing exclusive => false) would continue to work, and could be adapted as time permits by removing that argument, with no behavioral change. The later release removing the argument would then break only procedures that had never done so. That's comparable to what's proposed for this release, which will only break procedures that have never migrated away from exclusive mode despite the time and notice to do so. That seems ok to me. Regards, -Chap
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Stephen Frost
Date:
Greetings, * Nathan Bossart (nathandbossart@gmail.com) wrote: > On Tue, Mar 01, 2022 at 08:44:51AM -0600, David Steele wrote: > > Personally, I am in favor of removing it. We change/rename > > functions/tables/views when we need to, and this happens in almost every > > release. > > > > What we need to do is make sure that an older installation won't silently > > work in a broken way, i.e. if we remove the exclusive flag somebody > > expecting the pre-9.6 behavior might not receive an error and think > > everything is OK. That would not be good. > > > > One option might be to rename the functions. Something like > > pg_backup_start/stop. > > I'm fine with this approach. +1. Thanks, Stephen
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Nathan Bossart
Date:
On Tue, Mar 01, 2022 at 11:09:13AM -0500, Chapman Flack wrote: > On 03/01/22 09:44, David Steele wrote: >> Personally, I am in favor of removing it. We change/rename >> functions/tables/views when we need to, and this happens in almost every >> release. > > For clarification, is that a suggestion to remove the 'exclusive' parameter > in some later release, after using this release to default it to false and > reject calls with true? My suggestion was to remove it in v15. My impression is that David and Stephen agree, but I could be misinterpreting their responses. > That way, at least, there would be a period of time where procedures > that currently work (by passing exclusive => false) would continue to work, > and could be adapted as time permits by removing that argument, with no > behavioral change. I'm not sure if there's any advantage to kicking the can down the road. At some point, we'll need to break existing backup scripts. Will we be more prepared to do that in v17 than we are now? We could maintain two sets of functions for a few releases and make it really clear in the documentation that pg_start/stop_backup() are going to be removed soon (and always emit a WARNING when they are used). Would that address your concerns? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
David Steele
Date:
On 3/1/22 11:32, Nathan Bossart wrote: > On Tue, Mar 01, 2022 at 11:09:13AM -0500, Chapman Flack wrote: >> On 03/01/22 09:44, David Steele wrote: >>> Personally, I am in favor of removing it. We change/rename >>> functions/tables/views when we need to, and this happens in almost every >>> release. >> >> For clarification, is that a suggestion to remove the 'exclusive' parameter >> in some later release, after using this release to default it to false and >> reject calls with true? > > My suggestion was to remove it in v15. My impression is that David and > Stephen agree, but I could be misinterpreting their responses. I agree and I'm pretty sure Stephen does as well. >> That way, at least, there would be a period of time where procedures >> that currently work (by passing exclusive => false) would continue to work, >> and could be adapted as time permits by removing that argument, with no >> behavioral change. > > I'm not sure if there's any advantage to kicking the can down the road. At > some point, we'll need to break existing backup scripts. Will we be more > prepared to do that in v17 than we are now? We could maintain two sets of > functions for a few releases and make it really clear in the documentation > that pg_start/stop_backup() are going to be removed soon (and always emit a > WARNING when they are used). Would that address your concerns? I think people are going to complain no matter what. If scripts are being maintained changing the name is not a big deal (though moving from exclusive to non-exclusive may be). If they aren't being maintained then they'll just blow up a few versions down the road when we remove the compatibility functions. Regards, -David
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Chapman Flack
Date:
On 03/01/22 12:32, Nathan Bossart wrote: > On Tue, Mar 01, 2022 at 11:09:13AM -0500, Chapman Flack wrote: >> That way, at least, there would be a period of time where procedures >> that currently work (by passing exclusive => false) would continue to work, >> and could be adapted as time permits by removing that argument, with no >> behavioral change. > > I'm not sure if there's any advantage to kicking the can down the road. At > some point, we'll need to break existing backup scripts. Will we be more > prepared to do that in v17 than we are now? Yes, if we have provided a transition period the way we did in 7117685. The way the on-ramp to that transition worked: - Initially, your procedures used exclusive mode. - If you changed nothing, they continued to work, with no behavior change. - You then had ample time to adapt them to non-exclusive mode so now they work that way. - You knew a day would come (here it comes) where, if you've never gotten around to doing that, your unchanged exclusive-mode procedures are going to break. So here we are, arrived at that day. If you're still using exclusive mode, your stuff's going to break; serves you right. So now limit the cases we care about to people who made use of the time they were given to change their procedures to use exclusive => false. So the off-ramp can look like: - Initially, your procedures pass exclusive => false. - If you change nothing, they should continue to work, with no behavior change. - You should then have ample time to change to a new spelling without exclusive => false, and have them work that way. - You know some later day is coming where, if you've never gotten around to doing that, they're going to break. Then, when that day comes, if you're still passing exclusive at all, your stuff's going to break; serves you right. If you have made use of the time you were given for the changes, you'll be fine. So yes, at that point, I think we can do it with clear conscience. We'll have made the off-ramp as smooth and navigable as the on-ramp was. > We could maintain two sets of > functions for a few releases and make it really clear in the documentation > that pg_start/stop_backup() are going to be removed soon (and always emit a > WARNING when they are used). Would that address your concerns? That would. I had been thinking of not changing the names, and just making the parameter go away. But I wasn't thinking of your concern here: > What we need to do is make sure that an older installation won't silently > work in a broken way, i.e. if we remove the exclusive flag somebody > expecting the pre-9.6 behavior might not receive an error and think > everything is OK. That would not be good. So I'm ok with changing the names. Then step 3 of the off-ramp would just be to call the functions by the new names, as well as to drop the exclusive => false. The thing I'd want to avoid is just, after the trouble that was taken to make the on-ramp navigable, making the off-ramp be a cliff. Regards, -Chap
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Chapman Flack
Date:
On 03/01/22 13:22, David Steele wrote: > I think people are going to complain no matter what. If scripts are being > maintained changing the name is not a big deal (though moving from exclusive > to non-exclusive may be). If they aren't being maintained then they'll just > blow up a few versions down the road when we remove the compatibility > functions. I might have already said enough in the message that crossed with this, but I think what I'm saying is there's a less-binary distinction between scripts that are/aren't "being maintained". There can't really be many teams out there thinking "we'll just ignore these scripts forever, and nothing bad will happen." They all know they'll have to do stuff sometimes. But it matters how we allow them to schedule it. In the on-ramp, at first there was only exclusive. Then there were both modes, with exclusive being deprecated, so teams knew they'd need to do stuff, and most by now probably have. They were able to do separation of hazards and schedule that work; they did not have to pile it onto the whole plate of "upgrade PG from 9.5 to 9.6 and make sure everything works". So now we're dropping the other shoe: first there was one mode, then both, now there's only the other one. Again there's some work for teams to do; let's again allow them to separate hazards and schedule that work apart from the whole 14 to 15 upgrade project. We can't help getting complaints in the off-ramp from anybody who ignored the on-ramp. But we can avoid clobbering the teams who dutifully played along before, and only want the same space to do so now. Regards, -Chap
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Stephen Frost
Date:
Greetings, * David Steele (david@pgmasters.net) wrote: > On 3/1/22 11:32, Nathan Bossart wrote: > >On Tue, Mar 01, 2022 at 11:09:13AM -0500, Chapman Flack wrote: > >>On 03/01/22 09:44, David Steele wrote: > >>>Personally, I am in favor of removing it. We change/rename > >>>functions/tables/views when we need to, and this happens in almost every > >>>release. > >> > >>For clarification, is that a suggestion to remove the 'exclusive' parameter > >>in some later release, after using this release to default it to false and > >>reject calls with true? > > > >My suggestion was to remove it in v15. My impression is that David and > >Stephen agree, but I could be misinterpreting their responses. > > I agree and I'm pretty sure Stephen does as well. Yes, +1 to removing it. > >>That way, at least, there would be a period of time where procedures > >>that currently work (by passing exclusive => false) would continue to work, > >>and could be adapted as time permits by removing that argument, with no > >>behavioral change. > > > >I'm not sure if there's any advantage to kicking the can down the road. At > >some point, we'll need to break existing backup scripts. Will we be more > >prepared to do that in v17 than we are now? We could maintain two sets of > >functions for a few releases and make it really clear in the documentation > >that pg_start/stop_backup() are going to be removed soon (and always emit a > >WARNING when they are used). Would that address your concerns? > > I think people are going to complain no matter what. If scripts are being > maintained changing the name is not a big deal (though moving from exclusive > to non-exclusive may be). If they aren't being maintained then they'll just > blow up a few versions down the road when we remove the compatibility > functions. I don't consider "maintained" and "still using the exclusive backup method" to both be able to be true at the same time. Thanks, Stephen
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Stephen Frost
Date:
Greetings, * Chapman Flack (chap@anastigmatix.net) wrote: > On 03/01/22 13:22, David Steele wrote: > > I think people are going to complain no matter what. If scripts are being > > maintained changing the name is not a big deal (though moving from exclusive > > to non-exclusive may be). If they aren't being maintained then they'll just > > blow up a few versions down the road when we remove the compatibility > > functions. > > I might have already said enough in the message that crossed with this, > but I think what I'm saying is there's a less-binary distinction between > scripts that are/aren't "being maintained". > > There can't really be many teams out there thinking "we'll just ignore > these scripts forever, and nothing bad will happen." They all know they'll > have to do stuff sometimes. But it matters how we allow them to schedule it. We only make these changes between major versions. That's as much as we should be required to provide. Further, we seriously changed around how restores work a few versions back and there was rather little complaining. Thanks, Stephen
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Chapman Flack
Date:
On 03/01/22 14:14, Stephen Frost wrote: >> There can't really be many teams out there thinking "we'll just ignore >> these scripts forever, and nothing bad will happen." They all know they'll >> have to do stuff sometimes. But it matters how we allow them to schedule it. > > We only make these changes between major versions. That's as much as we > should be required to provide. It's an OSS project, so I guess we're not required to provide anything. But in the course of this multi-release exclusive to non-exclusive transition, we already demonstrated, in 7117685, that we can avoid inflicting immediate breakage when there's nothing in our objective that inherently requires it, and avoiding it is relatively easy. I can't bring myself to think that was a bad precedent. Now, granted, the difference between the adaptations being required then and the ones required now is that those required both: changes to some function calls, and corresponding changes to how the scripts handled label and tablespace files. Here, it's only a clerical update to some function calls. So if I'm outvoted here and the reason is "look, a lighter burden is involved this time than that time", then ok. I would rather bow to that argument on the specific facts of one case than abandon the precedent from 7117685 generally. Regards, -Chap
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Stephen Frost
Date:
Greetings, * Chapman Flack (chap@anastigmatix.net) wrote: > On 03/01/22 14:14, Stephen Frost wrote: > >> There can't really be many teams out there thinking "we'll just ignore > >> these scripts forever, and nothing bad will happen." They all know they'll > >> have to do stuff sometimes. But it matters how we allow them to schedule it. > > > > We only make these changes between major versions. That's as much as we > > should be required to provide. > > It's an OSS project, so I guess we're not required to provide anything. > > But in the course of this multi-release exclusive to non-exclusive > transition, we already demonstrated, in 7117685, that we can avoid > inflicting immediate breakage when there's nothing in our objective > that inherently requires it, and avoiding it is relatively easy. > > I can't bring myself to think that was a bad precedent. It's actively bad because we are ridiculously inconsistent when it comes to these things and we're terrible about ever removing anything once it's gotten into the tree as 'deprecated'. Witness that it's 8 years since 7117685 and we still have these old and clearly broken APIs around. We absolutely need to move *away* from this approach, exactly how 2dedf4d9, much more recently than 7117685, for all of its other flaws, did. > So if I'm outvoted here and the reason is "look, a lighter burden is > involved this time than that time", then ok. I would rather bow to that > argument on the specific facts of one case than abandon the precedent > from 7117685 generally. It's far from precedent- if anything, it's quite the opposite from how most changes around here are made, and much more recent commits in the same area clearly tossed out entirely the idea of trying to maintain some kind of backwards compatibility with existing scripts. Thanks, Stephen
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Nathan Bossart
Date:
Here is a new version of the patch with the following changes: 1. Addressed Chap's feedback from upthread. 2. Renamed pg_start/stop_backup() to pg_backup_start/stop() as suggested by David. 3. A couple of other small documentation adjustments. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Chapman Flack
Date:
On 03/01/22 20:03, Nathan Bossart wrote: > Here is a new version of the patch with the following changes: I did not notice this earlier (sorry), but there seems to remain in backup.sgml a programlisting example that shows a psql invocation for pg_backup_start, then a tar command, then another psql invocation for pg_backup_stop. I think that was only workable for the exclusive mode, and now it is necessary to issue pg_backup_start and pg_backup_stop in the same session. (The 'touch backup_in_progress' business seems a bit bogus now too, suggesting an exclusivity remembered from bygone days.) I am not sure what a workable, simple example ought to look like. Maybe a single psql script issuing the pg_backup_start and the pg_backup_stop, with a tar command in between with \! ? Several bricks shy of production-ready, but it would give the idea. Regards, -Chap
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Nathan Bossart
Date:
On Wed, Mar 02, 2022 at 02:23:51PM -0500, Chapman Flack wrote: > I did not notice this earlier (sorry), but there seems to remain in > backup.sgml a programlisting example that shows a psql invocation > for pg_backup_start, then a tar command, then another psql invocation > for pg_backup_stop. > > I think that was only workable for the exclusive mode, and now it is > necessary to issue pg_backup_start and pg_backup_stop in the same session. > > (The 'touch backup_in_progress' business seems a bit bogus now too, > suggesting an exclusivity remembered from bygone days.) > > I am not sure what a workable, simple example ought to look like. > Maybe a single psql script issuing the pg_backup_start and the > pg_backup_stop, with a tar command in between with \! ? > > Several bricks shy of production-ready, but it would give the idea. Another option might be to just remove this section. The top of the section mentions that this is easily done using pg_basebackup with the -X parameter. The bottom part of the section includes more complicated steps for when "more flexibility in copying the backup files is needed..." AFAICT the more complicated strategy was around before pg_basebackup, and the pg_basebackup recommendation was added in 2012 as part of 920febd. Thoughts? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
David Steele
Date:
On 3/8/22 14:01, Nathan Bossart wrote: > On Wed, Mar 02, 2022 at 02:23:51PM -0500, Chapman Flack wrote: >> I did not notice this earlier (sorry), but there seems to remain in >> backup.sgml a programlisting example that shows a psql invocation >> for pg_backup_start, then a tar command, then another psql invocation >> for pg_backup_stop. >> >> I think that was only workable for the exclusive mode, and now it is >> necessary to issue pg_backup_start and pg_backup_stop in the same session. >> >> (The 'touch backup_in_progress' business seems a bit bogus now too, >> suggesting an exclusivity remembered from bygone days.) >> >> I am not sure what a workable, simple example ought to look like. >> Maybe a single psql script issuing the pg_backup_start and the >> pg_backup_stop, with a tar command in between with \! ? >> >> Several bricks shy of production-ready, but it would give the idea. > > Another option might be to just remove this section. The top of the > section mentions that this is easily done using pg_basebackup with the -X > parameter. The bottom part of the section includes more complicated steps > for when "more flexibility in copying the backup files is needed..." > AFAICT the more complicated strategy was around before pg_basebackup, and > the pg_basebackup recommendation was added in 2012 as part of 920febd. > Thoughts? This makes sense to me. I think pg_basebackup is far preferable to doing anything like what is described in this section. Unless you are planning to do something fancy (parallelization, snapshots, object stores, etc.) then pg_basebackup is the way to go. Regards, -David
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Nathan Bossart
Date:
On Tue, Mar 08, 2022 at 03:09:50PM -0600, David Steele wrote: > On 3/8/22 14:01, Nathan Bossart wrote: >> On Wed, Mar 02, 2022 at 02:23:51PM -0500, Chapman Flack wrote: >> > I did not notice this earlier (sorry), but there seems to remain in >> > backup.sgml a programlisting example that shows a psql invocation >> > for pg_backup_start, then a tar command, then another psql invocation >> > for pg_backup_stop. >> > >> > I think that was only workable for the exclusive mode, and now it is >> > necessary to issue pg_backup_start and pg_backup_stop in the same session. >> > >> > (The 'touch backup_in_progress' business seems a bit bogus now too, >> > suggesting an exclusivity remembered from bygone days.) >> > >> > I am not sure what a workable, simple example ought to look like. >> > Maybe a single psql script issuing the pg_backup_start and the >> > pg_backup_stop, with a tar command in between with \! ? >> > >> > Several bricks shy of production-ready, but it would give the idea. >> >> Another option might be to just remove this section. The top of the >> section mentions that this is easily done using pg_basebackup with the -X >> parameter. The bottom part of the section includes more complicated steps >> for when "more flexibility in copying the backup files is needed..." >> AFAICT the more complicated strategy was around before pg_basebackup, and >> the pg_basebackup recommendation was added in 2012 as part of 920febd. >> Thoughts? > > This makes sense to me. I think pg_basebackup is far preferable to doing > anything like what is described in this section. Unless you are planning to > do something fancy (parallelization, snapshots, object stores, etc.) then > pg_basebackup is the way to go. I spent some time trying to come up with a workable script to replace the existing one. I think the main problem is that you need to write out both the backup label file and the tablespace map file, but I didn't find an easy way to write the different output columns of pg_backup_stop() to separate files via psql. We'd probably need to write out the steps in prose like the 'Making a Base Backup Using the Low Level API' section does. Ultimately, I just removed everything beyond the pg_basebackup recommendation in the 'Standalone Hot Backups' section. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Chapman Flack
Date:
On 03/08/22 17:12, Nathan Bossart wrote: > I spent some time trying to come up with a workable script to replace the > existing one. I think the main problem is that you need to write out both > the backup label file and the tablespace map file, but I didn't find an > easy way to write the different output columns of pg_backup_stop() to > separate files via psql. Something like this might work: SELECT * FROM pg_backup_stop(true) \gset \out /tmp/backup_label \qecho :labelfile \out /tmp/tablespace_map \qecho :spcmapfile \out \! ... tar command adding /tmp/{backup_label,tablespace_map} to the tarball I notice the \qecho adds a final newline (and so if :spcmapfile is empty, a file containing a single newline is made). In a quick test with a bogus restore_command, I did not see any error messages specific to the format of the backup_label or tablespace_map files, so maybe the final newline isn't a problem. Assuming the newline isn't a problem, that might be simple enough to use in an example, and maybe it's not a bad thing that it highlights a few psql capabilities the reader might not have stumbled on before. Or, maybe it is just too confusing to bother. While agreeing that pg_basebackup is the production-ready thing that does it all for you (with tests for likely errors and so on), I think there is also some value in a dead-simple example that concretely shows you what "it" is, what the basic steps are that happen beneath pg_basebackup's chrome. If the added newline is a problem, I haven't thought of a way to exclude it that doesn't take the example out of the realm of dead-simple. Regards, -Chap
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Stephen Frost
Date:
Greetings, * Chapman Flack (chap@anastigmatix.net) wrote: > On 03/08/22 17:12, Nathan Bossart wrote: > > I spent some time trying to come up with a workable script to replace the > > existing one. I think the main problem is that you need to write out both > > the backup label file and the tablespace map file, but I didn't find an > > easy way to write the different output columns of pg_backup_stop() to > > separate files via psql. Let's not confuse ourselves here- the existing script *doesn't* work in any reasonable way when we're talking about everything that needs to be done to perform a backup. That a lot of people are using it because it's in the documentation is an actively bad thing. The same goes for the archive command example. > Something like this might work: > > SELECT * FROM pg_backup_stop(true) \gset > > \out /tmp/backup_label \qecho :labelfile > \out /tmp/tablespace_map \qecho :spcmapfile > \out > \! ... tar command adding /tmp/{backup_label,tablespace_map} to the tarball ... this doesn't do what's needed either. We could try to write down some minimum set of things that are needed for a backup tool to do but it's not something that a 3 line script is going to cover. Indeed, it's a lot more like pg_basebackup and if we want to give folks a script to use, it should be "run pg_basebackup". > I notice the \qecho adds a final newline (and so if :spcmapfile is empty, > a file containing a single newline is made). In a quick test with a bogus > restore_command, I did not see any error messages specific to the format > of the backup_label or tablespace_map files, so maybe the final newline > isn't a problem. > > Assuming the newline isn't a problem, that might be simple enough to > use in an example, and maybe it's not a bad thing that it highlights a few > psql capabilities the reader might not have stumbled on before. Or, maybe > it is just too confusing to bother. It's more than just too confusing, it's actively bad because people will actually use it and then end up with backups that don't work. > While agreeing that pg_basebackup is the production-ready thing that > does it all for you (with tests for likely errors and so on), I think > there is also some value in a dead-simple example that concretely > shows you what "it" is, what the basic steps are that happen beneath > pg_basebackup's chrome. Documenting everything that pg_basebackup does to make sure that the backup is viable might be something to work on if someone is really excited about this, but it's not 'dead-simple' and it's darn close to the bare minimum, something that none of these simple scripts will come anywhere close to being and instead they'll be far less than the minimum. > If the added newline is a problem, I haven't thought of a way to exclude > it that doesn't take the example out of the realm of dead-simple. I disagree that there's really a way to provide 'dead-simple' backups with what's built into core without using pg_basebackup. If we want a 'dead-simple' solution in core then we'd need to write an appropriate backup tool that does all the basic things and include and maintain that. Writing a shell script isn't enough and we shouldn't encourage our users to do exactly that by having it in our documentation because then they'll think it's enough. Thanks, Stephen
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Magnus Hagander
Date:
On Wed, Mar 9, 2022 at 4:42 PM Stephen Frost <sfrost@snowman.net> wrote: > > Greetings, > > * Chapman Flack (chap@anastigmatix.net) wrote: > > On 03/08/22 17:12, Nathan Bossart wrote: > > > I spent some time trying to come up with a workable script to replace the > > > existing one. I think the main problem is that you need to write out both > > > the backup label file and the tablespace map file, but I didn't find an > > > easy way to write the different output columns of pg_backup_stop() to > > > separate files via psql. > > Let's not confuse ourselves here- the existing script *doesn't* work in > any reasonable way when we're talking about everything that needs to be > done to perform a backup. That a lot of people are using it because > it's in the documentation is an actively bad thing. > > The same goes for the archive command example. > > > Something like this might work: > > > > SELECT * FROM pg_backup_stop(true) \gset > > > > \out /tmp/backup_label \qecho :labelfile > > \out /tmp/tablespace_map \qecho :spcmapfile > > \out > > \! ... tar command adding /tmp/{backup_label,tablespace_map} to the tarball > > ... this doesn't do what's needed either. We could try to write down > some minimum set of things that are needed for a backup tool to do but > it's not something that a 3 line script is going to cover. Indeed, it's > a lot more like pg_basebackup and if we want to give folks a script to > use, it should be "run pg_basebackup". > > > I notice the \qecho adds a final newline (and so if :spcmapfile is empty, > > a file containing a single newline is made). In a quick test with a bogus > > restore_command, I did not see any error messages specific to the format > > of the backup_label or tablespace_map files, so maybe the final newline > > isn't a problem. > > > > Assuming the newline isn't a problem, that might be simple enough to > > use in an example, and maybe it's not a bad thing that it highlights a few > > psql capabilities the reader might not have stumbled on before. Or, maybe > > it is just too confusing to bother. > > It's more than just too confusing, it's actively bad because people will > actually use it and then end up with backups that don't work. +1. Or even worse, backups that sometimes work, but not reliably and not every time. > > While agreeing that pg_basebackup is the production-ready thing that > > does it all for you (with tests for likely errors and so on), I think > > there is also some value in a dead-simple example that concretely > > shows you what "it" is, what the basic steps are that happen beneath > > pg_basebackup's chrome. I agree that having a dead simple script would be good. The *only* dead simple script that's going to be possible is one that calls pg_basebackup. The current APIs don't make it *possible* to drive them directly with a dead simple script. Pretending something is simple when it's not, is not doing anybody a favor. > Documenting everything that pg_basebackup does to make sure that the > backup is viable might be something to work on if someone is really > excited about this, but it's not 'dead-simple' and it's darn close to > the bare minimum, something that none of these simple scripts will come > anywhere close to being and instead they'll be far less than the > minimum. Yeah, having the full set of steps required documented certainly wouldn't be a bad thing. But it's a very *different* thing. > > If the added newline is a problem, I haven't thought of a way to exclude > > it that doesn't take the example out of the realm of dead-simple. > > I disagree that there's really a way to provide 'dead-simple' backups > with what's built into core without using pg_basebackup. If we want a > 'dead-simple' solution in core then we'd need to write an appropriate > backup tool that does all the basic things and include and maintain > that. Writing a shell script isn't enough and we shouldn't encourage > our users to do exactly that by having it in our documentation because > then they'll think it's enough. +1. We need to accept that the current APIs are far too low level to be driven by a shellscript. No matter how much documentation we write is not going to change that fact. For the people who want to drive their backups from a shellscript and for some reason *don't* want to use pg_basebackup, we need to come up with a different API or a different set of tools. That is not a documentation task. That is a "start from a list of which things pg_basebackup cannot do that are still simple, or that tools like pgbackrest cannot do if they're complicated". And then design an API that's actually safe and easy to use *for that usecase*. For example, if the use case is "i want to use filesystemor SAN snapshots for my backups", we shouldn't try to write workarounds using bash coprocs or whatever. Instead, we could write a tool that interacts with the current api to start the backup, then launches a shellscript that interacts with the snapshot system, and then stops the backup after. With a well defined set of rules for how that shell script should work and interact. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Chapman Flack
Date:
On 03/09/22 11:22, Magnus Hagander wrote: >> It's more than just too confusing, it's actively bad because people will >> actually use it and then end up with backups that don't work. > > +1. > > Or even worse, backups that sometimes work, but not reliably and not > every time. > ... > Pretending something is simple when it's not, is not doing anybody a favor. Okay, I bow to this reasoning, for the purpose of this patch. Let's just lose the example. >> Documenting everything that pg_basebackup does to make sure that the >> backup is viable might be something to work on if someone is really >> excited about this, but it's not 'dead-simple' and it's darn close to >> the bare minimum, something that none of these simple scripts will come >> anywhere close to being and instead they'll be far less than the >> minimum. > > Yeah, having the full set of steps required documented certainly > wouldn't be a bad thing. I'd say that qualifies as an understatement. While it certainly doesn't have to be part of this patch, if the claim is that an admin who relies on pg_basebackup is relying on essential things pg_basebackup does that have not been enumerated in our documentation yet, I would argue they should be. > with a different API or a different set of tools. That is not a > documentation task. That is a "start from a list of which things > pg_basebackup cannot do that are still simple, or that tools like > pgbackrest cannot do if they're complicated". And then design an API > that's actually safe and easy to use *for that usecase*. That might also be a good thing, but I don't see it as a substitute for documenting the present reality of what the irreducibly essential behaviors of pg_basebackup (or of third-party tools like pgbackrest) are, and why they are so. Regards, -Chap
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Stephen Frost
Date:
Greetings, * Chapman Flack (chap@anastigmatix.net) wrote: > On 03/09/22 11:22, Magnus Hagander wrote: > >> It's more than just too confusing, it's actively bad because people will > >> actually use it and then end up with backups that don't work. > > > > +1. > > > > Or even worse, backups that sometimes work, but not reliably and not > > every time. > > ... > > Pretending something is simple when it's not, is not doing anybody a favor. > > Okay, I bow to this reasoning, for the purpose of this patch. Let's > just lose the example. Great. > >> Documenting everything that pg_basebackup does to make sure that the > >> backup is viable might be something to work on if someone is really > >> excited about this, but it's not 'dead-simple' and it's darn close to > >> the bare minimum, something that none of these simple scripts will come > >> anywhere close to being and instead they'll be far less than the > >> minimum. > > > > Yeah, having the full set of steps required documented certainly > > wouldn't be a bad thing. > > I'd say that qualifies as an understatement. While it certainly doesn't > have to be part of this patch, if the claim is that an admin who relies > on pg_basebackup is relying on essential things pg_basebackup does that > have not been enumerated in our documentation yet, I would argue they > should be. It doesn't have to be part of this patch and we should move forward with this patch. Let's avoid hijacking this thread, which is about this patch, for an independent debate about what our documentation should or shouldn't include. > > with a different API or a different set of tools. That is not a > > documentation task. That is a "start from a list of which things > > pg_basebackup cannot do that are still simple, or that tools like > > pgbackrest cannot do if they're complicated". And then design an API > > that's actually safe and easy to use *for that usecase*. > > That might also be a good thing, but I don't see it as a substitute > for documenting the present reality of what the irreducibly essential > behaviors of pg_basebackup (or of third-party tools like pgbackrest) > are, and why they are so. I disagree. If we provided a tool then we'd document that tool and how users can use it, not every single step that it does (see also: pg_basebackup). Thanks, Stephen
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Chapman Flack
Date:
On 03/09/22 12:19, Stephen Frost wrote: > Let's avoid hijacking this thread, which is about this > patch, for an independent debate about what our documentation should or > shouldn't include. Agreed. New thread here: https://www.postgresql.org/message-id/6228FFE4.3050309%40anastigmatix.net Regards, -Chap
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Nathan Bossart
Date:
On Wed, Mar 09, 2022 at 02:32:24PM -0500, Chapman Flack wrote: > On 03/09/22 12:19, Stephen Frost wrote: >> Let's avoid hijacking this thread, which is about this >> patch, for an independent debate about what our documentation should or >> shouldn't include. > > Agreed. New thread here: > > https://www.postgresql.org/message-id/6228FFE4.3050309%40anastigmatix.net Great. Is there any additional feedback on this patch? Should we add an example of using pg_basebackup in the "Standalone Hot Backups" section, or should we leave all documentation additions like this for Chap's new thread? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Chapman Flack
Date:
On 03/09/22 17:21, Nathan Bossart wrote: > Great. Is there any additional feedback on this patch? Should we add an > example of using pg_basebackup in the "Standalone Hot Backups" section, or > should we leave all documentation additions like this for Chap's new > thread? I'm composing something longer for the new thread, but on the way I noticed something we might fit into this one. I think the listitem In the same connection as before, issue the command: SELECT * FROM pg_backup_stop(true); would be clearer if it used named-parameter form, wait_for_archive => true. This is not strictly necessary, of course, for a function with a single IN parameter, but it's good documentation (and also could save us headaches like these if there is ever another future need to give it more parameters). That listitem doesn't say anything about what the parameter means, which is a little weird, but probably ok because the next listitem does go into it in some detail. I don't think a larger reorg is needed to bring that language one listitem earlier. Just naming the parameter is probably enough to make it less puzzling (or adding in that listitem, at most, "the effect of the wait_for_archive parameter is explained below"). For consistency (and the same futureproofing benefit), I'd go to fast => false in the earlier pg_backup_start as well. I'm more ambivalent about label => 'label'. It would be consistent, but should we just agree for conciseness that there will always be a label and it will always be first? You can pretty much tell in a call what's a label; it's those anonymous trues and falses that are easier to read with named notation. Regards, -Chap
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Nathan Bossart
Date:
On Wed, Mar 09, 2022 at 06:11:19PM -0500, Chapman Flack wrote: > I think the listitem > > In the same connection as before, issue the command: > SELECT * FROM pg_backup_stop(true); > > would be clearer if it used named-parameter form, wait_for_archive => true. > > This is not strictly necessary, of course, for a function with a single > IN parameter, but it's good documentation (and also could save us headaches > like these if there is ever another future need to give it more parameters). > > That listitem doesn't say anything about what the parameter means, which > is a little weird, but probably ok because the next listitem does go into > it in some detail. I don't think a larger reorg is needed to bring that > language one listitem earlier. Just naming the parameter is probably > enough to make it less puzzling (or adding in that listitem, at most, > "the effect of the wait_for_archive parameter is explained below"). > > For consistency (and the same futureproofing benefit), I'd go to > fast => false in the earlier pg_backup_start as well. > > I'm more ambivalent about label => 'label'. It would be consistent, > but should we just agree for conciseness that there will always be > a label and it will always be first? > > You can pretty much tell in a call what's a label; it's those anonymous > trues and falses that are easier to read with named notation. Done. I went ahead and added "label => 'label'" for consistency. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
David Steele
Date:
On 3/9/22 18:06, Nathan Bossart wrote: > On Wed, Mar 09, 2022 at 06:11:19PM -0500, Chapman Flack wrote: >> I think the listitem >> >> In the same connection as before, issue the command: >> SELECT * FROM pg_backup_stop(true); >> >> would be clearer if it used named-parameter form, wait_for_archive => true. >> >> This is not strictly necessary, of course, for a function with a single >> IN parameter, but it's good documentation (and also could save us headaches >> like these if there is ever another future need to give it more parameters). >> >> That listitem doesn't say anything about what the parameter means, which >> is a little weird, but probably ok because the next listitem does go into >> it in some detail. I don't think a larger reorg is needed to bring that >> language one listitem earlier. Just naming the parameter is probably >> enough to make it less puzzling (or adding in that listitem, at most, >> "the effect of the wait_for_archive parameter is explained below"). >> >> For consistency (and the same futureproofing benefit), I'd go to >> fast => false in the earlier pg_backup_start as well. >> >> I'm more ambivalent about label => 'label'. It would be consistent, >> but should we just agree for conciseness that there will always be >> a label and it will always be first? >> >> You can pretty much tell in a call what's a label; it's those anonymous >> trues and falses that are easier to read with named notation. > > Done. I went ahead and added "label => 'label'" for consistency. +1 from me. Regards, -David
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Chapman Flack
Date:
On 03/09/22 19:06, Nathan Bossart wrote: > Done. I went ahead and added "label => 'label'" for consistency. Looks like this change to an example in func.sgml is not quite right: -postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); +postgres=# SELECT * FROM pg_walfile_name_offset(pg_backup_stop()); pg_backup_stop returns a record now, not just lsn. So this works for me: +postgres=# SELECT * FROM pg_walfile_name_offset((pg_backup_stop()).lsn); Otherwise, all looks to be in good order. Regards, -Chap
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Nathan Bossart
Date:
On Thu, Mar 10, 2022 at 07:13:14PM -0500, Chapman Flack wrote: > Looks like this change to an example in func.sgml is not quite right: > > -postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); > +postgres=# SELECT * FROM pg_walfile_name_offset(pg_backup_stop()); > > pg_backup_stop returns a record now, not just lsn. So this works for me: > > +postgres=# SELECT * FROM pg_walfile_name_offset((pg_backup_stop()).lsn); Ah, good catch. I made this change in v7. I considered doing something like this SELECT w.* FROM pg_backup_stop() b, pg_walfile_name_offset(b.lsn) w; but I think your suggestion is simpler. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Chapman Flack
Date:
On 03/10/22 19:38, Nathan Bossart wrote: > On Thu, Mar 10, 2022 at 07:13:14PM -0500, Chapman Flack wrote: >> +postgres=# SELECT * FROM pg_walfile_name_offset((pg_backup_stop()).lsn); > > Ah, good catch. I made this change in v7. I considered doing something v7 looks good to me. I have repeated the installcheck-world and given the changed documentation sections one last read-through, and will change the status to RfC. Regards, -Chap
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Nathan Bossart
Date:
On Fri, Mar 11, 2022 at 02:00:37PM -0500, Chapman Flack wrote: > v7 looks good to me. I have repeated the installcheck-world and given > the changed documentation sections one last read-through, and will > change the status to RfC. Thanks for reviewing! -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Stephen Frost
Date:
Greetings, * Nathan Bossart (nathandbossart@gmail.com) wrote: > On Thu, Mar 10, 2022 at 07:13:14PM -0500, Chapman Flack wrote: > > Looks like this change to an example in func.sgml is not quite right: > > > > -postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); > > +postgres=# SELECT * FROM pg_walfile_name_offset(pg_backup_stop()); > > > > pg_backup_stop returns a record now, not just lsn. So this works for me: > > > > +postgres=# SELECT * FROM pg_walfile_name_offset((pg_backup_stop()).lsn); > > Ah, good catch. I made this change in v7. I considered doing something > like this > > SELECT w.* FROM pg_backup_stop() b, pg_walfile_name_offset(b.lsn) w; > > but I think your suggestion is simpler. I tend to agree. More below. > Subject: [PATCH v7 1/1] remove exclusive backup mode > diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml > index 0d69851bb1..c8b914c1aa 100644 > --- a/doc/src/sgml/backup.sgml > +++ b/doc/src/sgml/backup.sgml > @@ -881,19 +873,19 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/0 > <listitem> > <para> > Connect to the server (it does not matter which database) as a user with > - rights to run pg_start_backup (superuser, or a user who has been granted > + rights to run pg_backup_start (superuser, or a user who has been granted > EXECUTE on the function) and issue the command: > <programlisting> > -SELECT pg_start_backup('label', false, false); > +SELECT pg_backup_start(label => 'label', fast => false); > </programlisting> > where <literal>label</literal> is any string you want to use to uniquely > identify this backup operation. The connection > - calling <function>pg_start_backup</function> must be maintained until the end of > + calling <function>pg_backup_start</function> must be maintained until the end of > the backup, or the backup will be automatically aborted. > </para> > > <para> > - By default, <function>pg_start_backup</function> can take a long time to finish. > + By default, <function>pg_backup_start</function> can take a long time to finish. > This is because it performs a checkpoint, and the I/O > required for the checkpoint will be spread out over a significant > period of time, by default half your inter-checkpoint interval Hrmpf. Not this patch's fault, but the default isn't 'half your inter-checkpoint interval' anymore (checkpoint_completion_target was changed to 0.9 ... let's not discuss who did that and missed fixing this). Overall though, maybe we should reword this to avoid having to remember to change it again if we ever change the completion target again? Also it seems to imply that pg_backup_start is going to run its own independent checkpoint or something, which isn't the typical case. I generally explain what is going on here like this: By default, <function>pg_backup_start</function> will wait for the next checkpoint to complete (see ref: checkpoint_timeout ... maybe also wal-configuration.html). > @@ -937,7 +925,7 @@ SELECT * FROM pg_stop_backup(false, true); > ready to archive. > </para> > <para> > - The <function>pg_stop_backup</function> will return one row with three > + The <function>pg_backup_stop</function> will return one row with three > values. The second of these fields should be written to a file named > <filename>backup_label</filename> in the root directory of the backup. The > third field should be written to a file named I get that we have <function> and </function>, but those are just formatting and don't show up in the actual text, and so it ends up being: The pg_backup_stop will return Which doesn't sound quite right to me. I'd say we either drop 'The' or add 'function' after. I realize that this patch is mostly doing a s/pg_stop_backup/pg_backup_stop/, but, still. > diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml > index 8a802fb225..73096708cc 100644 > --- a/doc/src/sgml/func.sgml > +++ b/doc/src/sgml/func.sgml > @@ -25732,24 +25715,19 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 > <parameter>spcmapfile</parameter> <type>text</type> ) > </para> > <para> > - Finishes performing an exclusive or non-exclusive on-line backup. > - The <parameter>exclusive</parameter> parameter must match the > - previous <function>pg_start_backup</function> call. > - In an exclusive backup, <function>pg_stop_backup</function> removes > - the backup label file and, if it exists, the tablespace map file > - created by <function>pg_start_backup</function>. In a non-exclusive > - backup, the desired contents of these files are returned as part of > + Finishes performing an on-line backup. The desired contents of the > + backup label file and the tablespace map file are returned as part of > the result of the function, and should be written to files in the > backup area (not in the data directory). > </para> Given that this is a crucial part, which the exclusive mode has wrong, I'd be a bit more forceful about it, eg: The contents of the backup label file and the tablespace map file must be stored as part of the backup and must NOT be stored in the live data directory (doing so will cause PostgreSQL to fail to restart in the event of a crash). > @@ -25771,8 +25749,7 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 > The result of the function is a single record. > The <parameter>lsn</parameter> column holds the backup's ending > write-ahead log location (which again can be ignored). The second and > - third columns are <literal>NULL</literal> when ending an exclusive > - backup; after a non-exclusive backup they hold the desired contents of > + third columns hold the desired contents of > the label and tablespace map files. > </para> > <para> I dislike saying 'desired' here as if it's something that we're nicely asking but maybe isn't a big deal, and just saying 'label' isn't good since we call it 'backup label' elsewhere and we should try hard to be consistent and clear. How about: The second column returns the contents of the backup label file, the third column returns the contents of the tablespace map file. These must be stored as part of the backup and are required as part of the restore process. > diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c > index 3c182c97d4..ee3fa148b6 100644 > --- a/src/bin/pg_ctl/pg_ctl.c > +++ b/src/bin/pg_ctl/pg_ctl.c > @@ -1069,7 +1069,7 @@ do_stop(void) > get_control_dbstate() != DB_IN_ARCHIVE_RECOVERY) > { > print_msg(_("WARNING: online backup mode is active\n" > - "Shutdown will not complete until pg_stop_backup() is called.\n\n")); > + "Shutdown will not complete until pg_backup_stop() is called.\n\n")); > } > > print_msg(_("waiting for server to shut down...")); > @@ -1145,7 +1145,7 @@ do_restart(void) > get_control_dbstate() != DB_IN_ARCHIVE_RECOVERY) > { > print_msg(_("WARNING: online backup mode is active\n" > - "Shutdown will not complete until pg_stop_backup() is called.\n\n")); > + "Shutdown will not complete until pg_backup_stop() is called.\n\n")); > } > > print_msg(_("waiting for server to shut down...")); This... can't actually happen anymore, right? Shouldn't we just pull all of this out? On a once-over of the rest of the code, I definitely like how much we're able to simplify things in this area and remove various hacks in things like pg_basebackup and pg_rewind where we previously had to worry about backup_label and tablespace_map files being in a live data directory. I'm planning to spend more time on this and hopefully be able to get it in for v15. Thanks! Stephen
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Nathan Bossart
Date:
On Mon, Mar 28, 2022 at 04:30:27PM -0400, Stephen Frost wrote: >> - By default, <function>pg_start_backup</function> can take a long time to finish. >> + By default, <function>pg_backup_start</function> can take a long time to finish. >> This is because it performs a checkpoint, and the I/O >> required for the checkpoint will be spread out over a significant >> period of time, by default half your inter-checkpoint interval > > Hrmpf. Not this patch's fault, but the default isn't 'half your > inter-checkpoint interval' anymore (checkpoint_completion_target was > changed to 0.9 ... let's not discuss who did that and missed fixing > this). Overall though, maybe we should reword this to avoid having to > remember to change it again if we ever change the completion target > again? Also it seems to imply that pg_backup_start is going to run its > own independent checkpoint or something, which isn't the typical case. > I generally explain what is going on here like this: > > By default, <function>pg_backup_start</function> will wait for the next > checkpoint to complete (see ref: checkpoint_timeout ... maybe also > wal-configuration.html). done >> - The <function>pg_stop_backup</function> will return one row with three >> + The <function>pg_backup_stop</function> will return one row with three >> values. The second of these fields should be written to a file named >> <filename>backup_label</filename> in the root directory of the backup. The >> third field should be written to a file named > > I get that we have <function> and </function>, but those are just > formatting and don't show up in the actual text, and so it ends up > being: > > The pg_backup_stop will return > > Which doesn't sound quite right to me. I'd say we either drop 'The' or > add 'function' after. I realize that this patch is mostly doing a > s/pg_stop_backup/pg_backup_stop/, but, still. done >> - Finishes performing an exclusive or non-exclusive on-line backup. >> - The <parameter>exclusive</parameter> parameter must match the >> - previous <function>pg_start_backup</function> call. >> - In an exclusive backup, <function>pg_stop_backup</function> removes >> - the backup label file and, if it exists, the tablespace map file >> - created by <function>pg_start_backup</function>. In a non-exclusive >> - backup, the desired contents of these files are returned as part of >> + Finishes performing an on-line backup. The desired contents of the >> + backup label file and the tablespace map file are returned as part of >> the result of the function, and should be written to files in the >> backup area (not in the data directory). >> </para> > > Given that this is a crucial part, which the exclusive mode has wrong, > I'd be a bit more forceful about it, eg: > > The contents of the backup label file and the tablespace map file must > be stored as part of the backup and must NOT be stored in the live data > directory (doing so will cause PostgreSQL to fail to restart in the > event of a crash). done >> The result of the function is a single record. >> The <parameter>lsn</parameter> column holds the backup's ending >> write-ahead log location (which again can be ignored). The second and >> - third columns are <literal>NULL</literal> when ending an exclusive >> - backup; after a non-exclusive backup they hold the desired contents of >> + third columns hold the desired contents of >> the label and tablespace map files. >> </para> >> <para> > > I dislike saying 'desired' here as if it's something that we're nicely > asking but maybe isn't a big deal, and just saying 'label' isn't good > since we call it 'backup label' elsewhere and we should try hard to be > consistent and clear. How about: > > The second column returns the contents of the backup label file, the > third column returns the contents of the tablespace map file. These > must be stored as part of the backup and are required as part of the > restore process. done >> { >> print_msg(_("WARNING: online backup mode is active\n" >> - "Shutdown will not complete until pg_stop_backup() is called.\n\n")); >> + "Shutdown will not complete until pg_backup_stop() is called.\n\n")); >> } >> >> print_msg(_("waiting for server to shut down...")); >> @@ -1145,7 +1145,7 @@ do_restart(void) >> get_control_dbstate() != DB_IN_ARCHIVE_RECOVERY) >> { >> print_msg(_("WARNING: online backup mode is active\n" >> - "Shutdown will not complete until pg_stop_backup() is called.\n\n")); >> + "Shutdown will not complete until pg_backup_stop() is called.\n\n")); >> } >> >> print_msg(_("waiting for server to shut down...")); > > This... can't actually happen anymore, right? Shouldn't we just pull > all of this out? done > On a once-over of the rest of the code, I definitely like how much we're > able to simplify things in this area and remove various hacks in things > like pg_basebackup and pg_rewind where we previously had to worry about > backup_label and tablespace_map files being in a live data directory. > I'm planning to spend more time on this and hopefully be able to get it > in for v15. Great! Much appreciated. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
David Steele
Date:
On 3/28/22 10:09 PM, Nathan Bossart wrote: > On Mon, Mar 28, 2022 at 04:30:27PM -0400, Stephen Frost wrote: > >> On a once-over of the rest of the code, I definitely like how much we're >> able to simplify things in this area and remove various hacks in things >> like pg_basebackup and pg_rewind where we previously had to worry about >> backup_label and tablespace_map files being in a live data directory. >> I'm planning to spend more time on this and hopefully be able to get it >> in for v15. > > Great! Much appreciated. Minor typo in the docs: + * capable of doing an online backup, but exclude then just in case. Should be: capable of doing an online backup, but exclude them just in case. Regards, -- -David david@pgmasters.net
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Nathan Bossart
Date:
On Mon, Apr 04, 2022 at 09:56:26AM -0400, David Steele wrote: > Minor typo in the docs: > > + * capable of doing an online backup, but exclude then just in case. > > Should be: > > capable of doing an online backup, but exclude them just in case. fixed -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Nathan Bossart
Date:
I noticed a couple of other things that can be removed. Since we no longer wait on exclusive backup mode during smart shutdown, we can change connsAllowed (in postmaster.c) to a boolean and remove CAC_SUPERUSER. We can also remove a couple of related notes in the documentation. I've done all this in the attached patch. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
David Steele
Date:
On 4/4/22 11:42 AM, Nathan Bossart wrote: > I noticed a couple of other things that can be removed. Since we no longer > wait on exclusive backup mode during smart shutdown, we can change > connsAllowed (in postmaster.c) to a boolean and remove CAC_SUPERUSER. We > can also remove a couple of related notes in the documentation. I've done > all this in the attached patch. These changes look good to me. IMV it is a real bonus how much the state machine has been simplified. I've also run this patch through the pgbackrest regression tests without any problems. Regards, -- -David david@pgmasters.net
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Stephen Frost
Date:
Greetings, * David Steele (david@pgmasters.net) wrote: > On 4/4/22 11:42 AM, Nathan Bossart wrote: > >I noticed a couple of other things that can be removed. Since we no longer > >wait on exclusive backup mode during smart shutdown, we can change > >connsAllowed (in postmaster.c) to a boolean and remove CAC_SUPERUSER. We > >can also remove a couple of related notes in the documentation. I've done > >all this in the attached patch. > > These changes look good to me. IMV it is a real bonus how much the state > machine has been simplified. Yeah, agreed. > I've also run this patch through the pgbackrest regression tests without any > problems. Fantastic. Please find attached an updated patch + commit message. Mostly, I just went through and did a bit more in terms of updating the documentation and improving the comments (there were some places that were still worrying about the chance of a 'stray' backup_label file existing, which isn't possible any longer), along with some additional testing and review. This is looking pretty good to me, but other thoughts are certainly welcome. Otherwise, I'm hoping to commit this tomorrow. Thanks! Stephen
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
David Steele
Date:
On 4/5/22 11:25 AM, Stephen Frost wrote: > > Please find attached an updated patch + commit message. Mostly, I just > went through and did a bit more in terms of updating the documentation > and improving the comments (there were some places that were still > worrying about the chance of a 'stray' backup_label file existing, which > isn't possible any longer), along with some additional testing and > review. This is looking pretty good to me, but other thoughts are > certainly welcome. Otherwise, I'm hoping to commit this tomorrow. I have reviewed the changes and they look good. I also ran the new patch through pgbackrest regression with no issues. Regards, -- -David david@pgmasters.net
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Magnus Hagander
Date:
On Tue, Apr 5, 2022 at 5:25 PM Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* David Steele (david@pgmasters.net) wrote:
> On 4/4/22 11:42 AM, Nathan Bossart wrote:
> >I noticed a couple of other things that can be removed. Since we no longer
> >wait on exclusive backup mode during smart shutdown, we can change
> >connsAllowed (in postmaster.c) to a boolean and remove CAC_SUPERUSER. We
> >can also remove a couple of related notes in the documentation. I've done
> >all this in the attached patch.
>
> These changes look good to me. IMV it is a real bonus how much the state
> machine has been simplified.
Yeah, agreed.
Definitely.
> I've also run this patch through the pgbackrest regression tests without any
> problems.
Fantastic.
Please find attached an updated patch + commit message. Mostly, I just
went through and did a bit more in terms of updating the documentation
and improving the comments (there were some places that were still
worrying about the chance of a 'stray' backup_label file existing, which
isn't possible any longer), along with some additional testing and
review. This is looking pretty good to me, but other thoughts are
certainly welcome. Otherwise, I'm hoping to commit this tomorrow.
+1. LGTM.
I'm not sure I love the renaming of the functions, but I have also yet to come up with a better idea for how to avoid silent breakage, so go with it.
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Nathan Bossart
Date:
On Tue, Apr 05, 2022 at 11:25:36AM -0400, Stephen Frost wrote: > Please find attached an updated patch + commit message. Mostly, I just > went through and did a bit more in terms of updating the documentation > and improving the comments (there were some places that were still > worrying about the chance of a 'stray' backup_label file existing, which > isn't possible any longer), along with some additional testing and > review. This is looking pretty good to me, but other thoughts are > certainly welcome. Otherwise, I'm hoping to commit this tomorrow. LGTM! -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Laurenz Albe
Date:
On Tue, 2022-04-05 at 13:06 -0700, Nathan Bossart wrote: > On Tue, Apr 05, 2022 at 11:25:36AM -0400, Stephen Frost wrote: > > Please find attached an updated patch + commit message. Mostly, I just > > went through and did a bit more in terms of updating the documentation > > and improving the comments (there were some places that were still > > worrying about the chance of a 'stray' backup_label file existing, which > > isn't possible any longer), along with some additional testing and > > review. This is looking pretty good to me, but other thoughts are > > certainly welcome. Otherwise, I'm hoping to commit this tomorrow. > > LGTM! Cassandra (not the software) from the sidelines predicts that we will get some fire from users for this, although I concede the theoretical sanity of the change. Yours, Laurenz Albe
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Stephen Frost
Date:
Greetings, * Laurenz Albe (laurenz.albe@cybertec.at) wrote: > On Tue, 2022-04-05 at 13:06 -0700, Nathan Bossart wrote: > > On Tue, Apr 05, 2022 at 11:25:36AM -0400, Stephen Frost wrote: > > > Please find attached an updated patch + commit message. Mostly, I just > > > went through and did a bit more in terms of updating the documentation > > > and improving the comments (there were some places that were still > > > worrying about the chance of a 'stray' backup_label file existing, which > > > isn't possible any longer), along with some additional testing and > > > review. This is looking pretty good to me, but other thoughts are > > > certainly welcome. Otherwise, I'm hoping to commit this tomorrow. > > > > LGTM! > > Cassandra (not the software) from the sidelines predicts that we will > get some fire from users for this, although I concede the theoretical > sanity of the change. Great, thanks for that. This has now been committed- thanks again to everyone for their help! Stephen
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Nathan Bossart
Date:
On Wed, Apr 06, 2022 at 03:29:15PM -0400, Stephen Frost wrote: > This has now been committed- thanks again to everyone for their help! Thanks! -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Martín Marqués
Date:
Hi, I was looking at the commit for this patch and noticed this small typo in the comment in `basebackup.c`: ``` diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index 6884cad2c00af1632eacec07903098e7e1874393..815681ada7dd0c135af584ad5da2dd13c9a12465 100644 (file) --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -184,10 +184,8 @@ static const struct exclude_list_item excludeFiles[] = {RELCACHE_INIT_FILENAME, true}, /* - * If there's a backup_label or tablespace_map file, it belongs to a - * backup started by the user with pg_start_backup(). It is *not* correct - * for this backup. Our backup_label/tablespace_map is injected into the - * tar separately. + * backup_label and tablespace_map should not exist in in a running cluster + * capable of doing an online backup, but exclude them just in case. */ {BACKUP_LABEL_FILE, false}, {TABLESPACE_MAP, false}, ``` The typo is in `exist in in a running cluster`. There's two `in` in a row. P.D.: I was looking at this just because I was looking at an issue where someone bumped their head with this "problem", so great that we're in a better place now. Hopefully one day everyone will be running PG15 or better :) Kind regards, Martín El mié, 6 abr 2022 a las 16:29, Stephen Frost (<sfrost@snowman.net>) escribió: > > Greetings, > > * Laurenz Albe (laurenz.albe@cybertec.at) wrote: > > On Tue, 2022-04-05 at 13:06 -0700, Nathan Bossart wrote: > > > On Tue, Apr 05, 2022 at 11:25:36AM -0400, Stephen Frost wrote: > > > > Please find attached an updated patch + commit message. Mostly, I just > > > > went through and did a bit more in terms of updating the documentation > > > > and improving the comments (there were some places that were still > > > > worrying about the chance of a 'stray' backup_label file existing, which > > > > isn't possible any longer), along with some additional testing and > > > > review. This is looking pretty good to me, but other thoughts are > > > > certainly welcome. Otherwise, I'm hoping to commit this tomorrow. > > > > > > LGTM! > > > > Cassandra (not the software) from the sidelines predicts that we will > > get some fire from users for this, although I concede the theoretical > > sanity of the change. > > Great, thanks for that. > > This has now been committed- thanks again to everyone for their help! > > Stephen -- Martín Marqués It’s not that I have something to hide, it’s that I have nothing I want you to see
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Michael Paquier
Date:
On Tue, Apr 19, 2022 at 10:12:32PM -0300, Martín Marqués wrote: > The typo is in `exist in in a running cluster`. There's two `in` in a row. Thanks, fixed. -- Michael
Attachment
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
From
Stephen Frost
Date:
Greetings, * Martín Marqués (martin.marques@gmail.com) wrote: > The typo is in `exist in in a running cluster`. There's two `in` in a row. Oops, thanks for catching (and thanks to Michael for committing the fix!). > P.D.: I was looking at this just because I was looking at an issue > where someone bumped their head with this "problem", so great that > we're in a better place now. Hopefully one day everyone will be > running PG15 or better :) Agreed! Does make me wonder just how often folks run into this issue.. Glad that we were able to get the change in for v15. Thanks again! Stephen