Thread: Add exclusive backup deprecation notes to documentation
Hackers, It has been noted on multiple threads, such as [1], that it would be good to have additional notes in the documentation to explain why exclusive backups have been deprecated and why they should be avoided when possible. This patch attempts to document the limitations of the exclusive mode. -- -David david@pgmasters.net [1] https://www.postgresql.org/message-id/flat/ac7339ca-3718-3c93-929f-99e725d1172c%40pgmasters.net
Attachment
David Steele wrote: > This patch attempts to document the limitations of the exclusive mode. Thanks! > + <para> > + The primary issue with the exclusive method is that the > + <filename>backup_label</filename> file is written into the data directory > + when <function>pg_start_backup</function> is called and remains until > + <function>pg_stop_backup</function> is called. If > + <productname>PostgreSQL</productname> or the host terminates abnormally There should be a comma at the end of this line. > + then <filename>backup_label</filename> will be left in the data directory > + and <productname>PostgreSQL</productname> will not start. A log message You should say "*may* not start", because it will if the WAL segment is still there. > + recommends that <filename>backup_label</filename> be removed if not > + restoring from a backup. > + </para> Yours, Laurenz Albe
On 2/28/19 6:08 PM, Laurenz Albe wrote: > David Steele wrote: >> This patch attempts to document the limitations of the exclusive mode. > > Thanks! > >> + <para> >> + The primary issue with the exclusive method is that the >> + <filename>backup_label</filename> file is written into the data directory >> + when <function>pg_start_backup</function> is called and remains until >> + <function>pg_stop_backup</function> is called. If >> + <productname>PostgreSQL</productname> or the host terminates abnormally > > There should be a comma at the end of this line. Fixed. >> + then <filename>backup_label</filename> will be left in the data directory >> + and <productname>PostgreSQL</productname> will not start. A log message > > You should say "*may* not start", because it will if the WAL segment is still there. You are correct. It's still pretty likely though so I went with "probably". I added some extra language to the warning that gets emitted in the log. Users are more like to see that than the documentation. I also addressed a comment from another thread by adding pg_basebackup as .e.g. rather than or. Thanks, -- -David david@pgmasters.net
Attachment
David Steele wrote: > I added some extra language to the warning that gets emitted in the log. > Users are more like to see that than the documentation. > > I also addressed a comment from another thread by adding pg_basebackup > as .e.g. rather than or. Looks good to me. Yours, Laurenz Albe
El vie., 1 de mar. de 2019 a la(s) 06:21, David Steele (david@pgmasters.net) escribió: > > > I also addressed a comment from another thread by adding pg_basebackup > as .e.g. rather than or. Thanks David, This looks very good! -- Martín Marqués http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
El vie., 1 de mar. de 2019 a la(s) 06:21, David Steele (david@pgmasters.net) escribió: > > I added some extra language to the warning that gets emitted in the log. > Users are more like to see that than the documentation. > > I also addressed a comment from another thread by adding pg_basebackup > as .e.g. rather than or. More awake, I gave this last patch a second read. Wording is good now. No objections there at all. I do think that paragraph 2 and 3 should be merged as it seems the idea on the third is a continuation of what's in the second. But even without that change, I believe this patch is good for commit. Regards, -- Martín Marqués http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Please follow the 1-space indentation in the documentation files. I think the style of mentioning all the problems in a note before the actual description is a bit backwards. The layout of the section should be: - This is what it does. - Here are some comparisons with other methods. - For this or that reason, it's deprecated. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3/1/19 1:13 PM, Peter Eisentraut wrote: > Please follow the 1-space indentation in the documentation files. Whoops. Will fix. > I think the style of mentioning all the problems in a note before the > actual description is a bit backwards. In the case of an important note like this I think it should be right at the top where people will see it. Not everyone reads to the end. -- -David david@pgmasters.net
On Fri, Mar 1, 2019 at 2:07 PM David Steele <david@pgmasters.net> wrote:
On 3/1/19 1:13 PM, Peter Eisentraut wrote:
> Please follow the 1-space indentation in the documentation files.
Whoops. Will fix.
> I think the style of mentioning all the problems in a note before the
> actual description is a bit backwards.
In the case of an important note like this I think it should be right at
the top where people will see it. Not everyone reads to the end
Maybe have the first note say "This method is deprecated bceause it has serious risks (see bellow)" and then list the actual risks at the end?
Magnus Hagander wrote: > Maybe have the first note say "This method is deprecated bceause it has serious > risks (see bellow)" and then list the actual risks at the end? Good idea. That may attract the attention of the dogs among the readers. Yours, Laurenz Albe
On 3/1/19 3:14 PM, Laurenz Albe wrote: > Magnus Hagander wrote: >> Maybe have the first note say "This method is deprecated bceause it has serious >> risks (see bellow)" and then list the actual risks at the end? > > Good idea. That may attract the attention of the dogs among the readers. OK, here's a new version that splits the deprecation notes from the discussion of risks. I also fixed the indentation. Thanks, -- -David david@pgmasters.net
Attachment
On Thu, Mar 07, 2019 at 11:33:20AM +0200, David Steele wrote: > OK, here's a new version that splits the deprecation notes from the > discussion of risks. I also fixed the indentation. The documentation part looks fine to me. Just one nit regarding the error hint. > - errhint("If you are not restoring from a backup, try removing the file \"%s/backup_label\".", DataDir))); > + errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add recovery options to \"%s/postgresql.auto.conf\".\n" Here do we really want to recommend adding options to postgresql.auto.conf? This depends a lot on the solution integration so I think that this hint could actually confuse some users because it implies that they kind of *have* to do so, which is not correct. I would recommend to be a bit more generic and just use "and add necessary recovery configuration". > + "If you are not restoring from a backup, try removing the file \"%s/backup_label\".\n" > + "Be careful: removing \"%s/backup_label\" will result in a corrupt cluster if restoring from a backup.", Fine for these two ones. > + DataDir, DataDir, DataDir, DataDir))); :) -- Michael
Attachment
On Thu, Mar 7, 2019 at 5:35 PM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Mar 07, 2019 at 11:33:20AM +0200, David Steele wrote:
> OK, here's a new version that splits the deprecation notes from the
> discussion of risks. I also fixed the indentation.
The documentation part looks fine to me. Just one nit regarding the
error hint.
> - errhint("If you are not restoring from a backup, try removing the file \"%s/backup_label\".", DataDir)));
> + errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add recovery options to \"%s/postgresql.auto.conf\".\n"
Here do we really want to recommend adding options to
postgresql.auto.conf? This depends a lot on the solution integration
so I think that this hint could actually confuse some users because it
implies that they kind of *have* to do so, which is not correct. I
would recommend to be a bit more generic and just use "and add
necessary recovery configuration".
Agreed, I think we should never tell people to "add recovery options to postgresql.auto.conf". Becuase they should never do that manually. If we want to suggest people use postgresql.auto.conf surely they should be using ALTER SYSTEM SET. Which of course doesn't work in this case, since postgrsql isn't running yet.
So yeah either that, or say "add to postgresql.conf" without the auto?
On 2019-03-07 10:33, David Steele wrote: > On 3/1/19 3:14 PM, Laurenz Albe wrote: >> Magnus Hagander wrote: >>> Maybe have the first note say "This method is deprecated bceause it has serious >>> risks (see bellow)" and then list the actual risks at the end? >> >> Good idea. That may attract the attention of the dogs among the readers. > > OK, here's a new version that splits the deprecation notes from the > discussion of risks. I also fixed the indentation. The documentation changes appear to continue the theme from the other thread that the exclusive backup mode is terrible and everyone should feel bad about it. I don't think there is consensus about that. I do welcome a more precise description of the handling of backup_label and a better hint in the error message. I think we haven't gotten to the final shape there yet, especially for the latter. I suggest to focus on that. The other changes repeat points already made in nearby documentation. I think it would be helpful to frame the documentation in a way to suggest that the nonexclusive mode is more for automation and more sophisticated tools and the exclusive mode is more for manual or simple scripted use. If we do think that the exclusive mode will be removed in PG13, then I don't think we need further documentation changes. It already says it's deprecated, and we don't need to justify that at length. But again, I'm not convinced that that will happen. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Mar 18, 2019 at 8:33 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > The documentation changes appear to continue the theme from the other > thread that the exclusive backup mode is terrible and everyone should > feel bad about it. I don't think there is consensus about that. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Greetings, * Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote: > On 2019-03-07 10:33, David Steele wrote: > > On 3/1/19 3:14 PM, Laurenz Albe wrote: > >> Magnus Hagander wrote: > >>> Maybe have the first note say "This method is deprecated bceause it has serious > >>> risks (see bellow)" and then list the actual risks at the end? > >> > >> Good idea. That may attract the attention of the dogs among the readers. > > > > OK, here's a new version that splits the deprecation notes from the > > discussion of risks. I also fixed the indentation. > > The documentation changes appear to continue the theme from the other > thread that the exclusive backup mode is terrible and everyone should > feel bad about it. I don't think there is consensus about that. I don't view it as up for much debate. The exclusive backup mode is quite bad. > I do welcome a more precise description of the handling of backup_label > and a better hint in the error message. I think we haven't gotten to > the final shape there yet, especially for the latter. I suggest to > focus on that. There isn't a way to handle the backup_label in a sane way when it's created by the server in the data directory, which is why the non-exclusive mode explicitly doesn't do that. > I think it would be helpful to frame the documentation in a way to > suggest that the nonexclusive mode is more for automation and more > sophisticated tools and the exclusive mode is more for manual or simple > scripted use. I don't agree with this at all, that's not the reason the two exist nor were they ever developed with the intent that one is for the 'simple' case and one is for the 'automated' case. Trying to wedge them into that framework strikes me as simply trying to sweep the serious issues under the rug and I don't agree with that- if we are going to continue to have this, we need to make it clear what the issues are. Sadly, we will still have users who don't actually read the docs that carefully and get bit by the exclusive backup mode because they didn't appreciate the issues, but we will continue to have that until we finally remove the exclusive mode. > If we do think that the exclusive mode will be removed in PG13, then I > don't think we need further documentation changes. It already says it's > deprecated, and we don't need to justify that at length. But again, I'm > not convinced that that will happen. There were at least a few comments made on this thread that it wasn't made clear enough in the documentation that it's deprecated. Saying that we already deprecated it doesn't change that and doesn't do anything to actually address those concerns. Continuing to carry forward these two modes makes further progress in this area difficult and unlikely to happen, and that's disappointing. Thanks! Stephen
Attachment
On Mon, Mar 18, 2019 at 11:13 PM Stephen Frost <sfrost@snowman.net> wrote: > I don't view it as up for much debate. In other words, you're not willing to listen to what other people think about this issue. I can't say I haven't noticed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Greetings, * Robert Haas (robertmhaas@gmail.com) wrote: > On Mon, Mar 18, 2019 at 11:13 PM Stephen Frost <sfrost@snowman.net> wrote: > > I don't view it as up for much debate. > > In other words, you're not willing to listen to what other people > think about this issue. I have listened, but unfortunately the discussion just revolves around "oh, it isn't actually all that bad", which, no, isn't something that's going to sway my opinion. I suppose some might view that as being principled. Thanks! Stephen
Attachment
Hi, On 2019-03-18 23:35:07 -0400, Stephen Frost wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: > > On Mon, Mar 18, 2019 at 11:13 PM Stephen Frost <sfrost@snowman.net> wrote: > > > I don't view it as up for much debate. > > > > In other words, you're not willing to listen to what other people > > think about this issue. > > I have listened, but unfortunately the discussion just revolves around > "oh, it isn't actually all that bad", which, no, isn't something that's > going to sway my opinion. I think you're right about the original issue. But at some point you just gotta settle for not everyone agreeing with you (and me). Just pushing forward hard won't achieve anything. Nobody says you need to agree with people saying "it's not all that bad", but that doesn't mean you should just try to push forward at full speed. Greetings, Andres Freund
Greetings, * Andres Freund (andres@anarazel.de) wrote: > On 2019-03-18 23:35:07 -0400, Stephen Frost wrote: > > * Robert Haas (robertmhaas@gmail.com) wrote: > > > On Mon, Mar 18, 2019 at 11:13 PM Stephen Frost <sfrost@snowman.net> wrote: > > > > I don't view it as up for much debate. > > > > > > In other words, you're not willing to listen to what other people > > > think about this issue. > > > > I have listened, but unfortunately the discussion just revolves around > > "oh, it isn't actually all that bad", which, no, isn't something that's > > going to sway my opinion. > > I think you're right about the original issue. But at some point you > just gotta settle for not everyone agreeing with you (and me). Just > pushing forward hard won't achieve anything. Nobody says you need to > agree with people saying "it's not all that bad", but that doesn't mean > you should just try to push forward at full speed. This thread didn't exactly pop out of nowhere.. It's a result of the prior discussion, with the goal of at least improving the situation by documenting the issues and trying to make it clearer that the exclusive mode is deprecated to hopefully provide a chance that we'll actually remove it at some point in the future as being dangerous. That's a pretty large step back from "let's rip it out for v12", and yet people are now pushing back against even that, which is definitely frustrating, and saying that I'm still pushing forward 'at full speed' definitely comes across as not really considering that this started out with "rip it out for v12" and has now regressed back to a documentation patch. Thanks! Stephen
Attachment
On 3/18/19 4:33 PM, Peter Eisentraut wrote: > On 2019-03-07 10:33, David Steele wrote: >> On 3/1/19 3:14 PM, Laurenz Albe wrote: >>> Magnus Hagander wrote: >>>> Maybe have the first note say "This method is deprecated bceause it has serious >>>> risks (see bellow)" and then list the actual risks at the end? >>> >>> Good idea. That may attract the attention of the dogs among the readers. >> >> OK, here's a new version that splits the deprecation notes from the >> discussion of risks. I also fixed the indentation. > > The documentation changes appear to continue the theme from the other > thread that the exclusive backup mode is terrible and everyone should > feel bad about it. I don't think there is consensus about that. I wouldn't characterize documenting the limitations of the method as making people feel bad about it. If you feel my language implies that then please let me know where you see it. > I do welcome a more precise description of the handling of backup_label > and a better hint in the error message. I think we haven't gotten to > the final shape there yet, especially for the latter. I suggest to > focus on that. I was planning to update the error message hint as Magnus and Michael have suggested. I'm not a fan of normalizing the documentation around backup_label, i.e. making it seem like a perfectly normal thing that you need to manually delete a file to get the cluster to start. This may still lead users to script their way around the problem, with possible corruption as the result. > The other changes repeat points already made in nearby documentation. Granted, but in this sense they are meant to concisely describe why the feature is deprecated, rather than being instructions in a user guide. > I think it would be helpful to frame the documentation in a way to > suggest that the nonexclusive mode is more for automation and more > sophisticated tools and the exclusive mode is more for manual or simple > scripted use. I don't think the features were developed with this in mind and I wouldn't want to characterize them this way now. Non-exclusive mode was developed to address the shortcomings of exclusive mode, not as an "automate-able" version of it. Describing any backup method as primarily "manual" in nature seems counter-intuitive to me. > If we do think that the exclusive mode will be removed in PG13, then I > don't think we need further documentation changes. It already says it's > deprecated, and we don't need to justify that at length. But again, I'm > not convinced that that will happen. I think we should remove it entirely in PG13, but I'm not sure if there is enough support. I'll propose it again in the first CF and see where it goes. This patch was intended to be a compromise based on discussion in the thread about removing the feature. If this is now a bridge too far then I'm at a bit of a loss as to how to proceed. If we water it down and normalize it then we are not achieving the goals of the patch as I see them -- to steer users away from this method when possible and to make it less of a shock if it goes away. Regards, -- -David david@pgmasters.net
On 3/8/19 6:08 AM, Magnus Hagander wrote: > On Thu, Mar 7, 2019 at 5:35 PM Michael Paquier <michael@paquier.xyz > <mailto:michael@paquier.xyz>> wrote: > > On Thu, Mar 07, 2019 at 11:33:20AM +0200, David Steele wrote: > > OK, here's a new version that splits the deprecation notes from the > > discussion of risks. I also fixed the indentation. > > The documentation part looks fine to me. Just one nit regarding the > error hint. > > > - errhint("If you are not restoring from a backup, try > removing the file \"%s/backup_label\".", DataDir))); > > + errhint("If you are restoring from a backup, touch > \"%s/recovery.signal\" and add recovery options to > \"%s/postgresql.auto.conf\".\n" > > Here do we really want to recommend adding options to > postgresql.auto.conf? This depends a lot on the solution integration > so I think that this hint could actually confuse some users because it > implies that they kind of *have* to do so, which is not correct. I > would recommend to be a bit more generic and just use "and add > necessary recovery configuration". > > > Agreed, I think we should never tell people to "add recovery options to > postgresql.auto.conf". Becuase they should never do that manually. If we > want to suggest people use postgresql.auto.conf surely they should be > using ALTER SYSTEM SET. Which of course doesn't work in this case, since > postgrsql isn't running yet. > > So yeah either that, or say "add to postgresql.conf" without the auto? I went with Michael's suggestion. Attached is a new patch. I also think we should set a flag and throw the error below this if/else block. This is a rather large message and maintaining two copies of it is not ideal. Please note that there have been objections to the patch later in this thread by Peter and Robert. I'm not very interested in watering down the documentation changes as Peter suggests, but I think at the very least we should commit the added hints in the error message. For many users this error will be their first point of contact with the backup_label issue/behavior. Regards, -- -David david@pgmasters.net
Attachment
On Wed, Mar 20, 2019 at 04:29:35PM +0400, David Steele wrote: > Please note that there have been objections to the patch later in this > thread by Peter and Robert. I'm not very interested in watering down the > documentation changes as Peter suggests, but I think at the very least we > should commit the added hints in the error message. For many users this > error will be their first point of contact with the backup_label > issue/behavior. The updates of the log message do not imply anything negative as I read them as they mention to not remove the backup_label file. So while we don't have an agreement about the docs, the log messages may be able to be committed? Peter? Robert? "will result in a corruptED cluster" is more correct? -- Michael
Attachment
On Mon, Mar 18, 2019 at 1:33 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 2019-03-07 10:33, David Steele wrote:
> On 3/1/19 3:14 PM, Laurenz Albe wrote:
I think it would be helpful to frame the documentation in a way to
suggest that the nonexclusive mode is more for automation and more
sophisticated tools and the exclusive mode is more for manual or simple
scripted use.
But that would be factually incorrect and backwards, so it seems like a terrible idea, at least when it comes to manual. If you are doing it manually, it's a lot *easier* to do it right with the non-exclusive mode, because you can easily keep one psql and one shell open. And that's safe.
The only real use case that has been put forward for the exclusive backup mode is when the backups are done through a script, and that script is limited to only use something like bash (and can't use a scripting language like perl or python or powershell or other more advanced scripting languages).
And I don't think exclusive mode should be suggested for "simple scripts" either, since it's anything but -- scripts using the exclusive mode correctly will be anything but simple. A better term there would be to single out shellscripts, I'd suggest, if we want to single something out. Or more generic, for "scripting languages incapable of keeping a connection open across multiple lines" or something?
We can certainly keep it, but let's not tell people something is simple when it's not.
If we do think that the exclusive mode will be removed in PG13, then I
don't think we need further documentation changes. It already says it's
deprecated, and we don't need to justify that at length. But again, I'm
not convinced that that will happen.
But the complaints before was that the deprecation currently in the documentation was not enough to remove it....
On Wed, Mar 20, 2019 at 9:00 AM Michael Paquier <michael@paquier.xyz> wrote: > On Wed, Mar 20, 2019 at 04:29:35PM +0400, David Steele wrote: > > Please note that there have been objections to the patch later in this > > thread by Peter and Robert. I'm not very interested in watering down the > > documentation changes as Peter suggests, but I think at the very least we > > should commit the added hints in the error message. For many users this > > error will be their first point of contact with the backup_label > > issue/behavior. > > The updates of the log message do not imply anything negative as I > read them as they mention to not remove the backup_label file. So > while we don't have an agreement about the docs, the log messages may > be able to be committed? Peter? Robert? > > "will result in a corruptED cluster" is more correct? I really like the proposed changes to the ereport() text. I think the "Be careful" hint is a really helpful way of phrasing it. I think "corrupt" as the patch has it is slightly better than "corrupted". Obviously, we have to make the updates for recovery.signal no matter what, and you could argue that part should be its own commit, but I like all of the changes. I'm not too impressed with the documentation changes. A lot of the information being added is already present somewhere in that very same section. It's reasonable to revise the section so that the dangers stand out more clearly, but it doesn't seem good to revise it in a way that ends up duplicating the existing information. Here's my suggestion -- ditch the note at the end of the section and make the one at the beginning read like this: The exclusive backup method is deprecated and should be avoided. Prior to <productname>PostgreSQL</productname> 9.6, this was the only low-level method available, but it is now recommended that all users upgrade their scripts to use non-exclusive backups. Then, revise the first paragraph like this: The process for an exclusive backup is mostly the same as for a non-exclusive one, but it differs in a few key steps. This type of backup can only be taken on a primary and does not allow concurrent backups. Moreover, because it writes a backup_label file on the master, it can cause the master to fail to restart automatically after a crash. On the other hand, the erroneous removal of a backup_label file from a backup or standby is a common mistake which can can result in serious data corruption. If it is necessary to use this method, the following steps may be used. Later, where it says: Note that if the server crashes during the backup it may not be possible to restart until the <literal>backup_label</literal> file has been manually deleted from the <envar>PGDATA</envar> directory. Change it to read: As noted above, if the server crashes during the backup it may not be possible to restart until the <literal>backup_label</literal> file has been manually deleted from the <envar>PGDATA</envar> directory. Note that it is very important to never to remove the <literal>backup_label</literal> file when restoring a backup, because this will result in corruption. Confusion about the circumstances under which it is appropriate to remove this file is a common cause of data corruption when using this method; be very certain that you remove the file only on an existing master and never when building a standby or restoring a backup, even if you are building a standby that will subsequently be promoted to a new master. I also think we should revise this thoroughly terrible advice: If you wish to place a time limit on the execution of <function>pg_stop_backup</function>, set an appropriate <varname>statement_timeout</varname> value, but make note that if <function>pg_stop_backup</function> terminates because of this your backup may not be valid. That seems awful, not only because it encourages people to do that particular thing and end up leaving the server in backup mode, but also because it doesn't clearly articulate the extreme importance of making sure that the server is not left in backup mode. So I would propose that we strike that text entirely and replace it with something like: When using exclusive backup mode, it is absolutely imperative to make sure that <function>pg_stop_backup</function> completes successfully at the end of the backup. Even if the backup itself fails, for example due to lack of disk space, failure to call <function>pg_stop_backup</function> will leave the server in backup mode indefinitely, causing future backups to fail and increasing the risk of a restart during a time when <literal>backup_label</literal> exists. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi Robert, On 3/20/19 6:31 PM, Robert Haas wrote: > On Wed, Mar 20, 2019 at 9:00 AM Michael Paquier <michael@paquier.xyz> wrote: >> On Wed, Mar 20, 2019 at 04:29:35PM +0400, David Steele wrote: >>> Please note that there have been objections to the patch later in this >>> thread by Peter and Robert. I'm not very interested in watering down the >>> documentation changes as Peter suggests, but I think at the very least we >>> should commit the added hints in the error message. For many users this >>> error will be their first point of contact with the backup_label >>> issue/behavior. >> >> The updates of the log message do not imply anything negative as I >> read them as they mention to not remove the backup_label file. So >> while we don't have an agreement about the docs, the log messages may >> be able to be committed? Peter? Robert? >> >> "will result in a corruptED cluster" is more correct? > > I really like the proposed changes to the ereport() text. I think the > "Be careful" hint is a really helpful way of phrasing it. I think > "corrupt" as the patch has it is slightly better than "corrupted". > Obviously, we have to make the updates for recovery.signal no matter > what, and you could argue that part should be its own commit, but I > like all of the changes. Cool. > I'm not too impressed with the documentation changes. A lot of the > information being added is already present somewhere in that very same > section. It's reasonable to revise the section so that the dangers > stand out more clearly, but it doesn't seem good to revise it in a way > that ends up duplicating the existing information. OK. > Here's my > suggestion -- ditch the note at the end of the section and make the > one at the beginning read like this: > > The exclusive backup method is deprecated and should be avoided. > Prior to <productname>PostgreSQL</productname> 9.6, this was the only > low-level method available, but it is now recommended that all users > upgrade their scripts to use non-exclusive backups. > > Then, revise the first paragraph like this: > > The process for an exclusive backup is mostly the same as for a > non-exclusive one, but it differs in a few key steps. This type of > backup can only be taken on a primary and does not allow concurrent > backups. Moreover, because it writes a backup_label file on the > master, it can cause the master to fail to restart automatically after > a crash. On the other hand, the erroneous removal of a backup_label > file from a backup or standby is a common mistake which can can result > in serious data corruption. If it is necessary to use this method, > the following steps may be used. This works for me as I feel like the cautions here (and below) are still strongly worded. Peter? > Later, where it says: > > Note that if the server crashes during the backup it may not be > possible to restart until the <literal>backup_label</literal> > file has been > manually deleted from the <envar>PGDATA</envar> directory. > > Change it to read: > > As noted above, if the server crashes during the backup it may not be > possible to restart until the <literal>backup_label</literal> file has > been manually deleted from the <envar>PGDATA</envar> directory. Note > that it is very important to never to remove the > <literal>backup_label</literal> file when restoring a backup, because > this will result in corruption. Confusion about the circumstances > under which it is appropriate to remove this file is a common cause of > data corruption when using this method; be very certain that you > remove the file only on an existing master and never when building a > standby or restoring a backup, even if you are building a standby that > will subsequently be promoted to a new master. Technically we are into repetition here, but I'm certainly OK with it as this point bears repeating. > I also think we should revise this thoroughly terrible advice: > > If you wish to place a time limit on the execution of > <function>pg_stop_backup</function>, set an appropriate > <varname>statement_timeout</varname> value, but make note that if > <function>pg_stop_backup</function> terminates because of this your backup > may not be valid. > > That seems awful, not only because it encourages people to do that > particular thing and end up leaving the server in backup mode, but > also because it doesn't clearly articulate the extreme importance of > making sure that the server is not left in backup mode. So I would > propose that we strike that text entirely and replace it with > something like: > > When using exclusive backup mode, it is absolutely imperative to make > sure that <function>pg_stop_backup</function> completes successfully > at the end of the backup. Even if the backup itself fails, for > example due to lack of disk space, failure to call > <function>pg_stop_backup</function> will leave the server in backup > mode indefinitely, causing future backups to fail and increasing the > risk of a restart during a time when <literal>backup_label</literal> > exists. It's also pretty important that exclusive backups complete successfully since backup_label is returned from pg_stop_backup() -- the backup will definitely be corrupt without that if there was a checkpoint during the backup. But, yeah, leaving a backup_label around for a long time increases the restart risks a lot. I'll revise the patch if Peter thinks this approach looks reasonable. Thanks, -- -David david@pgmasters.net
Hi Robert, On 3/20/19 5:08 PM, David Steele wrote: > > I'll revise the patch if Peter thinks this approach looks reasonable. Hopefully Peter's silence can be interpreted as consent. Probably just busy, though. I used your suggestions with minor editing. After some reflection, I agree that the inline warnings are likely to be more effective than something at the end, at least for those working on a new implementation. Regards, -- -David david@pgmasters.net
Attachment
On 2019-03-20 14:42, Magnus Hagander wrote: > But that would be factually incorrect and backwards, so it seems like a > terrible idea, at least when it comes to manual. If you are doing it > manually, it's a lot *easier* to do it right with the non-exclusive > mode, because you can easily keep one psql and one shell open. And > that's safe. The scenario I have in mind is, a poorly maintained server, nothing installed, can't install anything (no internet connection, license expired), flaky network, you fear it's going to fail soon, you need to take a backup. The simplest procedure would appear to be: start backup mode, copy files away, stop backup mode. Anything else that involves holding a session open over there for the whole time is way more fragile unless proper preparations have been made (and even then). So I don't know what you want to call that scenario, but I would feel more comfortable having these basic tools available in a bind. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Mar 29, 2019 at 6:27 AM David Steele <david@pgmasters.net> wrote: > I used your suggestions with minor editing. After some reflection, I > agree that the inline warnings are likely to be more effective than > something at the end, at least for those working on a new implementation. I'm glad we could agree on something. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Mar 29, 2019 at 1:19 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 29, 2019 at 6:27 AM David Steele <david@pgmasters.net> wrote:
> I used your suggestions with minor editing. After some reflection, I
> agree that the inline warnings are likely to be more effective than
> something at the end, at least for those working on a new implementation.
I'm glad we could agree on something. Committed.
+1, thanks.
Minor nitpick:
+ backup can only be taken on a primary and does not allow concurrent
+ backups. Moreover, because it writes a backup_label file on the
+ master, it can cause the master to fail to restart automatically after
Let's be consistent in if we call it a primary or a master, at least within the same paragraph :)
On 2019-03-29 11:27, David Steele wrote: >> I'll revise the patch if Peter thinks this approach looks reasonable. > > Hopefully Peter's silence can be interpreted as consent. Probably just > busy, though. > > I used your suggestions with minor editing. After some reflection, I > agree that the inline warnings are likely to be more effective than > something at the end, at least for those working on a new implementation. It looks very sensible now, I think. Thanks. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3/29/19 11:58 AM, Peter Eisentraut wrote: > On 2019-03-20 14:42, Magnus Hagander wrote: >> But that would be factually incorrect and backwards, so it seems like a >> terrible idea, at least when it comes to manual. If you are doing it >> manually, it's a lot *easier* to do it right with the non-exclusive >> mode, because you can easily keep one psql and one shell open. And >> that's safe. > > The scenario I have in mind is, a poorly maintained server, nothing > installed, can't install anything (no internet connection, license > expired), flaky network, you fear it's going to fail soon, you need to > take a backup. The simplest procedure would appear to be: start backup > mode, copy files away, stop backup mode. Anything else that involves > holding a session open over there for the whole time is way more fragile > unless proper preparations have been made (and even then). So I don't > know what you want to call that scenario, but I would feel more > comfortable having these basic tools available in a bind. I would argue the best thing in this scenario is to use pg_basebackup. It's a solid tool and likely far better than any script the user might cook up on the spot. Regards, -- -David david@pgmasters.net
On 3/29/19 12:25 PM, Magnus Hagander wrote: > On Fri, Mar 29, 2019 at 1:19 PM Robert Haas <robertmhaas@gmail.com > <mailto:robertmhaas@gmail.com>> wrote: > > On Fri, Mar 29, 2019 at 6:27 AM David Steele <david@pgmasters.net > <mailto:david@pgmasters.net>> wrote: > > I used your suggestions with minor editing. After some reflection, I > > agree that the inline warnings are likely to be more effective than > > something at the end, at least for those working on a new > implementation. > > I'm glad we could agree on something. Committed. Me, too. Thanks! > Minor nitpick: > + backup can only be taken on a primary and does not allow concurrent > + backups. Moreover, because it writes a backup_label file on the > + master, it can cause the master to fail to restart automatically after > > Let's be consistent in if we call it a primary or a master, at least > within the same paragraph :) Agreed, let's stick with "primary". Are we planning to back-patch this? The deprecation was added to the docs in 9.6 -- I think these clarifications would be helpful. Thanks, -- -David david@pgmasters.net
On Fri, Mar 29, 2019 at 8:45 AM David Steele <david@pgmasters.net> wrote: > Are we planning to back-patch this? The deprecation was added to the > docs in 9.6 -- I think these clarifications would be helpful. I wasn't planning too, but I guess we could consider it. I'd be more inclined to back-patch the documentation changes than the message text changes, but what do other people think? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3/29/19 12:46 PM, Robert Haas wrote: > On Fri, Mar 29, 2019 at 8:45 AM David Steele <david@pgmasters.net> wrote: >> Are we planning to back-patch this? The deprecation was added to the >> docs in 9.6 -- I think these clarifications would be helpful. > > I wasn't planning too, but I guess we could consider it. I'd be more > inclined to back-patch the documentation changes than the message text > changes, but what do other people think? I think we should definitely do the docs, I'm 50% on the message. You could argue that is is a behavioral change, but it is pretty important info. Regards, -- -David david@pgmasters.net
On Fri, Mar 29, 2019 at 1:49 PM David Steele <david@pgmasters.net> wrote:
On 3/29/19 12:46 PM, Robert Haas wrote:
> On Fri, Mar 29, 2019 at 8:45 AM David Steele <david@pgmasters.net> wrote:
>> Are we planning to back-patch this? The deprecation was added to the
>> docs in 9.6 -- I think these clarifications would be helpful.
>
> I wasn't planning too, but I guess we could consider it. I'd be more
> inclined to back-patch the documentation changes than the message text
> changes, but what do other people think?
I think we should definitely do the docs, I'm 50% on the message. You
could argue that is is a behavioral change, but it is pretty important info.
+1 on the docs.
Is the changes to the messages going to cause issues or weirdness for translators? That would be a reason not to backpatch it. Without that, I'm leaning towards backpatching it.
On 2019-03-29 13:54, Magnus Hagander wrote: > Is the changes to the messages going to cause issues or weirdness for > translators? That would be a reason not to backpatch it. Without that, > I'm leaning towards backpatching it. Note that the messages refer to recovery.signal, so a backpatch would have to rephrase the message. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services