Re: Add exclusive backup deprecation notes to documentation - Mailing list pgsql-hackers

From David Steele
Subject Re: Add exclusive backup deprecation notes to documentation
Date
Msg-id 3e4899a6-131f-2e93-ca22-4e174bdc1963@pgmasters.net
Whole thread Raw
In response to Re: Add exclusive backup deprecation notes to documentation  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Add exclusive backup deprecation notes to documentation
List pgsql-hackers
Hi Robert,

On 3/20/19 6:31 PM, Robert Haas wrote:
> On Wed, Mar 20, 2019 at 9:00 AM Michael Paquier <michael@paquier.xyz> wrote:
>> On Wed, Mar 20, 2019 at 04:29:35PM +0400, David Steele wrote:
>>> Please note that there have been objections to the patch later in this
>>> thread by Peter and Robert.  I'm not very interested in watering down the
>>> documentation changes as Peter suggests, but I think at the very least we
>>> should commit the added hints in the error message.  For many users this
>>> error will be their first point of contact with the backup_label
>>> issue/behavior.
>>
>> The updates of the log message do not imply anything negative as I
>> read them as they mention to not remove the backup_label file.  So
>> while we don't have an agreement about the docs, the log messages may
>> be able to be committed?  Peter?  Robert?
>>
>> "will result in a corruptED cluster" is more correct?
> 
> I really like the proposed changes to the ereport() text.  I think the
> "Be careful" hint is a really helpful way of phrasing it.  I think
> "corrupt" as the patch has it is slightly better than "corrupted".
> Obviously, we have to make the updates for recovery.signal no matter
> what, and you could argue that part should be its own commit, but I
> like all of the changes.

Cool.

> I'm not too impressed with the documentation changes.  A lot of the
> information being added is already present somewhere in that very same
> section.  It's reasonable to revise the section so that the dangers
> stand out more clearly, but it doesn't seem good to revise it in a way
> that ends up duplicating the existing information.  

OK.

> Here's my
> suggestion -- ditch the note at the end of the section and make the
> one at the beginning read like this:
> 
> The exclusive backup method is deprecated and should be avoided.
> Prior to <productname>PostgreSQL</productname> 9.6, this was the only
> low-level method available, but it is now recommended that all users
> upgrade their scripts to use non-exclusive backups.
> 
> Then, revise the first paragraph like this:
> 
> The process for an exclusive backup is mostly the same as for a
> non-exclusive one, but it differs in a few key steps. This type of
> backup can only be taken on a primary and does not allow concurrent
> backups.  Moreover, because it writes a backup_label file on the
> master, it can cause the master to fail to restart automatically after
> a crash.  On the other hand, the erroneous removal of a backup_label
> file from a backup or standby is a common mistake which can can result
> in serious data corruption.  If it is necessary to use this method,
> the following steps may be used.

This works for me as I feel like the cautions here (and below) are still 
strongly worded.  Peter?

> Later, where it says:
> 
>         Note that if the server crashes during the backup it may not be
>         possible to restart until the <literal>backup_label</literal>
> file has been
>         manually deleted from the <envar>PGDATA</envar> directory.
> 
> Change it to read:
> 
> As noted above, if the server crashes during the backup it may not be
> possible to restart until the <literal>backup_label</literal> file has
> been manually deleted from the <envar>PGDATA</envar> directory.  Note
> that it is very important to never to remove the
> <literal>backup_label</literal> file when restoring a backup, because
> this will result in corruption.  Confusion about the circumstances
> under which it is appropriate to remove this file is a common cause of
> data corruption when using this method; be very certain that you
> remove the file only on an existing master and never when building a
> standby or restoring a backup, even if you are building a standby that
> will subsequently be promoted to a new master.

Technically we are into repetition here, but I'm certainly OK with it as 
this point bears repeating.

> I also think we should revise this thoroughly terrible advice:
> 
>        If you wish to place a time limit on the execution of
>        <function>pg_stop_backup</function>, set an appropriate
>        <varname>statement_timeout</varname> value, but make note that if
>        <function>pg_stop_backup</function> terminates because of this your backup
>        may not be valid.
> 
> That seems awful, not only because it encourages people to do that
> particular thing and end up leaving the server in backup mode, but
> also because it doesn't clearly articulate the extreme importance of
> making sure that the server is not left in backup mode.  So I would
> propose that we strike that text entirely and replace it with
> something like:
> 
> When using exclusive backup mode, it is absolutely imperative to make
> sure that <function>pg_stop_backup</function> completes successfully
> at the end of the backup.  Even if the backup itself fails, for
> example due to lack of disk space, failure to call
> <function>pg_stop_backup</function> will leave the server in backup
> mode indefinitely, causing future backups to fail and increasing the
> risk of a restart during a time when <literal>backup_label</literal>
> exists.

It's also pretty important that exclusive backups complete successfully 
since backup_label is returned from pg_stop_backup() -- the backup will 
definitely be corrupt without that if there was a checkpoint during the 
backup.  But, yeah, leaving a backup_label around for a long time 
increases the restart risks a lot.

I'll revise the patch if Peter thinks this approach looks reasonable.

Thanks,
-- 
-David
david@pgmasters.net


pgsql-hackers by date:

Previous
From: Alexander Kuzmenkov
Date:
Subject: Re: Optimze usage of immutable functions as relation
Next
From: Euler Taveira
Date:
Subject: Re: PostgreSQL pollutes the file system