Re: Remove Deprecated Exclusive Backup Mode - Mailing list pgsql-hackers

From David Steele
Subject Re: Remove Deprecated Exclusive Backup Mode
Date
Msg-id f62320a3-6e2a-79ef-c33d-97d9429ac1a0@pgmasters.net
In response to Re: Remove Deprecated Exclusive Backup Mode  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Remove Deprecated Exclusive Backup Mode
Re: Remove Deprecated Exclusive Backup Mode
Re: Remove Deprecated Exclusive Backup Mode
List pgsql-hackers
On 2/25/19 7:50 PM, Fujii Masao wrote:
> On Mon, Feb 25, 2019 at 10:49 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>>
>> I'm not playing devil's advocate here to annoy you.  I see the problems
>> with the exclusive backup, and I see how it can hurt people.
>> I just think that removing exclusive backup without some kind of help
>> like Andres sketched above will make people unhappy.
> 
> +1
> 
> Another idea is to improve the exclusive backup method so that it never
> causes such an issue. What about changing the exclusive backup mode of
> pg_start_backup() so that it creates something like a backup_label.pending file
> instead of backup_label? Then, if the database cluster has a backup_label.pending
> file but not recovery.signal (this is the case where the database is recovered
> just after the server crashes while an exclusive backup is in progress),
> recovery using that database cluster always ignores (or removes) the
> backup_label.pending file and starts replaying WAL from the REDO location
> that the pg_control file indicates. So this idea enables us to
> work around the issue that an exclusive backup could cause.

It's an interesting idea.

> On the other hand, the downside of this idea is that users need to change
> their recovery procedure. When they want to do PITR using a backup that has
> backup_label.pending, they need not only to create recovery.signal but also
> to rename backup_label.pending to backup_label. Renaming the backup_label file
> is a brand-new step in their recovery procedure, and changing the recovery
> procedure might be painful for some users. But IMO it's less painful than
> removing the exclusive backup API altogether.

Well, given that we have invalidated all prior recovery procedures in 
PG12, I'm not sure how big a deal that is.  Of course, it's too late to 
make a change like this for PG12.
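
For concreteness, here is a rough sketch of what the changed restore procedure would involve under this proposal. It only models the two file operations named in the quoted text (rename backup_label.pending, create recovery.signal). The function name prepare_pitr is hypothetical, backup_label.pending exists only in the proposal, and setting restore_command and starting the server are left out.

```python
from pathlib import Path


def prepare_pitr(data_dir: Path) -> None:
    """Prepare a restored data directory for PITR under the proposed scheme.

    Hypothetical helper: backup_label.pending is part of the proposal,
    not something any released PostgreSQL version creates.
    """
    pending = data_dir / "backup_label.pending"
    label = data_dir / "backup_label"

    if pending.exists():
        if label.exists():
            # Refuse to guess which file is authoritative.
            raise FileExistsError("both backup_label and backup_label.pending exist")
        # The new step: without it, startup would ignore the pending file.
        pending.rename(label)

    # The step users already need in PG12: request targeted recovery.
    (data_dir / "recovery.signal").touch()
```

A script that skips the rename still produces a directory the server would start from, which is exactly the case discussed next.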

> Thoughts?

Here's the really obvious bad thing: if users do not update their 
procedures and we ignore backup_label.pending on startup then they will 
end up with a corrupt database because it will not replay from the 
correct checkpoint.  If we error on the presence of backup_label.pending 
then we are right back to where we started.

I know there are backup solutions that rely on copying all required WAL 
to pg_xlog/pg_wal before starting recovery.  Those solutions would 
silently break in this case and end up in corruption.  If we require 
recovery.signal then we still have the current problem of the cluster 
not starting after a crash.
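
To spell out why this breaks silently, here is a toy model. It is a sketch only: the DataDir class and its is_restored_backup field are hypothetical, and the field stands for a fact the server has no way to observe. The point is that a crashed primary and a backup restored with an old procedure look identical at startup, so a rule that is right for one is wrong for the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataDir:
    """What startup can see, plus one fact it cannot see."""
    has_backup_label_pending: bool
    has_recovery_signal: bool
    is_restored_backup: bool  # hypothetical ground truth, invisible to the server


def visible_state(d: DataDir) -> tuple:
    """Only file presence is observable when the server starts."""
    return (d.has_backup_label_pending, d.has_recovery_signal)


def outcome_if_pending_is_ignored(d: DataDir) -> str:
    """Result when startup ignores backup_label.pending without recovery.signal."""
    if d.has_backup_label_pending and not d.has_recovery_signal:
        if d.is_restored_backup:
            return "corrupt: replay started at pg_control's REDO, not the backup's checkpoint"
        return "ok: ordinary crash recovery"
    return "outside this sketch"


# Server crashed while an exclusive backup was in progress.
crashed = DataDir(True, False, is_restored_backup=False)
# Backup restored by a tool that copies WAL into pg_wal and just starts the server.
restored_old_procedure = DataDir(True, False, is_restored_backup=True)
```

Both directories have the same visible_state, yet only the first one recovers correctly.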

> BTW, if recovery.signal is created but backup_label.pending is not renamed
> (this is the case where the operator forgets to rename the file even though
> she or he creates the recovery.signal file, i.e., misconfiguration), I think
> that recovery should emit PANIC immediately with a HINT like
> "HINT: rename backup_label.pending to backup_label if you want to do PITR".

This causes its own problems, as stated above.
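
Putting the whole proposal in one place, the startup decision would look roughly like the sketch below. The names Startup and decide are made up for illustration, and letting backup_label take precedence over a stray backup_label.pending is my assumption, since the proposal does not say what happens when both exist.

```python
from enum import Enum


class Startup(Enum):
    FROM_BACKUP_LABEL = "replay from the checkpoint named in backup_label"
    FROM_PG_CONTROL = "replay from the REDO location in pg_control"
    PANIC = "refuse to start; hint: rename backup_label.pending to backup_label"


def decide(files) -> Startup:
    """Map the files present in the data directory to a startup action."""
    present = set(files)
    if "backup_label" in present:
        # Existing behaviour. Precedence over .pending is an assumption.
        return Startup.FROM_BACKUP_LABEL
    if "backup_label.pending" in present:
        if "recovery.signal" in present:
            # Operator asked for PITR but forgot the rename.
            return Startup.PANIC
        # Treated as a crash during an exclusive backup: label is ignored.
        return Startup.FROM_PG_CONTROL
    # No label of either kind: normal startup or crash recovery.
    return Startup.FROM_PG_CONTROL
```

The second-to-last branch is the one that cannot tell a crash from an unmodified restore.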

Regards,
-- 
-David
david@pgmasters.net