Re: Remove Deprecated Exclusive Backup Mode - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: Remove Deprecated Exclusive Backup Mode |
Date | |
Msg-id | CAHGQGwHLuG0dc3TRkzJdVhJKUkR9=K3UMBBHvdNtvht=P+z7Eg@mail.gmail.com |
In response to | Re: Remove Deprecated Exclusive Backup Mode (David Steele <david@pgmasters.net>) |
Responses |
Re: Remove Deprecated Exclusive Backup Mode
Re: Remove Deprecated Exclusive Backup Mode |
List | pgsql-hackers |
On Tue, Feb 26, 2019 at 3:17 AM David Steele <david@pgmasters.net> wrote:
>
> On 2/25/19 7:50 PM, Fujii Masao wrote:
> > On Mon, Feb 25, 2019 at 10:49 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> >>
> >> I'm not playing devil's advocate here to annoy you. I see the problems
> >> with the exclusive backup, and I see how it can hurt people.
> >> I just think that removing exclusive backup without some kind of help
> >> like Andres sketched above will make people unhappy.
> >
> > +1
> >
> > Another idea is to improve an exclusive backup method so that it will never
> > cause such issue. What about changing an exclusive backup mode of
> > pg_start_backup() so that it creates something like backup_label.pending file
> > instead of backup_label? Then if the database cluster has backup_label.pending
> > file but not recovery.signal (this is the case where the database is recovered
> > just after the server crashes while an exclusive backup is in progress),
> > in this idea, the recovery using that database cluster always ignores
> > (or removes) backup_label.pending file and start replaying WAL from
> > the REDO location that pg_control file indicates. So this idea enables us to
> > work around the issue that an exclusive backup could cause.
>
> It's an interesting idea.
>
> > On the other hand, the downside of this idea is that the users need to change
> > the recovery procedure. When they want to do PITR using the backup having
> > backup_label.pending, they need to not only create recovery.signal but also
> > rename backup_label.pending to backup_label. Rename of backup_label file
> > is brand-new step for their recovery procedure, and changing the recovery
> > procedure might be painful for some users. But IMO it's less painful than
> > removing an exclusive backup API at all.
>
> Well, given that we have invalidated all prior recovery procedures in
> PG12 I'm not sure how big a deal that is. Of course, it's too late make
> a change like this for PG12.
>
> > Thought?
>
> Here's the really obvious bad thing: if users do not update their
> procedures and we ignore backup_label.pending on startup then they will
> end up with a corrupt database because it will not replay from the
> correct checkpoint. If we error on the presence of backup_label.pending
> then we are right back to where we started.

No. In this case, since backup_label.pending and recovery.signal exist,
as I described in my previous post, the server stops the recovery with
a PANIC error before corrupting the database. Then the operator can rename
backup_label.pending to backup_label and restart the recovery safely.

So, let me clarify the situations:

(1) If backup_label and recovery.signal exist, the recovery starts safely.
    This is the normal case of recovery from the base backup.

(2) If backup_label.pending and recovery.signal exist, as described above,
    a PANIC error happens at the start of recovery. This case can happen
    if the operator forgets to rename backup_label.pending, i.e., an
    operational mistake. So, after the PANIC, the operator needs to fix
    her or his mistake (i.e., rename backup_label.pending) and restart
    the recovery.

(3) If backup_label.pending exists but recovery.signal doesn't, the server
    ignores (or removes) backup_label.pending and does the recovery starting
    from pg_control's REDO location. This case can happen if the server
    crashes while an exclusive backup is in progress. So a crash in the
    middle of a backup doesn't prevent the recovery from starting in
    this idea.

Regards,

--
Fujii Masao
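
For readers following the thread, the three cases in this post can be sketched as a small decision function. This is only an illustration of the proposal, not PostgreSQL code: the function name `decide_startup` and the returned action strings are hypothetical, and any file combination the post does not discuss is reported as "not_covered" rather than guessed at.

```python
import os


def decide_startup(datadir):
    """Sketch of the proposed startup decision (names are hypothetical).

    Looks only at which of the three files from the post are present in
    the data directory and returns a string naming the resulting action.
    """
    def has(name):
        return os.path.exists(os.path.join(datadir, name))

    label = has("backup_label")
    pending = has("backup_label.pending")
    signal = has("recovery.signal")

    if label and signal:
        # Case (1): normal recovery from a base backup.
        return "recover_from_backup_label"
    if pending and signal:
        # Case (2): operator forgot the rename; stop before any damage.
        return "panic"
    if pending and not signal:
        # Case (3): crash during an exclusive backup; ignore (or remove)
        # the pending file and replay from pg_control's REDO location.
        return "ignore_pending_and_replay_from_pg_control"
    # Assumption: combinations not discussed in the post are left undecided.
    return "not_covered"
```

In this sketch the missed-rename mistake in case (2) surfaces as "panic" instead of a silent replay from the wrong checkpoint, which is the point made in reply to the corruption concern.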