Re: backup_label during crash recovery: do we know how to solve it? - Mailing list pgsql-hackers

From Daniel Farina
Subject Re: backup_label during crash recovery: do we know how to solve it?
Msg-id CAAZKuFYwaKANso3uy1ERw1wgo2hvM8RNPYpLAvjF1bXgCt1MBg@mail.gmail.com
In response to Re: backup_label during crash recovery: do we know how to solve it?  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: backup_label during crash recovery: do we know how to solve it?
List pgsql-hackers
On Sat, Dec 3, 2011 at 8:04 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> At the moment, if the situation is ambiguous, the system assumes that you're
> restoring from a backup. What your suggestion amounts to is to reverse that
> assumption, and assume instead that you're doing crash recovery on a system
> where a backup was being taken. In that case, if you take a backup with
> pg_base_backup(), and fail to archive the WAL files correctly, or forget to
> create a recovery.conf file, the database will happily start up from the
> backup, but is in fact corrupt. That is not good either.

Sorry for the long delay in getting around to writing a response,
but I do think there is, in practice, a way around this conundrum.
The fundamental goal is to make sure that the backup is not, in
actuality, a full binary copy of the database.

A workaround that has a much smaller restart-hole is to move the backup_label in
and out of the database directory after having copied it to the
archive and before calling stop_backup.

Not only is that workaround incorrect (it still has a small window
where the system will not be able to restart), but it's pretty
inconvenient.  How about this revised protocol (names and adjustments
welcome), to enable a less-terrible approach?

New concepts:

pg_prepare_backup: readies postgres for backing up.  Saves the
backup_label content in volatile memory.  The next pg_start_backup
will write that volatile information to disk, and the information
within can be used to compute a "backup-key".

"backup-key": a subset of the backup label.  As far as I know, all it
might need is the database-id and the WAL position (timeline, seg,
offset) the backup is starting at.
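
To make the idea concrete, here is a minimal sketch of what such a backup-key could look like. The four fields follow the description above; the class name, the field widths, and the hex file-name layout are assumptions made for illustration, not anything postgres defines.

```python
from typing import NamedTuple


class BackupKey(NamedTuple):
    """Hypothetical backup-key; the fields follow the post, the layout is assumed."""
    database_id: int  # identifier of the database cluster
    timeline: int     # timeline the backup starts on
    segment: int      # WAL segment the backup starts in
    offset: int       # byte offset within that segment

    def filename(self) -> str:
        # Fixed-width hex, so that sorting the names as strings gives the
        # same order as comparing the keys as tuples.
        return "%016X-%08X-%08X-%08X" % tuple(self)


def parse_backup_key(name: str) -> BackupKey:
    """Inverse of BackupKey.filename()."""
    return BackupKey(*(int(part, 16) for part in name.split("-")))
```

Because the key is a plain tuple, two keys for the same database compare in WAL order, which is the ordering property relied on later for cleaning up leaked keys.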

Protocol:

1. select pg_prepare_backup();
(Backup process remembers that backup-key is in progress, say, by
writing it to /backup-keys/%k)
2. select pg_start_backup();
(perform copying)
3. select pg_stop_backup();
4. backup process can optionally clear its state remembering the
backup-key (rm /backup-keys/%k)
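
The four steps could be driven by the backup process roughly as follows. This is only a sketch under stated assumptions: pg_prepare_backup is the function proposed here and does not exist, it is assumed to return the backup-key as text, and `execute` and `copy_files` are hypothetical callables supplied by the backup tool.

```python
import os
from typing import Callable


def run_backup(execute: Callable[[str], str],
               copy_files: Callable[[], None],
               key_dir: str) -> str:
    """Drive steps 1-4; execute() runs one SQL statement and returns its result as text."""
    # Step 1: pg_prepare_backup() is the proposed function, not an existing one.
    # Assumption: it returns the backup-key as text.
    key = execute("select pg_prepare_backup()")

    # Remember the key outside the data directory before the backup starts.
    marker = os.path.join(key_dir, key)
    fd = os.open(marker, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

    # Step 2: written without arguments as in the protocol above; the existing
    # pg_start_backup takes a label argument.
    execute("select pg_start_backup()")
    copy_files()

    # Step 3.
    execute("select pg_stop_backup()")

    # Step 4: optional cleanup. A crash before this line leaks the marker.
    os.remove(marker)
    return key
```

The marker is created before pg_start_backup is called, so at any moment when a backup_label could be on disk there is also a record of the key outside the database directory.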

A crash at each point would be resolved this way:

Before step 1: Nothing has happened, so normal crash recovery.

Before step 2: Same, as pg_prepare_backup only touches volatile memory
and so doesn't involve a state transition in postgres that survives a
crash.

Before step 3: when the crash occurs and postgres starts up, postgres
asks the external software if a backup was in progress, say via a
"backup-in-progress command".  That command is responsible for looking
at /backup-keys/%k and saying "yes, it was".  The database can then do
normal crash recovery.  The backup can even continue through this
time, I think.

Before step 4: The archiver may leak the backup-key.  Because
backup-keys using the information I defined earlier have an ordering,
it should be possible to reap these if necessary at intervals.
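
Reaping leaked keys could be as simple as the sketch below. It assumes that the marker file names are fixed-width, so that sorting them as strings matches WAL order, and that the caller supplies a cutoff key (for example, the key of the most recent backup known to have finished). The function name and the cutoff policy are hypothetical.

```python
import os
from typing import List


def reap_backup_keys(key_dir: str, cutoff: str) -> List[str]:
    """Remove markers whose name sorts strictly before cutoff; return the removed names."""
    removed = []
    for name in sorted(os.listdir(key_dir)):
        # Assumption: fixed-width names, so string order equals key order.
        if name < cutoff:
            os.remove(os.path.join(key_dir, name))
            removed.append(name)
    return removed
```

If keys from more than one database share a directory, the comparison would have to be restricted to keys with the same database-id, which this sketch does not attempt.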

Fundamentally, the way this approach gets around the 'physical copy'
conundrum is asking the archiver software to remember something well
out of the way of the database directory on the system that is being
backed up.
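
The "backup-in-progress command" consulted in the step 3 case could be very small. In the sketch below, the command-line shape (key directory, then the key that postgres would substitute for %k) and the exit-status convention (0 for "yes, a backup was in progress", 1 for "no") are assumptions; nothing like this exists in postgres today.

```python
import os
from typing import List


def backup_in_progress(key_dir: str, key: str) -> bool:
    """Return True if a marker for this backup-key was left by the backup process."""
    return os.path.exists(os.path.join(key_dir, key))


def main(argv: List[str]) -> int:
    # Hypothetical invocation: backup-in-progress <key-dir> <backup-key>
    key_dir, key = argv[1], argv[2]
    # Assumed convention: exit status 0 means "yes, it was".
    return 0 if backup_in_progress(key_dir, key) else 1
```

All the state lives in the key directory, well away from the database directory, so a physical copy of the data directory never carries it along.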

The main usability gain is that there will be a standardized way to
have postgres check to see if it was doing a backup (and thus should
use normal crash recovery) regardless of how it's started, rather than
idiosyncratic hacks around, say, upstart scripts on Ubuntu or pg_ctl,
to serve what is a common need.

What do you think?  I think this may even be backwards compatible,
because if one doesn't call pg_prepare_backup then one can fall back
to that upon calling pg_start_backup.  The "backup in progress"
command is additive, and doesn't change anything for systems that do
not have it defined.

--
fdr

