Thread: backup_label in a crash recovery

backup_label in a crash recovery

From
Fujii Masao
Date:
Hi,

When a crash occurs before pg_stop_backup() is called,
the subsequent crash recovery fails with a FATAL error
and outputs the following HINT message:

   If you are not restoring from a backup, try removing the file
   "%s/backup_label".

I wonder why backup_label isn't automatically removed
in the normal crash recovery case. Is this a fail-safe
protection, to prevent an admin from wrongly restoring from
a backup without creating recovery.conf? Or is there another reason?

If that's intentional, the clusterware in a shared-disk
failover system should remove backup_label whenever it
performs a failover. Otherwise, if a crash occurs during
an online backup, the failover would fail.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: backup_label in a crash recovery

From
"Albe Laurenz"
Date:
Fujii Masao wrote:
> When a crash occurs before calling pg_stop_backup(),
> the subsequent crash recovery causes the FATAL error
> and outputs the following HINT message.
>
>     If you are not restoring from a backup, try removing the file
>     "%s/backup_label".
>
> I wonder why backup_label isn't automatically removed
> in normal crash recovery case. Is this for the fail-safe
> protection; prevent admin from restoring from a backup
> wrongly without creating recovery.conf? Or another?
>
> If that's intentional, a clusterware for shared disk
> failover system should remove backup_label whenever
> doing failover. Otherwise, when a crash occurs during
> online-backup, the failover would fail.

I do not know if there is a good reason why the server does
not ignore backup_label if recovery.conf is not present.

But as it is, any failover system should definitely remove
backup_label.

Yours,
Laurenz Albe


Re: backup_label in a crash recovery

From
Tom Lane
Date:
Fujii Masao <masao.fujii@gmail.com> writes:
> I wonder why backup_label isn't automatically removed
> in normal crash recovery case.

Removing it automatically could be catastrophic if done incorrectly, no?

> If that's intentional, a clusterware for shared disk
> failover system should remove backup_label whenever
> doing failover.

It would be no less catastrophic if done incorrectly from outside the
postmaster; see for example the problems people have had historically
with startup scripts that think they should remove postmaster.pid.
        regards, tom lane


Re: backup_label in a crash recovery

From
"Albe Laurenz"
Date:
Tom Lane wrote:
> > I wonder why backup_label isn't automatically removed
> > in normal crash recovery case.
>
> Removing it automatically could be catastrophic if done
> incorrectly, no?
>
> It would be no less catastrophic if done incorrectly from outside the
> postmaster; see for example the problems people have had historically
> with startup scripts that think they should remove postmaster.pid.

I beg to differ.

Removing postmaster.pid can lead to a corrupt database.
Removing backup_label means that one of your backups will go wrong,
and a subsequent pg_stop_backup() will throw an error.

If you have a cluster failover during an online backup, I think
any reasonable person would suspect that the backup went wrong.
And if nothing else does, the error on pg_stop_backup() will tell you.

Given a choice, I expect that everybody who is intent enough
on availability to implement such a solution will want the
database to come up if it can be done without data loss.

Is there a flaw in my reasoning?

Yours,
Laurenz Albe


Re: backup_label in a crash recovery

From
Andrew Gierth
Date:
>>>>> "Albe" == "Albe Laurenz" <laurenz.albe@wien.gv.at> writes:

 Albe> Removing postmaster.pid can lead to a corrupt database.
 Albe> Removing backup_label means that one of your backups will go
 Albe> wrong, and a subsequent pg_stop_backup() will throw an error.

 Albe> If you have a cluster failover during an online backup, I think
 Albe> any reasonable person would suspect that the backup went wrong.
 Albe> And if nothing else does, the error on pg_stop_backup() will
 Albe> tell you.

 [...]

 Albe> Is there a flaw in my reasoning?

Yes.

Imagine the following scenario: the system crashed while pg_start_backup
was in effect (so backup_label exists), and the postmaster is about to
start up. i.e. you're at the point where you might naively imagine that
you can delete the backup_label.

How do you distinguish between these two scenarios:

1) you're starting up in a data dir where you crashed in the middle of
   a backup

2) you're starting up in a data dir that is a restore of a base backup,
   but no recovery.conf has been created

(hint: you can't)

If in scenario 2, you remove the backup_label and proceed with
recovery, there is a significant chance (depending on the timing, and
if you didn't exclude pg_xlog from the backup) that recovery will
_think_ it succeeds but actually leaves you with a corrupt data
directory.

-- 
Andrew (irc:RhodiumToad)
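
The point of the message above can be illustrated with a small sketch. It is
only a toy model, not PostgreSQL code: the scenario labels and both function
names are hypothetical, and the model assumes that the only evidence a newly
started postmaster has is which marker files exist in the data directory.
Under that assumption, both scenarios leave exactly the same evidence.

```python
# Hypothetical labels for the two situations described in the message above.
CRASHED_MID_BACKUP = "crashed in the middle of a backup"
RESTORED_NO_RECOVERY_CONF = "restored base backup, no recovery.conf"


def visible_files(scenario):
    """Return the marker files a starting postmaster would find (assumed model)."""
    # In both situations backup_label exists and recovery.conf does not.
    files = {
        CRASHED_MID_BACKUP: {"backup_label"},
        RESTORED_NO_RECOVERY_CONF: {"backup_label"},
    }
    return files[scenario]


def scenarios_matching(observed):
    """List every scenario consistent with the files observed at startup."""
    return [s for s in (CRASHED_MID_BACKUP, RESTORED_NO_RECOVERY_CONF)
            if visible_files(s) == observed]


candidates = scenarios_matching({"backup_label"})
print(candidates)  # both scenarios remain possible
```

Because the candidate list never narrows to a single entry, any rule based
only on these files would treat a restored base backup the same way as a
crashed one.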


Re: backup_label in a crash recovery

From
Tom Lane
Date:
[ after further thought... ]

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> How do you distinguish between these two scenarios:

> 1) you're starting up in a data dir where you crashed in the middle of
>    a backup

> 2) you're starting up in a data dir that is a restore of a base backup,
>    but no recovery.conf has been created

> (hint: you can't)

Hmm ... you can not tell this if the postmaster just started, and
I agree that removing backup_label in such a case is too risky.
However, in a typical crash scenario the postmaster doesn't die,
it just kills off and restarts its children; and in that scenario
we do have additional knowledge, namely that the postmaster was
already up.  I think it could be safe and useful to forcibly remove
backup_label before commencing recovery, *if* we know that the system
had previously been in fully-operational status.

However, this begs the question: does a backend crash necessarily imply
that an in-progress base backup has to be canceled and restarted from
scratch?  It's not clear to me why you wouldn't consider that the backup
can keep going.  So maybe what we really want here is not to remove the
label file, but to have the postmaster signal to the recovery process
that it knows this is a crash recovery and any backup_label should be
ignored.
        regards, tom lane
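
The proposal in the message above adds one piece of knowledge to the decision:
whether the postmaster was already up when the crash happened. A minimal
sketch of that rule might look as follows. The function name and its
parameters are hypothetical and do not correspond to anything in the
PostgreSQL source; it only encodes the condition as stated in the thread.

```python
def may_ignore_backup_label(has_backup_label, has_recovery_conf,
                            postmaster_was_running):
    """Decide whether recovery could skip backup_label (illustrative only).

    postmaster_was_running is assumed to be True only when the postmaster
    survived the crash and is restarting its children, so it knows the
    system had previously been fully operational.
    """
    if not has_backup_label:
        # Nothing to ignore.
        return False
    if has_recovery_conf:
        # A recovery.conf is present, so this is not the plain
        # crash-recovery case the proposal is about.
        return False
    # A freshly started postmaster cannot tell a crashed backup from a
    # restored base backup, so only the surviving postmaster may skip it.
    return postmaster_was_running


# Backend crash while a base backup was in progress, postmaster still up.
print(may_ignore_backup_label(True, False, True))
# Postmaster just started: the two scenarios are indistinguishable.
print(may_ignore_backup_label(True, False, False))
```

Note that the sketch only ignores the label and never deletes the file, which
matches the second half of the message: the in-progress backup is not
necessarily invalidated by a backend crash.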


Re: backup_label in a crash recovery

From
Fujii Masao
Date:
Hi,

On Wed, Nov 4, 2009 at 12:01 AM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
> 2) you're starting up in a data dir that is a restore of a base backup,
>   but no recovery.conf has been created

Is scenario 2 (i.e., a normal crash recovery without recovery.conf)
supported in postgres? In any case, it can happen through an admin's
operational error. So maybe backup_label should not be removed
automatically in that case, as a fail-safe protection.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center