Thread: Confusing message on startup after a crash while recovering

Confusing message on startup after a crash while recovering

From
"Florian G. Pflug"
Date:
Hi

When postgres crashes during recovery, and is then restarted, it
says:
"database system was interrupted while in recovery at ...
This probably means that some data is corrupted and
you will have to use the last backup for recovery."

When I first read that message, I assumed that there are cases where
postgres can't recover from a crash that happened during recovery.
I guessed that some operations done during wal restore are not
idempotent, and lead to corrupt data if performed twice.

Only after actually reading the source code of xlog.c, and seeing that
a similar (but better worded) warning is output after a crash during
archive log replay, did I realize that this warning probably just means
that corrupt data could be the _cause_ of the crash during recovery, not
that it is _caused_by_ a crash during recovery.
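
To illustrate why replaying the same WAL twice need not be harmful: as far
as I understand it, redo skips a record when the page's LSN shows that the
record has already been applied. Below is a minimal sketch of that idea
only, not the code in xlog.c. Page, WalRecord, redo and replay are made-up
names, and the "add delta" change is a deliberately non-idempotent stand-in
for a real page modification.

```python
from dataclasses import dataclass


@dataclass
class Page:
    lsn: int = 0    # LSN of the last record applied to this page (hypothetical field)
    value: int = 0  # stand-in for the page contents


@dataclass
class WalRecord:
    lsn: int    # position of this record in the log
    delta: int  # non-idempotent change: add delta to the page value


def redo(page: Page, record: WalRecord) -> bool:
    """Apply the record unless the page already reflects it."""
    if record.lsn <= page.lsn:
        return False  # already applied during an earlier replay, skip it
    page.value += record.delta
    page.lsn = record.lsn
    return True


def replay(page: Page, log: list) -> None:
    """Replay every record in log order."""
    for record in log:
        redo(page, record)


log = [WalRecord(lsn=1, delta=5), WalRecord(lsn=2, delta=7)]
page = Page()
replay(page, log)  # first recovery attempt
replay(page, log)  # restart after a crash: the same log is replayed again
print(page)
```

The second replay leaves the page unchanged, because every record's LSN is
no greater than the page's LSN by then.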

I'd suggest that the text is changed to something along the line of:
"database system was interrupted while in recovery at ...
If this has occurred more than once some data may be corrupted and
you may need to restore from the last backup."

This would also match the message for "interrupted while doing archive
log replay" more closely.

greetings, Florian Pflug


Re: Confusing message on startup after a crash while recovering

From
Tom Lane
Date:
"Florian G. Pflug" <fgp@phlo.org> writes:
> I'd suggest that the text is changed to something along the line of:
> "database system was interrupted while in recovery at ...
> If this has occurred more than once some data may be corrupted and
> you may need to restore from the last backup."

It seems the real problem is that it's not specifying *which* data is
probably corrupted.  Maybe:

HINT: If recovery fails repeatedly, it probably means that the recovery log
data is corrupted; you may have to restore from your last full backup.

Also, do we want to suggest use of pg_resetxlog in the message?
        regards, tom lane


Re: Confusing message on startup after a crash while recovering

From
"Florian G. Pflug"
Date:
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> I'd suggest that the text is changed to something along the line of:
>> "database system was interrupted while in recovery at ...
>> If this has occurred more than once some data may be corrupted and
>> you may need to restore from the last backup."
> 
> It seems the real problem is that it's not specifying *which* data is
> probably corrupted.  Maybe:
> 
> HINT: If recovery fails repeatedly, it probably means that the recovery log
> data is corrupted; you may have to restore from your last full backup.

IMHO that wording would be fine too - the important point for me is to
clearly state that corrupted data may be the _cause_ of the crash, and
not the _effect_ of the crash. And for the sake of consistency, the
message for abort-during-recovery and abort-during-archivelog-replay
should be similar.

> Also, do we want to suggest use of pg_resetxlog in the message?
I'd rather add some documentation of how to use pg_resetxlog to the
manual if it's not already there, and maybe reference that chapter in
a HINT message. In that manual chapter you can warn about the dangers
of pg_resetxlog, and advise users to back up the database before
using it. I think such a warning is important, because any documentation
of pg_resetxlog is targeted at users who are not familiar with postgres
internals, and those users are likely to shoot themselves in the foot
if you point them to pg_resetxlog.

greetings, Florian Pflug