Thread: Confusing message on startup after a crash while recovering
Hi,

When postgres crashes during recovery and is then restarted, it says:
"database system was interrupted while in recovery at ...
This probably means that some data is corrupted and you will have to use the last backup for recovery."

When I first read that message, I assumed that there are cases where postgres can't recover from a crash that happened during recovery. I guessed that some operations done during WAL restore are not idempotent, and lead to corrupt data if performed twice.

Only after actually reading the source code of xlog.c, and seeing that a similar (but better worded) warning is output after a crash during archive log replay, did I realize that this warning probably just means that corrupt data could be the _cause_ of the crash during recovery, not the _result_ of a crash during recovery.

I'd suggest that the text is changed to something along the line of:
"database system was interrupted while in recovery at ...
If this has occurred more than once some data may be corrupted and you may need to restore from the last backup."

This would also match the message for "interrupted while doing archive log replay" more closely.

greetings,
Florian Pflug
"Florian G. Pflug" <fgp@phlo.org> writes:
> I'd suggest that the text is changed to something along the line of:
> "database system was interrupted while in recovery at ...
> If this has occurred more than once some data may be corrupted and
> you may need to restore from the last backup."

It seems the real problem is that it's not specifying *which* data is probably corrupted. Maybe:

HINT: If recovery fails repeatedly, it probably means that the recovery log data is corrupted; you may have to restore from your last full backup.

Also, do we want to suggest use of pg_resetxlog in the message?

regards, tom lane
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> I'd suggest that the text is changed to something along the line of:
>> "database system was interrupted while in recovery at ...
>> If this has occurred more than once some data may be corrupted and
>> you may need to restore from the last backup."
>
> It seems the real problem is that it's not specifying *which* data is
> probably corrupted. Maybe:
>
> HINT: If recovery fails repeatedly, it probably means that the recovery log
> data is corrupted; you may have to restore from your last full backup.

IMHO that wording would be fine too. The important point for me is to clearly state that corrupted data may be the _cause_ of the crash, not the _effect_ of the crash. And for the sake of consistency, the messages for abort-during-recovery and abort-during-archive-log-replay should be similar.

> Also, do we want to suggest use of pg_resetxlog in the message?

I'd rather add some documentation on how to use pg_resetxlog to the manual if it's not already there, and maybe reference that chapter in a HINT message. In that chapter you can warn about the dangers of pg_resetxlog and advise users to back up the database before using it. I think such a warning is important, because any documentation of pg_resetxlog is targeted at users who are not familiar with postgres internals, and those users are likely to shoot themselves in the foot if you point them to pg_resetxlog.

greetings,
Florian Pflug