We have recently seen a couple of instances of WAL recovery failing due to
the recently added code that validates a page header as soon as the page
is read in, for example Olivier Prenant's crash report here:
http://archives.postgresql.org/pgsql-hackers/2003-10/msg01505.php
This failure is actually entirely pointless, because (AFAIK) any page
that is brought in during WAL recovery is going to be overwritten in
toto from the WAL log. So it would be safe to run WAL recovery with
zero_damaged_pages enabled. Rather than expecting DBAs to think of that
under the stress of a crashed-database situation, I propose that we do
it for them:
*** src/backend/storage/buffer/bufmgr.c.orig Fri Nov 21 12:41:31 2003
--- src/backend/storage/buffer/bufmgr.c Sat Nov 29 13:35:14 2003
***************
*** 231,237 ****
          if (status == SM_SUCCESS &&
              !PageHeaderIsValid((PageHeader) MAKE_PTR(bufHdr->data)))
          {
!             if (zero_damaged_pages)
              {
                  ereport(WARNING,
                          (errcode(ERRCODE_DATA_CORRUPTED),
--- 231,237 ----
          if (status == SM_SUCCESS &&
              !PageHeaderIsValid((PageHeader) MAKE_PTR(bufHdr->data)))
          {
!             if (zero_damaged_pages || InRecovery)
              {
                  ereport(WARNING,
                          (errcode(ERRCODE_DATA_CORRUPTED),
Comments?
regards, tom lane