Thread: Automatically force zero_damaged_pages while InRecovery?

Automatically force zero_damaged_pages while InRecovery?

From
Tom Lane
Date:
We have seen a couple instances recently of WAL recovery failing due to
the recently added code that validates a page header as soon as the page
is read in, for example Olivier Prenant's crash report here:

http://archives.postgresql.org/pgsql-hackers/2003-10/msg01505.php

This failure is actually entirely pointless, because (AFAIK) any page
that is brought in during WAL recovery is going to be overwritten in
toto from the WAL log.  So it would be safe to run WAL recovery with
zero_damaged_pages enabled.  Rather than expecting DBAs to think of that
under the stress of a crashed-database situation, I propose that we do
it for them:

*** src/backend/storage/buffer/bufmgr.c.orig    Fri Nov 21 12:41:31 2003
--- src/backend/storage/buffer/bufmgr.c    Sat Nov 29 13:35:14 2003
***************
*** 231,237 ****         if (status == SM_SUCCESS &&             !PageHeaderIsValid((PageHeader)
MAKE_PTR(bufHdr->data)))        {
 
!             if (zero_damaged_pages)             {                 ereport(WARNING,
(errcode(ERRCODE_DATA_CORRUPTED),
--- 231,237 ----         if (status == SM_SUCCESS &&             !PageHeaderIsValid((PageHeader)
MAKE_PTR(bufHdr->data)))        {
 
!             if (zero_damaged_pages || InRecovery)             {                 ereport(WARNING,
  (errcode(ERRCODE_DATA_CORRUPTED),
 


Comments?
        regards, tom lane


Re: Automatically force zero_damaged_pages while InRecovery?

From
Joe Conway
Date:
Tom Lane wrote:
> This failure is actually entirely pointless, because (AFAIK) any page
> that is brought in during WAL recovery is going to be overwritten in
> toto from the WAL log.  So it would be safe to run WAL recovery with
> zero_damaged_pages enabled.  Rather than expecting DBAs to think of that
> under the stress of a crashed-database situation, I propose that we do
> it for them:

Sounds like a good idea to me.

Joe



Re: Automatically force zero_damaged_pages while InRecovery?

From
Alvaro Herrera
Date:
On Sat, Nov 29, 2003 at 11:31:03AM -0800, Joe Conway wrote:
> Tom Lane wrote:
> >This failure is actually entirely pointless, because (AFAIK) any page
> >that is brought in during WAL recovery is going to be overwritten in
> >toto from the WAL log.  So it would be safe to run WAL recovery with
> >zero_damaged_pages enabled.  Rather than expecting DBAs to think of that
> >under the stress of a crashed-database situation, I propose that we do
> >it for them:
> 
> Sounds like a good idea to me.

Yes, but is the error message going to stay verbatim?  Maybe the
category could be decreased ...

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Everybody understands Mickey Mouse. Few understand Hermann Hesse.
Hardly anybody understands Einstein. And nobody understands Emperor Norton."


Re: Automatically force zero_damaged_pages while InRecovery?

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> Yes, but is the error message going to stay verbatim?  Maybe the
> category could be decreased ...

The patch I proposed would still cause a WARNING to come out in the
postmaster log.  We could perhaps reduce it to a NOTICE if InRecovery
is true, or have InRecovery add a DETAIL pointing out that this should
be no problem (though the latter behavior couldn't be back-ported to
7.3 very readily).  We'd not want to change the message wording in the
7.4 branch either.

I think this is all a bit academic though, compared to the basic point
of not failing ;-).  In practice, any recovery from WAL is going to
produce a bunch of scary-looking messages in the postmaster log.
We're not likely to remove those messages, since they provide
potentially critical debugging info if it fails to work.  I'm not that
concerned about making them less scary-looking ...
        regards, tom lane