Thread: Automatically force zero_damaged_pages while InRecovery?
We have seen a couple instances recently of WAL recovery failing due to the recently added code that validates a page header as soon as the page is read in, for example Olivier Prenant's crash report here: http://archives.postgresql.org/pgsql-hackers/2003-10/msg01505.php This failure is actually entirely pointless, because (AFAIK) any page that is brought in during WAL recovery is going to be overwritten in toto from the WAL log. So it would be safe to run WAL recovery with zero_damaged_pages enabled. Rather than expecting DBAs to think of that under the stress of a crashed-database situation, I propose that we do it for them: *** src/backend/storage/buffer/bufmgr.c.orig Fri Nov 21 12:41:31 2003 --- src/backend/storage/buffer/bufmgr.c Sat Nov 29 13:35:14 2003 *************** *** 231,237 **** if (status == SM_SUCCESS && !PageHeaderIsValid((PageHeader) MAKE_PTR(bufHdr->data))) { ! if (zero_damaged_pages) { ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED), --- 231,237 ---- if (status == SM_SUCCESS && !PageHeaderIsValid((PageHeader) MAKE_PTR(bufHdr->data))) { ! if (zero_damaged_pages || InRecovery) { ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED), Comments? regards, tom lane
Tom Lane wrote: > This failure is actually entirely pointless, because (AFAIK) any page > that is brought in during WAL recovery is going to be overwritten in > toto from the WAL log. So it would be safe to run WAL recovery with > zero_damaged_pages enabled. Rather than expecting DBAs to think of that > under the stress of a crashed-database situation, I propose that we do > it for them: Sounds like a good idea to me. Joe
On Sat, Nov 29, 2003 at 11:31:03AM -0800, Joe Conway wrote: > Tom Lane wrote: > >This failure is actually entirely pointless, because (AFAIK) any page > >that is brought in during WAL recovery is going to be overwritten in > >toto from the WAL log. So it would be safe to run WAL recovery with > >zero_damaged_pages enabled. Rather than expecting DBAs to think of that > >under the stress of a crashed-database situation, I propose that we do > >it for them: > > Sounds like a good idea to me. Yes, but is the error message going to stay verbatim? Maybe the category could be decreased ... -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "Everybody understands Mickey Mouse. Few understand Hermann Hesse. Hardly anybody understands Einstein. And nobody understands Emperor Norton."
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > Yes, but is the error message going to stay verbatim? Maybe the > category could be decreased ... The patch I proposed would still cause a WARNING to come out in the postmaster log. We could perhaps reduce it to a NOTICE if InRecovery is true, or have InRecovery add a DETAIL pointing out that this should be no problem (though the latter behavior couldn't be back-ported to 7.3 very readily). We'd not want to change the message wording in the 7.4 branch either. I think this is all a bit academic though, compared to the basic point of not failing ;-). In practice, any recovery from WAL is going to produce a bunch of scary-looking messages in the postmaster log. We're not likely to remove those messages, since they provide potentially critical debugging info if it fails to work. I'm not that concerned about making them less scary-looking ... regards, tom lane