Re: Protecting against unexpected zero-pages: proposal - Mailing list pgsql-hackers

From Gurjeet Singh
Subject Re: Protecting against unexpected zero-pages: proposal
Msg-id AANLkTi=NFr9kP6bhfwfmB5TEmdwwidbXe9UyqR7z30mF@mail.gmail.com
In response to Re: Protecting against unexpected zero-pages: proposal  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Protecting against unexpected zero-pages: proposal
List pgsql-hackers
On Tue, Nov 9, 2010 at 12:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> There are also crosschecks that you can apply: if it's a heap page, are
> there any index pages with pointers to it?  If it's an index page, are
> there downlink or sibling links to it from elsewhere in the index?
> A page that Postgres left as zeroes would not have any references to it.
>
> IMO there are a lot of methods that can separate filesystem misfeasance
> from Postgres errors, probably with greater reliability than this hack.
> I would also suggest that you don't really need to prove conclusively
> that any particular instance is one or the other --- a pattern across
> multiple instances will tell you what you want to know.
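
The crosscheck described above can be sketched in a few lines. This is only an illustration of the reasoning, not an existing Postgres facility: the function name and both inputs are hypothetical, and collecting the set of block numbers that index entries (or index downlinks and sibling links) point to is assumed to have been done elsewhere.

```python
from typing import Dict, Iterable, Set


def classify_zero_pages(zero_blocks: Iterable[int],
                        referenced_blocks: Set[int]) -> Dict[int, str]:
    """Label each all-zero block by whether anything still points at it.

    zero_blocks       -- block numbers found to be all zeroes (hypothetical input)
    referenced_blocks -- block numbers that index pointers, downlinks, or
                         sibling links refer to (hypothetical input)
    """
    labels = {}
    for blk in zero_blocks:
        if blk in referenced_blocks:
            # Something points at this page, so it once held data:
            # the zeroes are suspicious.
            labels[blk] = "referenced"
        else:
            # Nothing points here: consistent with a page that
            # Postgres itself left as zeroes.
            labels[blk] = "unreferenced"
    return labels
```

A single "referenced" result does not prove anything conclusively; as the quoted text says, it is the pattern across multiple instances that is informative.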

Doing this postmortem on a regular deployment and fixing the problem would not be too difficult. But the platform that Postgres is a part of here would be left mostly unattended once deployed (pardon me for not sharing the details, as I am not sure if I can).

An external HA component is supposed to detect any problems (by querying Postgres or by external means) and take evasive action. It is this automation of problem detection that we are seeking.
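
As a rough illustration of detection "by external means", the sketch below scans a file block by block and reports pages that are entirely zero. It is a minimal sketch under stated assumptions: the function name is hypothetical, the 8192-byte block size is the usual Postgres default but can differ per build, and reading a relation file underneath a running server races with concurrent writes, so a real monitor would need more care than this.

```python
from typing import Iterator

BLCKSZ = 8192  # assumed block size; the usual default, but build-dependent


def find_zero_pages(path: str, block_size: int = BLCKSZ) -> Iterator[int]:
    """Yield the block numbers of full pages that contain only zero bytes."""
    zero_page = bytes(block_size)  # reference page of all zeroes
    with open(path, "rb") as f:
        blkno = 0
        while True:
            page = f.read(block_size)
            if not page:
                break
            # Ignore a short trailing read; only full pages are compared.
            if len(page) == block_size and page == zero_page:
                yield blkno
            blkno += 1
```

On its own this only says that a page is zeroed, not why; it cannot tell a filesystem-induced zero page from one that Postgres legitimately left empty after extending the relation.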

As Greg pointed out, even with this hack in place, we might still get zero pages from the FS (say, when ext3 does metadata journaling but not block journaling). In that case we'd rely on recovery's WAL replay of relation extension to reintroduce the magic number in pages.

> What's more, if I did believe that this was a safe and
> reliable technique, I'd be unhappy about the opportunity cost of
> reserving it for zero-page testing rather than other purposes.

This is one of those times where you are a bit too terse for me. What does zero-page imply that this hack wouldn't?

Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device
