Thread: Zeroing damaged pages


From
Simon Riggs
Date:
There appear to be some issues, or at least a lack of clarity, in the
way we zero damaged/missing pages in various circumstances.

1. If we have an invalid page header, we execute the code below
(from bufmgr.c ReadBuffer()). In the case that we are only reading the
page, not writing to it, zero_damaged_pages causes the page to be
overwritten with zeroes in memory, yet does not set BM_DIRTY on the
buffer. The page in memory is zeroed, but the on-disk copy is never
fully zeroed because the page is never flushed back to the database
(unless a subsequent write does dirty the page). This behaviour is
somewhat desirable from a support perspective, but isn't fully
documented, so ad-hoc attempts to solve problems appear to have failed.
VACUUM *does* write pages that it has "fixed" to disk, so
lazy_scan_heap() operates differently from a SELECT.

A patch prototype to make zero_damaged_pages work as advertised is
enclosed, though the current behaviour may well be preferred, in which
case a doc patch is more appropriate.

However, since the introduction of autovacuum, the window of opportunity
for support to assist with data recovery is smaller and somewhat random.
VACUUM can come along at any time and destroy the remaining data on the
corrupt page, destroying any benefit from the current behaviour. Also,
since some damaged pages will be written to and some not, we have a
fairly random amount of data loss, yet the logs don't differentiate
between a page that has been zeroed-and-written-to-disk and a page that
has been zeroed-and-ignored.

Perhaps we might want a new parameter called ignore_damaged_pages, so
that VACUUM would also ignore the bad data? At the very least we should
have a doc patch explaining the current behaviour better, so that people
understand what to do when messages start appearing in the logs.

In bufmgr.c ReadBuffer() we have this code, starting at line 240:

    if (!PageHeaderIsValid((PageHeader) bufBlock))
    {
        /*
         * During WAL recovery, the first access to any data page should
         * overwrite the whole page from the WAL; so a clobbered page
         * header is not reason to fail.  Hence, when InRecovery we may
         * always act as though zero_damaged_pages is ON.
         */
        if (zero_damaged_pages || InRecovery)
        {
            ereport(WARNING,
                    (errcode(ERRCODE_DATA_CORRUPTED),
                     errmsg("invalid page header in block %u of relation \"%s\"; zeroing out page",
                            blockNum, RelationGetRelationName(reln))));
            MemSet((char *) bufBlock, 0, BLCKSZ);
        }
        else
            ...
    }

    if (isLocalBuf)
    {
        /* Only need to adjust flags */
        bufHdr->flags |= BM_VALID | setdirty;
    }
    else
    {
        /* Set BM_VALID, terminate IO, and wake up any waiters */
        TerminateBufferIO(bufHdr, false, BM_VALID | setdirty);
    }

So the above code writes to a page, yet doesn't set BM_DIRTY. (It
doesn't unset it either, but that's not exactly the same thing.)

2. Also from the code, the assumption that InRecovery means we can
ignore corrupt pages is only true when rolling forward on a set of logs
that contains full_page_writes blocks. That seems like an invalid
assumption when the page being accessed isn't going to be fully
overwritten, as in the case when we set full_page_writes = off. If we
agree on that, it seems like a deeper fix is required, rather than the
seemingly obvious (InRecovery && full_page_writes), since the current
setting of that parameter doesn't necessarily reflect how it was set
historically over the full range of log files used for any recovery.

3. Checking for other possible InRecovery issues, I note in slru.c that
we ignore missing segments. As the comment explains, this is valid in
some circumstances, but not in all. If a file *should* be there because
the clog (etc.) hasn't yet been truncated to that point, we have a hole
in the sequence of segments. We do not detect that situation; we just
zero the page and march on, even when zero_damaged_pages is *not* set. A
message is written to the log, though there is no way to tell from it
whether the occurrence is valid or invalid.

    /*
     * In a crash-and-restart situation, it's possible for us to receive
     * commands to set the commit status of transactions whose bits are in
     * already-truncated segments of the commit log (see notes in
     * SlruPhysicalWritePage). Hence, if we are InRecovery, allow the case
     * where the file doesn't exist, and return zeroes instead.
     */
    fd = BasicOpenFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
    if (fd < 0)
    {
        if (errno != ENOENT || !InRecovery)

4. In md.c we do not throw any kind of message when, InRecovery, we are
asked for a block beyond the current range. Again, in most recovery
circumstances this is not an error, but because there is no message to
show it ever happened, it is difficult to confirm it as
error/non-error. (The block address is no doubt correct because it
comes from a CRC-checked WAL record, but we do not check the case that
the file should have existed and yet does not. There's no easy way of
doing this except for keeping track of such requests and matching them
with later truncations. That's non-trivial, so it might be best to just
log the fact that it happened so we can decide manually if it's a
problem, which should be sufficient for most cases.)

3 and 4 are slightly different cases of zeroing data, since we aren't
overwriting anything, just working around missing data. But I see an
issue: not reporting potentially missing data is still silent data
loss.

Best Regards, Simon Riggs


Attachment

Re: Zeroing damaged pages

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> A patch prototype to make zero_damaged_pages work as advertised is
> enclosed, though the current behaviour may well be preferred, in which
> case a doc patch is more appropriate. 

I don't think this is a good idea, and even if it were, the proposed
patch is a model of obscurantism.

> However, since autovacuum the window of opportunity for support to
> assist with data recovery is smaller and somewhat random.

Hmm .... it'd probably be a good idea to force zero_damaged_pages OFF in
the autovac subprocess.  That parameter is only intended for interactive
use --- as you say, autovac would be a rather nasty loose cannon if it
fired up with this parameter ON.

Are there any other things that ought to be disabled in autovac?
        regards, tom lane


Re: Zeroing damaged pages

From
Simon Riggs
Date:
On Thu, 2006-02-23 at 11:54 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > A patch prototype to make zero_damaged_pages work as advertised is
> > enclosed, though the current behaviour may well be preferred, in which
> > case a doc patch is more appropriate. 
> 
> I don't think this is a good idea, and even if it were, the proposed
> patch is a model of obscurantism.

;-)

Just some reflections on a recent db recovery for a client.

> > However, since autovacuum the window of opportunity for support to
> > assist with data recovery is smaller and somewhat random.
> 
> Hmm .... it'd probably be a good idea to force zero_damaged_pages OFF in
> the autovac subprocess.  That parameter is only intended for interactive
> use --- as you say, autovac would be a rather nasty loose cannon if it
> fired up with this parameter ON.

We can:
- disable zero_damaged_pages in autovac
- update the docs to say don't set this in postgresql.conf

> Are there any other things that ought to be disabled in autovac?

Good question. Not AFAICS.

Best Regards, Simon Riggs



Re: Zeroing damaged pages

From
Bruce Momjian
Date:
Simon Riggs wrote:
> On Thu, 2006-02-23 at 11:54 -0500, Tom Lane wrote:
> > Simon Riggs <simon@2ndquadrant.com> writes:
> > > A patch prototype to make zero_damaged_pages work as advertised is
> > > enclosed, though the current behaviour may well be preferred, in which
> > > case a doc patch is more appropriate. 
> > 
> > I don't think this is a good idea, and even if it were, the proposed
> > patch is a model of obscurantism.
> 
> ;-)
> 
> Just some reflections on a recent db recovery for a client.
> 
> > > However, since autovacuum the window of opportunity for support to
> > > assist with data recovery is smaller and somewhat random.
> > 
> > Hmm .... it'd probably be a good idea to force zero_damaged_pages OFF in
> > the autovac subprocess.  That parameter is only intended for interactive
> > use --- as you say, autovac would be a rather nasty loose cannon if it
> > fired up with this parameter ON.
> 
> We can:
> - disable zero_damaged_pages in autovac

I am wondering if we should prevent autovac from running if
zero_damaged_pages is set in postgresql.conf.  I can imagine autovac
aborting when trying to read a damaged page if zero_damaged_pages is
ignored.

We can't prevent zero_damaged_pages from being set in postgresql.conf
because it is possible for the system to be so corrupted that you can't
even start a backend without it being set.

--
  Bruce Momjian                        http://candle.pha.pa.us
  SRA OSS, Inc.                        http://www.sraoss.com

  + If your life is a hard drive, Christ can be your backup. +


Re: Zeroing damaged pages

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> On Thu, 2006-02-23 at 11:54 -0500, Tom Lane wrote:
>>> Hmm .... it'd probably be a good idea to force zero_damaged_pages OFF in
>>> the autovac subprocess.  That parameter is only intended for interactive
>>> use --- as you say, autovac would be a rather nasty loose cannon if it
>>> fired up with this parameter ON.

> I am wondering if we should prevent autovac from running if
> zero_damaged_pages is set in postgresql.conf.

What's wrong with just turning it off locally in the autovac process?

If the admin prefers autovac not run at all while he's fooling around,
he can disable it in postgresql.conf (or perhaps even better, run in
single-user mode).  But I don't think it's appropriate to force that
decision on him.
        regards, tom lane