Re: Protecting against unexpected zero-pages: proposal - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Protecting against unexpected zero-pages: proposal
Date
Msg-id AANLkTi=p_p2_QPbtHVVcUQzPk7LDwiWr7ixxxW81pTQz@mail.gmail.com
In response to Re: Protecting against unexpected zero-pages: proposal  (Gurjeet Singh <singh.gurjeet@gmail.com>)
Responses Re: Protecting against unexpected zero-pages: proposal
List pgsql-hackers
On Sun, Nov 7, 2010 at 4:23 AM, Gurjeet Singh <singh.gurjeet@gmail.com> wrote:
> I understand that it is a pretty low-level change, but IMHO the change is
> minimal and is being applied in well understood places. All the assumptions
> listed have been effective for quite a while, and I don't see these
> assumptions being affected in the near future. Most crucial assumptions we
> have to work with are, that XLogPtr{n, 0xFFFFFFFF} will never be used, and
> that mdextend() is the only place that extends a relation (until we
> implement an md.c sibling, say flash.c or tape.c; the last change to md.c
> regarding mdextend() was in January 2007).

I think the assumption that isn't tested here is what happens if the
server crashes. The logic may work fine as long as nothing goes wrong,
but if something does go wrong it has to be foolproof.

I think having zero-filled blocks at the end of the file if it has
been extended but hasn't been fsynced is an expected failure mode of a
number of filesystems. The log replay can't assume seeing such a block
is a problem since that may be precisely the result of the crash that
caused the replay. And if you disable checking for this during WAL
replay then you've lost your main chance to actually detect the
problem.
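
To make the dilemma concrete, here is a minimal sketch. It is not
PostgreSQL code: the names page_is_all_zeros, classify_zero_page and
the verdict enum are hypothetical, and SKETCH_BLCKSZ just assumes the
default 8192-byte block size. It only shows that the same all-zero
page gets a different answer depending on whether we are in replay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BLCKSZ 8192      /* assumed block size (PostgreSQL's default) */

/* Hypothetical verdicts for a page read back from disk. */
typedef enum
{
    PAGE_HAS_DATA,          /* at least one non-zero byte */
    ZERO_PAGE_AMBIGUOUS,    /* all zeros during replay: may be crash residue */
    ZERO_PAGE_UNEXPECTED    /* all zeros in normal operation: suspicious */
} ZeroPageVerdict;

/* Hypothetical helper: true if every byte of the page is zero. */
bool
page_is_all_zeros(const unsigned char *page, size_t len)
{
    size_t      i;

    assert(page != NULL);
    for (i = 0; i < len; i++)
    {
        if (page[i] != 0)
            return false;
    }
    return true;
}

/*
 * During WAL replay an all-zero page cannot be reported as corruption,
 * because an extended-but-not-fsynced file may legitimately end in zeros
 * after a crash. Outside replay the same page is unexpected.
 */
ZeroPageVerdict
classify_zero_page(const unsigned char *page, size_t len, bool in_wal_replay)
{
    if (!page_is_all_zeros(page, len))
        return PAGE_HAS_DATA;
    return in_wal_replay ? ZERO_PAGE_AMBIGUOUS : ZERO_PAGE_UNEXPECTED;
}
```

The ambiguous branch is the one I'm worried about: replay is exactly
where a zeroed page is most likely to show up, and it is also where
we can't call it an error.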

Another issue -- though I think a manageable one -- is that I expect
we'll want to be using posix_fallocate() sometime soon. That will
allow efficient guaranteed pre-allocated space with better contiguous
layout than currently. But ext4 can only pretend to give zero-filled
blocks, not any random bitpattern we request. I can see this being an
optional feature that is just not compatible with using
posix_fallocate() though.
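
As a rough illustration of the posix_fallocate() point, the sketch
below preallocates one block in a scratch file and reads it back. The
function name, the /tmp path and SKETCH_BLCKSZ are my own assumptions
for the example, and I'd only expect the "reads back as zeros" result
on the usual Linux filesystems. The call takes an offset and a length
and nothing else, so there is no way to ask for a different fill
pattern.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_BLCKSZ 8192      /* assumed block size (PostgreSQL's default) */

/*
 * Hypothetical demo: preallocate one block with posix_fallocate() and
 * read it back. Returns 1 if the block reads as all zeros, 0 if any
 * byte is non-zero, and -1 if a system call failed.
 */
int
fallocated_block_reads_as_zeros(void)
{
    char            path[] = "/tmp/zero_page_sketch_XXXXXX";
    unsigned char   buf[SKETCH_BLCKSZ];
    int             result = -1;
    int             fd;
    ssize_t         nread;
    size_t          i;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* scratch file goes away when fd is closed */

    /* posix_fallocate() returns 0 on success, an error number otherwise. */
    if (posix_fallocate(fd, 0, SKETCH_BLCKSZ) == 0)
    {
        nread = pread(fd, buf, sizeof(buf), 0);
        if (nread == (ssize_t) sizeof(buf))
        {
            result = 1;
            for (i = 0; i < sizeof(buf); i++)
            {
                if (buf[i] != 0)
                {
                    result = 0;
                    break;
                }
            }
        }
    }

    close(fd);
    return result;
}
```

So a preallocated block would look just like the zero page the
proposal wants to treat as unexpected, which is why I'd expect the two
features to be mutually exclusive.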

It does seem like this is kind of part and parcel of adding checksums
to blocks. It's arguably kind of silly to add checksums to blocks but
have a commonly produced bitpattern in corruption cases go
undetected.

-- 
greg

