Re: New CRC algorithm: Slicing by 8 - Mailing list pgsql-hackers

From Gregory Stark
Subject Re: New CRC algorithm: Slicing by 8
Date
Msg-id 87lkn3yyqe.fsf@enterprisedb.com
In response to Re: New CRC algorithm: Slicing by 8  ("Simon Riggs" <simon@2ndquadrant.com>)
List pgsql-hackers
"Simon Riggs" <simon@2ndquadrant.com> writes:

> I've looked into this in more depth following your suggestion: I think
> it seems straightforward to move the xl_prev field from being a header
> to a trailer. That way when we do the test on the back pointer we will
> be assured that there is no torn page affecting the remainder of the
> xlrec. That would make it safer with wal_checksum = off.

Hm. I think in practice this may actually help reduce the exposure to torn
pages. However, in theory there's no particular reason to think the blocks will
be written out in physical order.

The kernel may sync its buffers in some order dictated by its in-memory data
structures and may end up coming across the second half of the 8kb page before
the first half. The second half may even lie earlier on disk than the first
half if the filesystem started a new extent at that point.

If they were 4kb pages there would be fewer ways they could be written out of
order, but even then the hard drive could find a bad block and remap it. I'm
not sure what level of granularity drives remap at; it may be less than 4kb.

To eliminate the need for the CRC in the WAL for everyone and still be safe
from torn pages, I think you have to have something like xl_prev repeated every
512 bytes throughout the page.
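
To make the idea concrete, here is a minimal sketch of a marker repeated at
every 512-byte boundary. It is not PostgreSQL code: WAL_PAGE_SIZE, SECTOR_SIZE,
stamp_sectors() and page_is_whole() are hypothetical names, the marker stands
in for something like xl_prev, and it assumes a 512-byte sector is the unit the
disk writes atomically.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WAL_PAGE_SIZE 8192  /* assumed page size (hypothetical name) */
#define SECTOR_SIZE   512   /* assumed atomic write unit of the disk */

/* Write the same marker at the start of every sector of the page. */
static void stamp_sectors(unsigned char *page, uint64_t marker)
{
    for (size_t off = 0; off < WAL_PAGE_SIZE; off += SECTOR_SIZE)
        memcpy(page + off, &marker, sizeof marker);
}

/* Return true only if every sector carries the expected marker. */
static bool page_is_whole(const unsigned char *page, uint64_t expected)
{
    for (size_t off = 0; off < WAL_PAGE_SIZE; off += SECTOR_SIZE)
    {
        uint64_t found;

        memcpy(&found, page + off, sizeof found);
        if (found != expected)
            return false;   /* this sector never reached the disk */
    }
    return true;
}
```

A sector that still holds old contents fails the check no matter where it sits
in the page or in what order the sectors were written, which is the property a
single header or trailer cannot give you.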

But if this is only an option for systems that don't expect to suffer from
torn pages then sure, putting it in a footer seems like a good way to reduce
the exposure somewhat. Putting it in both a header *and* a footer might be
even better.
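
As a rough illustration of the header-plus-footer variant and of what it still
misses, here is a small sketch. Again the names are made up for the example
(REC_SIZE, stamp_ends(), ends_match()), the record length is arbitrary, and the
copied value only stands in for xl_prev.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define REC_SIZE 2048   /* hypothetical record length spanning 4 sectors */

/* Store the back pointer at both the start and the end of the record. */
static void stamp_ends(unsigned char *rec, uint64_t prev)
{
    memcpy(rec, &prev, sizeof prev);
    memcpy(rec + REC_SIZE - sizeof prev, &prev, sizeof prev);
}

/* Return true if both copies of the back pointer match the expected value. */
static bool ends_match(const unsigned char *rec, uint64_t expected)
{
    uint64_t head;
    uint64_t tail;

    memcpy(&head, rec, sizeof head);
    memcpy(&tail, rec + REC_SIZE - sizeof tail, sizeof tail);
    return head == expected && tail == expected;
}
```

If the first or last sector of the record is missing, ends_match() returns
false. If only a middle sector is missing, both copies still match and the tear
goes unnoticed, so this reduces the exposure rather than eliminating it.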

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com

