Re: 9.3: summary of corruption detection / checksums / CRCs discussion - Mailing list pgsql-hackers

From Greg Stark
Subject Re: 9.3: summary of corruption detection / checksums / CRCs discussion
Date
Msg-id CAM-w4HMiJQGfE+S-u8fp0Gjk2vRydtdJ3BFxDS+=mcC=CnUq=g@mail.gmail.com
In response to 9.3: summary of corruption detection / checksums / CRCs discussion  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: 9.3: summary of corruption detection / checksums / CRCs discussion  (Jeff Davis <pgsql@j-davis.com>)
Re: 9.3: summary of corruption detection / checksums / CRCs discussion  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Sat, Apr 21, 2012 at 10:40 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> * In addition to detecting random garbage, we also need to be able to
> detect zeroing of pages. Right now, a zero page is not considered
> corrupt, so that's a problem. We'll need to WAL table extension
> operations, and we'll need to mitigate the performance impact of doing
> so. I think we can do that by extending larger tables by many pages
> (say, 16 at a time) so we can amortize the cost of WAL and avoid
> contention.

I haven't seen this come up in discussion. WAL-logging table
extensions wouldn't work by itself, because currently we treat the file
size on disk as the size of the table. So you would have to do the
extension in the critical section, or else different backends might see
the wrong file size and write out conflicting WAL records.
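
A minimal sketch of the race described above, assuming a toy model in
which `nblocks` stands in for the file size on disk. The names
`Relation`, `pick_block`, `log_and_extend` and `extend_serialized` are
hypothetical, and the lock is only a stand-in for whatever
serialization would actually be used; none of this is PostgreSQL code.

```python
import threading


class Relation:
    """Toy relation: nblocks stands in for the file size on disk."""

    def __init__(self):
        self.nblocks = 0
        self.wal = []                 # (backend, block number) records
        self.lock = threading.Lock()  # hypothetical extension lock


def pick_block(rel):
    # A backend derives the next block number from the current file size.
    return rel.nblocks


def log_and_extend(rel, backend, blkno):
    # Emit the "WAL record", then grow the file to cover the new block.
    rel.wal.append((backend, blkno))
    rel.nblocks = max(rel.nblocks, blkno + 1)


def extend_serialized(rel, backend):
    # Reading the size, logging, and extending happen as one unit.
    with rel.lock:
        blkno = pick_block(rel)
        log_and_extend(rel, backend, blkno)
    return blkno


# Unserialized interleaving: both backends read the size before either extends.
racy = Relation()
a = pick_block(racy)
b = pick_block(racy)
log_and_extend(racy, "A", a)
log_and_extend(racy, "B", b)

# Serialized: each backend sees the size left behind by the previous one.
safe = Relation()
extend_serialized(safe, "A")
extend_serialized(safe, "B")
```

In the unserialized case both backends log an extension to the same
block; in the serialized case they log consecutive blocks.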

> -----------------------------------------------
> TORN PAGES
> -----------------------------------------------
>
> We don't want torn pages to falsely indicate a checksum failure. Many
> page writes are already protected from this with full-page images in the
> WAL; but hint bit updates (including the index dead tuple markers) are
> not.
>
> * Just pay the price -- WAL all hint bit updates, including FPIs.
>
> * Double-Write buffer -- this attacks the problem most directly. Don't

The earlier consensus was to move all the hint bits to a dedicated
area and exclude them from the checksum. Double-write buffers seem to
have become more fashionable, but a summary that doesn't describe the
former is definitely incomplete.
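
A minimal sketch of the "dedicated hint-bit area" idea, assuming a
made-up page layout: the offsets in `HINT_AREA` are hypothetical and do
not reflect PostgreSQL's real page format, and CRC-32 from Python's
`zlib` is used only as a convenient stand-in checksum.

```python
import zlib

PAGE_SIZE = 8192
# Hypothetical layout: bytes 24..87 are reserved for hint bits.
HINT_AREA = slice(24, 88)


def page_checksum(page: bytes) -> int:
    """CRC-32 over the page, skipping the hint-bit area."""
    assert len(page) == PAGE_SIZE
    crc = zlib.crc32(page[:HINT_AREA.start])
    return zlib.crc32(page[HINT_AREA.stop:], crc)


page = bytearray(PAGE_SIZE)
page[100:105] = b"tuple"
before = page_checksum(bytes(page))

# Setting a hint bit touches only the excluded area.
page[30] |= 0x01
after_hint = page_checksum(bytes(page))

# Changing tuple data touches the checksummed area.
page[100] ^= 0xFF
after_data = page_checksum(bytes(page))
```

Because the hint area is left out of the checksum, a hint bit update
that is not WAL-logged cannot make a torn page look corrupt, while a
change to the tuple data still changes the checksum.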

FWIW, the tradeoff here is at least partly between small and large
systems. For double writes to be at all efficient you need either
flash or a dedicated third spindle in addition to the ones for the
logs and data. For smaller systems that would be a huge cost, but for
larger ones it's not really a problem at all.
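
A rough in-memory sketch of why double writes cost extra I/O, assuming
the general double-write technique (write the page to a separate area
first, then in place). The `Storage` class and its fields are
hypothetical, the checksum is stored beside the page rather than inside
it for simplicity, and a real implementation would sync the first write
before starting the second.

```python
import zlib

PAGE_SIZE = 8192


class Storage:
    def __init__(self):
        self.dw = {}     # double-write area: blkno -> (page, crc)
        self.data = {}   # data files: blkno -> (page, crc)
        self.writes = 0  # physical page writes issued

    def flush(self, blkno, page, tear=False):
        crc = zlib.crc32(page)
        # First write: the full page goes to the double-write area.
        self.dw[blkno] = (page, crc)
        self.writes += 1
        # Second write: the page goes to its real location.
        if tear:
            # Simulate a crash mid-write: only the first half is new.
            old = self.data.get(blkno, (bytes(PAGE_SIZE), 0))[0]
            half = PAGE_SIZE // 2
            page = page[:half] + old[half:]
        self.data[blkno] = (page, crc)
        self.writes += 1

    def recover(self, blkno):
        page, crc = self.data[blkno]
        if zlib.crc32(page) != crc:
            # Torn page: restore the intact copy from the double-write area.
            self.data[blkno] = self.dw[blkno]
            page = self.dw[blkno][0]
        return page


store = Storage()
old_page = bytes(PAGE_SIZE)
new_page = bytes(PAGE_SIZE - 1) + b"\x01"
store.flush(0, old_page)
store.flush(0, new_page, tear=True)
recovered = store.recover(0)
```

Every flushed page is written twice, which is why the double-write
area wants its own fast device.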


> * Bulk Load -- this is more indirect. The idea is that, during normal
> OLTP operation, using the WAL for hints might not be so bad, because the
> page is likely to need a FPI for some other reason. The worst case is
> when bulk loading, so see if we can set hint bits during the bulk load
> in an MVCC-safe way.
> http://archives.postgresql.org/message-id/CABRT9RBRMdsoz8KxgeHfb4LG-ev9u67-6DLqvoiibpkKhTLQfw@mail.gmail.com

That link points to the MVCC-safe truncate patch. I don't follow how
optimizations in bulk loads are relevant to WAL-logging hint bit
updates.


-- 
greg

