On Fri, 2013-04-12 at 23:03 +0300, Heikki Linnakangas wrote:
> I think this is a bad idea. It complicates the WAL format significantly.
> Simon's patch didn't include the changes to recovery to validate the
> checksum, but I suspect it would be complicated. And it reduces the
> error-detection capability of WAL recovery. Keep in mind that unlike
> page checksums, which are never expected to fail, so even if we miss a
> few errors it's still better than nothing, the WAL checksum is used to
> detect end-of-WAL. There is expected to be a failure every time we do
> crash recovery. Thus far, we've considered the probability of one in
> 2^32 small enough for that purpose, but IMHO one in 2^16 is much too weak.

One thing that just occurred to me is that we could make the SIMD
checksum a 32-bit checksum, and reduce it down to 16 bits for the data
pages. That might give us more flexibility to later use it for WAL
without compromising on the error detection nearly as much (though
obviously that wouldn't work with Simon's current proposal which uses
the same data page checksum in a WAL backup block).
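
A minimal sketch of the "compute 32 bits, store 16" idea. Everything named
here is an assumption for illustration: zlib's CRC-32 stands in for the SIMD
checksum (it is not the algorithm under discussion), and the XOR fold is just
one possible way to reduce 32 bits to 16.

```python
import zlib


def checksum32(page: bytes) -> int:
    # Stand-in 32-bit checksum: CRC-32 from zlib, NOT the proposed SIMD one.
    return zlib.crc32(page) & 0xFFFFFFFF


def fold_to_16(value32: int) -> int:
    # Hypothetical reduction: XOR the high half into the low half so that
    # every bit of the 32-bit value influences the 16-bit result.
    return ((value32 >> 16) ^ value32) & 0xFFFF


page = bytes(8192)        # a zero-filled 8 kB block, used only as sample input
full = checksum32(page)   # the 32-bit value a WAL record could keep whole
short = fold_to_16(full)  # the 16-bit value that would fit in a page header

print(f"32-bit: {full:#010x}  16-bit: {short:#06x}")
```

With this shape, data pages would store only `short`, while a later WAL use
could keep all of `full` from the same routine.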
In general, we have more flexibility with WAL because there is no
upgrade issue. It would be nice to share code with the data page
checksum algorithm; but really we should just use whatever offers the
best trade-off in terms of complexity, performance, and error detection
rate.
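
To put rough numbers on the error-detection side of that trade-off, here is a
small sketch. It assumes the stale bytes found past the real end of WAL behave
like a uniformly random checksum value, which is a simplification; the
function name is made up for the example.

```python
from fractions import Fraction


def false_match_probability(bits: int) -> Fraction:
    """Chance that a uniformly random checksum value matches by accident."""
    return Fraction(1, 2 ** bits)


p32 = false_match_probability(32)
p16 = false_match_probability(16)
ratio = p16 / p32  # how much more often the 16-bit check is fooled

print(f"32-bit: 1 in {p32.denominator}")
print(f"16-bit: 1 in {p16.denominator}")
print(f"16-bit is {ratio} times more likely to miss end-of-WAL")
```

Under that assumption, a 16-bit check accepts garbage as a valid record about
once in 65,536 tries, versus about once in 4.3 billion tries for 32 bits.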
I don't think we need to decide all of this right now. Personally, I'm
satisfied having SIMD checksums on data pages now and leaving WAL
optimization until later.

Regards,
Jeff Davis