On Mon, 2012-11-19 at 10:35 -0800, Jeff Davis wrote:
> Yes, the blocks written *after* the checkpoint might have a bad checksum
> that will be fixed during recovery. But the blocks written *before* the
> checkpoint should have a valid checksum, but if they don't, then
> recovery doesn't know about them.
>
> So, we can't verify the checksums in the base backup because it's
> expected that some blocks will fail the check, and they can be fixed
> during recovery. That gives us no protection for blocks that were truly
> corrupted and written long before the last checkpoint.
>
> I suppose if we could somehow differentiate the blocks, that might work.
> Maybe look at the LSN and only validate blocks written before the
> checkpoint? But of course, that's a problem because a corrupt block
> might have the wrong LSN (in fact, it's likely, because garbage is more
> likely to make the LSN too high than too low).
It might be good enough here to simply retry the checksum verification
if it fails for any block. Postgres shouldn't be issuing write()s for
the same block very frequently, and they shouldn't take very long, so
the chances of failing several times seems vanishingly small unless it's
a real failure.
Through a suitably complex mechanism, I think we can be more sure. The
external program could wait for a checkpoint (or force one manually),
and then recalculate the checksum for that page. If checksum is the same
as the last time, then we know the block is bad (because the checkpoint
would have waited for any writes in progress). If the checksum does
change, then we assume postgres must have modified it since the backup
started, so we can assume that we have a full page image to fix it. (A
checkpoint is a blunt tool here, because all we need to do is wait for
the write() call to finish, but it suffices.)
That complexity is probably not required, and simply retrying a few
times is probably much more practical. But it still bothers me a little
to think that the external tool could falsely indicate a checksum
failure, however remote that chance.
Regards,Jeff Davis