Re: Enabling Checksums - Mailing list pgsql-hackers

From	Jeff Davis
Subject	Re: Enabling Checksums
Date	November 19, 2012 21:46:35
Msg-id	1353361583.1102.18.camel@sussancws0025
In response to	Re: Enabling Checksums (Jeff Davis <pgsql@j-davis.com>)
List	pgsql-hackers

On Mon, 2012-11-19 at 10:35 -0800, Jeff Davis wrote:
> Yes, the blocks written *after* the checkpoint might have a bad checksum
> that will be fixed during recovery. But the blocks written *before* the
> checkpoint should have a valid checksum, but if they don't, then
> recovery doesn't know about them.
> 
> So, we can't verify the checksums in the base backup because it's
> expected that some blocks will fail the check, and they can be fixed
> during recovery. That gives us no protection for blocks that were truly
> corrupted and written long before the last checkpoint.
> 
> I suppose if we could somehow differentiate the blocks, that might work.
> Maybe look at the LSN and only validate blocks written before the
> checkpoint? But of course, that's a problem because a corrupt block
> might have the wrong LSN (in fact, it's likely, because garbage is more
> likely to make the LSN too high than too low).

It might be good enough here to simply retry the checksum verification
if it fails for any block. Postgres shouldn't be issuing write()s for
the same block very frequently, and they shouldn't take very long, so
the chances of failing several times in a row seem vanishingly small
unless it's a real failure.
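A minimal sketch of that retry idea, for illustration only. It assumes a hypothetical `read_page` callable that returns the stored checksum and the page contents for one block (a real tool would pread() BLCKSZ bytes from the data file), and it uses CRC-32 as a stand-in checksum, not the algorithm being proposed for Postgres:

```python
import time
import zlib
from typing import Callable, Tuple

# Hypothetical reader: returns (stored_checksum, page_body) for one block.
ReadPage = Callable[[], Tuple[int, bytes]]


def compute_checksum(body: bytes) -> int:
    # Stand-in checksum (CRC-32); NOT PostgreSQL's page checksum algorithm.
    return zlib.crc32(body) & 0xFFFFFFFF


def verify_with_retry(read_page: ReadPage, attempts: int = 3,
                      delay_s: float = 0.05) -> bool:
    """Return True if the block verifies on any of `attempts` reads.

    A single mismatch may only mean we raced with a concurrent write()
    of the same block, so corruption is reported only if every re-read
    also fails.
    """
    for i in range(attempts):
        stored, body = read_page()
        if compute_checksum(body) == stored:
            return True
        if i + 1 < attempts:
            time.sleep(delay_s)  # give an in-flight write() time to finish
    return False
```

The remaining weakness is the one noted below: a block that happens to be rewritten on every attempt would still be reported as a failure, however unlikely that is.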

Through a suitably complex mechanism, I think we can be more sure. The
external program could wait for a checkpoint (or force one manually),
and then recalculate the checksum for that page. If the checksum is the
same as last time, then we know the block is bad (because the checkpoint
would have waited for any writes in progress). If the checksum does
change, then postgres must have modified the page since the backup
started, so we can assume that WAL contains a full-page image to fix it. (A
checkpoint is a blunt tool here, because all we need to do is wait for
the write() call to finish, but it suffices.)
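A rough sketch of that decision procedure, again only an illustration under stated assumptions: `read_page` and `force_checkpoint` are hypothetical callables (the latter could, for example, issue a `CHECKPOINT` command over a connection), and CRC-32 stands in for the real page checksum:

```python
import zlib
from enum import Enum
from typing import Callable, Tuple


def compute_checksum(body: bytes) -> int:
    # Stand-in checksum (CRC-32); NOT PostgreSQL's page checksum algorithm.
    return zlib.crc32(body) & 0xFFFFFFFF


class Verdict(Enum):
    OK = "ok"
    CORRUPT = "corrupt"                 # page unchanged across a checkpoint
    CONCURRENTLY_MODIFIED = "modified"  # assume a WAL full-page image repairs it


def classify_block(read_page: Callable[[], Tuple[int, bytes]],
                   force_checkpoint: Callable[[], None]) -> Verdict:
    stored, body = read_page()
    first_sum = compute_checksum(body)
    if first_sum == stored:
        return Verdict.OK
    # The checkpoint waits for any in-progress write() of this block.
    force_checkpoint()
    _, body_after = read_page()
    if compute_checksum(body_after) == first_sum:
        # Nobody modified the page, so the mismatch is genuine.
        return Verdict.CORRUPT
    # Postgres changed the page since the backup started.
    return Verdict.CONCURRENTLY_MODIFIED
```

The checkpoint is only a convenient way to wait out in-flight writes; anything that guarantees the write() has completed would serve the same purpose.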

That complexity is probably not required, and simply retrying a few
times is probably much more practical. But it still bothers me a little
to think that the external tool could falsely indicate a checksum
failure, however remote that chance.

Regards,
Jeff Davis